AGOX's modules are designed to be combined into various structure search methods. Some basic modules are used everywhere but do not need to be defined in the input script:

- Candidate: an extended ASE Atoms object

Mandatory modules for an input script are:

- Environment: defines the chemical information of the search system.
  - Such information could be the number of atoms, the element types, and the confinement_cell in which atoms are allowed to be placed.
  - It is possible to define a template or a seed here. For example, if you want to search for Au20, you could provide a structure of Au13 as a template and search for the positions of the remaining 7 Au atoms.
- Database: stores the candidates/structures that have been evaluated during a search.
- Generator (or Collector(Generator)): defines how a new structure is generated or modified.
- Evaluator: defines how good or bad the modified structure is.
  - Normally, a lower total energy means a better structure.

Mandatory submodules (not passed directly to the function) for an input script are:

- ASE Calculator: an ASE calculator used by the Evaluator

Optional modules for an input script are:

- Sampler: defines which structures are processed by the Generator
- Collector: creates a pool of structures based on the Generator
- Postprocessor: postprocesses the structures generated by the Generator before they are evaluated by the Evaluator
- Acquisitor: selects which structures should be evaluated

Optional submodules for an input script are:

- Model: a machine learning model, such as a neural network or a Gaussian process model.
  - It can be used as an input for the Acquisitor and the Postprocessor.
  - It is normally attached to a Database to get/update training data.
  - Training is triggered via model.training_observer, after the Evaluator runs and before the Database is updated.
  - By default, the model uses all structures in the database for training.

Complementary energy

The feature vector is based on the local density of element $Z$ around the central atom $i$:

$$ \begin{array}{ccc} \rho _ i^Z(\lambda) = & \sum_{j \neq i, Z_j=Z} \dfrac{1}{\lambda}\text{exp}(-r_{ij}/\lambda) f_c(r_{ij}) \\ \\ f_c(r) = & \begin{cases} \dfrac{1}{2} \text{cos} (\pi \dfrac{r}{r_c}) + \dfrac{1}{2}, & r \le r_c \\ 0, & r > r_c \end{cases} \end{array} $$

where $\lambda$ is a hyperparameter and can have multiple values like 0.5Å, 1Å, 1.5Å, ...
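The two formulas above can be sketched in plain Python. This is a minimal illustration, not AGOX's implementation: the function names `cutoff` and `local_density`, the toy coordinates, and the cutoff radius of 5 Å are all assumptions made for the example, and periodic images are ignored.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff f_c(r): 1 at r = 0, decaying smoothly to 0 at r = r_c."""
    if r > r_c:
        return 0.0
    return 0.5 * math.cos(math.pi * r / r_c) + 0.5


def local_density(positions, numbers, i, Z, lam, r_c):
    """rho_i^Z(lambda): density of element Z around atom i (no periodic images)."""
    rho = 0.0
    for j, (pos_j, Z_j) in enumerate(zip(positions, numbers)):
        # Skip the central atom itself and neighbors of a different element.
        if j == i or Z_j != Z:
            continue
        r_ij = math.dist(positions[i], pos_j)
        rho += math.exp(-r_ij / lam) / lam * cutoff(r_ij, r_c)
    return rho


# Hypothetical toy cluster: one Si atom with two O neighbors at 1.6 Å.
positions = [(0.0, 0.0, 0.0), (1.6, 0.0, 0.0), (0.0, 1.6, 0.0)]
numbers = [14, 8, 8]
lambdas = [0.5, 1.0, 1.5]

# Oxygen block of the feature vector for atom 0, one entry per lambda.
rho_O = [local_density(positions, numbers, 0, 8, lam, r_c=5.0) for lam in lambdas]
```

Each value of $\lambda$ probes a different length scale, so evaluating the same sum for several values gives one block of the feature vector per element.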

Taking a silicate (Mg$_2$SiO$_4$)$_x$ system as an example, the whole feature vector is sorted by atomic number and is constructed as

$$ \textbf{f}_i = [ \underbrace{\rho _ i^O(\lambda_1), \rho _ i^O(\lambda_2), \dots, \rho _ i^O(\lambda_n)} _ \text{Oxygen neighbors}, \underbrace{\rho _ i^{Mg}(\lambda_1), \rho _ i^{Mg}(\lambda_2), \dots, \rho _ i^{Mg}(\lambda_n)} _ \text{Magnesium neighbors}, \underbrace{\rho _ i^{Si}(\lambda_1), \rho _ i^{Si}(\lambda_2), \dots, \rho _ i^{Si}(\lambda_n)} _ \text{Silicon neighbors}, \underbrace{Z_i} _ \text{Atomic number}] $$

In practice, if the feature_matrix has a shape of (n_atoms, n_species, n_lambs), it can be flattened with feature_matrix.reshape(n_atoms, -1). For example, with 3 species and 5 lambdas, the feature matrix of one atom [where O(1) means $\rho_i^O(\lambda_1)$]

O(1)    O(2)    O(3)   O(4)   O(5)
Mg(1)   Mg(2)   Mg(3)  Mg(4)  Mg(5)
Si(1)   Si(2)   Si(3)  Si(4)  Si(5)

will be reshaped into

O(1)    O(2)    O(3)   O(4)   O(5)  Mg(1)   Mg(2)   Mg(3)  Mg(4)  Mg(5)   Si(1)   Si(2)   Si(3)  Si(4)  Si(5)

Finally, the atomic numbers can be appended as the last column.
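The flattening and the final append can be sketched with NumPy as follows. The array contents are placeholder values rather than real densities, and the four-atom composition in `numbers` is a hypothetical example. The species axis is assumed to be ordered O, Mg, Si, as in the feature vector above.

```python
import numpy as np

n_atoms, n_species, n_lambs = 4, 3, 5

# Placeholder values standing in for rho_i^Z(lambda);
# entry [i, s, l] belongs to atom i, species s, and lambda l.
feature_matrix = np.arange(
    n_atoms * n_species * n_lambs, dtype=float
).reshape(n_atoms, n_species, n_lambs)

# Hypothetical atomic numbers of the four atoms (O, O, Mg, Si).
numbers = np.array([8, 8, 12, 14])

# Row-major flattening keeps all lambdas of one species adjacent:
# O(1)..O(5), Mg(1)..Mg(5), Si(1)..Si(5).
flat = feature_matrix.reshape(n_atoms, -1)

# Append the atomic number of each atom as the last column.
features = np.hstack([flat, numbers[:, None]])
```

With 3 species and 5 lambdas, each row of `features` has 3 × 5 + 1 = 16 entries.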