AGOX

AGOX's modules are designed to be combined into various structure search methods. Some basic modules are used everywhere but do not need to be defined in the input script:
- `Candidate`: an extended ASE `Atoms` object

Mandatory modules for an input script are:
- `Environment`: defines the chemical information of the search system
  - Such information includes the number of atoms, the element types, and the `confinement_cell` in which atoms are allowed to be placed
  - A template or seed can be defined here. For example, to search for Au20, you could provide an Au13 structure as the template, and the search then places the remaining 7 Au atoms.
- `Database`: stores the candidates/structures that have been evaluated during a search
- `Generator` (or `Collector(Generator)`): defines how a new structure is generated or modified
- `Evaluator`: defines how good or bad the modified structure is
  - Normally, a lower total energy means a better structure

Mandatory submodules (not passed directly to the `agox.run()` function) for an input script are:
- `ASE Calculator`: an ASE calculator used by the `Evaluator`

Optional modules for an input script are:
- `Sampler`: defines which structures are processed by the `Generator`
- `Collector`: creates a pool of structures using the `Generator`
- `Postprocessor`: postprocesses the structures generated by the `Generator` before they are evaluated by the `Evaluator`
- `Acquisitor`: selects which structures should be evaluated
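To make the division of labour between these modules concrete, here is a toy sketch in plain Python. It is not AGOX code and does not use the AGOX API: every function name is a hypothetical stand-in, a "structure" is reduced to a single number, and the loop order is an assumption inferred from the descriptions above.

```python
import random

rng = random.Random(0)

def energy(x):
    # Evaluator stand-in: a lower "energy" means a better structure.
    return (x - 2.0) ** 2

def sampler(database):
    # Pick the evaluated structure to modify next (here: the best one).
    return min(database, key=lambda entry: entry[1])[0]

def generator(parent):
    # Propose one new structure by randomly perturbing the parent.
    return parent + rng.uniform(-1.0, 1.0)

def collector(parent, size=5):
    # Build a pool of candidates by calling the generator repeatedly.
    return [generator(parent) for _ in range(size)]

def postprocessor(x, lower=-5.0, upper=5.0):
    # Clean up a candidate before evaluation (here: clamp to the allowed range).
    return min(max(x, lower), upper)

def acquisitor(pool, database):
    # Choose which candidate to evaluate (here: the one farthest from
    # everything that has already been evaluated).
    return max(pool, key=lambda x: min(abs(x - y) for y, _ in database))

database = [(-4.0, energy(-4.0))]  # seed structure and its energy
for _ in range(20):
    parent = sampler(database)
    pool = [postprocessor(x) for x in collector(parent)]
    chosen = acquisitor(pool, database)
    database.append((chosen, energy(chosen)))  # only evaluated structures are stored

best_x, best_e = min(database, key=lambda entry: entry[1])
```

In this sketch only the evaluator and the database are strictly needed to make progress; the sampler, collector, postprocessor, and acquisitor shape which candidates reach the (expensive) evaluation step.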

Optional submodules for an input script are:
- `Model`: a machine learning model, such as a neural network or a Gaussian process model
  - It can be used as an input for the `Acquisitor` and the `Postprocessor`
  - It is normally attached to a `Database` to get/update its training data
  - Training is triggered after the `Evaluator` and before the update of the `Database`, via the `model.training_observer` of the model
  - By default, the model uses all structures in the database for training

The feature vector is based on the local density of element `Z` around the central atom `i`:

$$ \begin{aligned} \rho _ i^Z(\lambda) &= \sum_{j \neq i, Z_j=Z} \dfrac{1}{\lambda} \exp(-r_{ij}/\lambda) f_c(r_{ij}) \\ \\ f_c(r) &= \begin{cases} \dfrac{1}{2} \cos \left( \pi \dfrac{r}{r_c} \right) + \dfrac{1}{2}, & r \le r_c \\ 0, & r > r_c \end{cases} \end{aligned} $$

where $r_{ij}$ is the distance between atoms $i$ and $j$, $r_c$ is the cutoff radius, and $\lambda$ is a hyperparameter that can take multiple values such as 0.5 Å, 1 Å, 1.5 Å, ...
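The density and cutoff function can be sketched in NumPy as follows. This is a minimal illustration of the formula, not the AGOX implementation: the function names `cutoff` and `local_density` are hypothetical, periodic images are ignored, and the example geometry is made up.

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff f_c(r): equals 1 at r = 0 and falls to 0 at r = r_c."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= r_c, 0.5 * np.cos(np.pi * r / r_c) + 0.5, 0.0)

def local_density(positions, numbers, i, Z, lam, r_c):
    """rho_i^Z(lambda): density of element Z around atom i (no periodic images)."""
    positions = np.asarray(positions, dtype=float)
    numbers = np.asarray(numbers)
    r = np.linalg.norm(positions - positions[i], axis=1)  # distances r_ij
    mask = numbers == Z   # keep only neighbors of element Z
    mask[i] = False       # exclude the central atom itself (j != i)
    r = r[mask]
    return float(np.sum(np.exp(-r / lam) / lam * cutoff(r, r_c)))

# Made-up geometry: one Si at the origin, two O at 1.6 Å, one O beyond the cutoff.
positions = [[0.0, 0.0, 0.0], [1.6, 0.0, 0.0], [0.0, 1.6, 0.0], [0.0, 0.0, 6.0]]
numbers = [14, 8, 8, 8]
lambdas = [0.5, 1.0, 1.5]
rho_O = [local_density(positions, numbers, i=0, Z=8, lam=lam, r_c=5.0) for lam in lambdas]
```

Each value of $\lambda$ gives one entry of the feature vector for a given neighbor element; the oxygen atom at 6 Å contributes nothing because it lies outside the cutoff.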

Taking a silicate (Mg$_2$SiO$_4$)$_x$ system as an example, the whole feature vector is sorted by atomic numbers and is constructed as $$ \textbf{f}_i = [ \underbrace{\rho _ i^O(\lambda_1), \rho _ i^O(\lambda_2), \dots, \rho _ i^O(\lambda_n)} _ \text{Oxygen neighbors}, \underbrace{\rho _ i^{Mg}(\lambda_1), \rho _ i^{Mg}(\lambda_2), \dots, \rho _ i^{Mg}(\lambda_n)} _ \text{Magnesium neighbors}, \underbrace{\rho _ i^{Si}(\lambda_1), \rho _ i^{Si}(\lambda_2), \dots, \rho _ i^{Si}(\lambda_n)} _ \text{Silicon neighbors}, \underbrace{Z_i} _ \text{Atomic number}] $$

In practice, if the `feature_matrix` has a shape of `(n_atoms, n_species, n_lambs)`, it can be flattened with `feature_matrix.reshape(n_atoms, -1)`. For example, if there are 3 species and 5 lambdas, the feature matrix of one atom [`O(1)` means $\rho^O(\lambda_1)$]

| O(1) | O(2) | O(3) | O(4) | O(5) |
|---|---|---|---|---|
| Mg(1) | Mg(2) | Mg(3) | Mg(4) | Mg(5) |
| Si(1) | Si(2) | Si(3) | Si(4) | Si(5) |

will be reshaped into a single row

| O(1) | O(2) | O(3) | O(4) | O(5) | Mg(1) | Mg(2) | Mg(3) | Mg(4) | Mg(5) | Si(1) | Si(2) | Si(3) | Si(4) | Si(5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Finally, the atomic numbers can be appended as the last column.
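The flattening and the final append can be sketched with NumPy as below. The array contents and the atomic numbers are placeholder values chosen only so that the ordering is easy to check.

```python
import numpy as np

n_atoms, n_species, n_lambs = 4, 3, 5

# Placeholder densities: feature_matrix[a, s, l] is the density of species s
# around atom a for the l-th lambda.
feature_matrix = np.arange(n_atoms * n_species * n_lambs, dtype=float)
feature_matrix = feature_matrix.reshape(n_atoms, n_species, n_lambs)

# Flatten species and lambda axes: all lambdas of the first species come
# first, then all lambdas of the second species, and so on.
flat = feature_matrix.reshape(n_atoms, -1)

# Append the atomic number of each atom as the last column.
numbers = np.array([8, 8, 12, 14])  # placeholder: O, O, Mg, Si
features = np.hstack([flat, numbers[:, None]])
```

With 3 species and 5 lambdas, each row of `features` has 3 × 5 + 1 = 16 entries.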