AGOX is built from modules that can be combined to implement various structure search methods. Some basic modules are used everywhere but do not need to be defined in the input script:
- Candidate: an extended ASE Atoms object
Mandatory modules for an input script are:
- Environment: defines the chemical information of the search system
  - Such information includes the number of atoms, the element types, and the confinement_cell in which atoms are allowed to be placed
  - It is possible to define a template (seed) here. For example, to search for Au20, you could provide an Au13 structure as the template and search only for the positions of the remaining 7 Au atoms.
- Database: stores candidates/structures that have been evaluated during a search
- Generator (or Collector(Generator)): defines how a new structure is generated/modified
- Evaluator: defines how good/bad the modified structure is
  - Normally a lower total energy means a better structure
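
The interplay of the mandatory modules can be illustrated with a toy search loop. This is only a sketch of the roles described above: the classes `ToyEnvironment`, `ToyGenerator`, `ToyEvaluator`, and `ToyDatabase` are hypothetical stand-ins written for this example, not the AGOX classes, and the Lennard-Jones energy in two dimensions is an assumption chosen to keep the example self-contained.

```python
import math
import random

# Hypothetical stand-ins for the module roles; these are NOT the AGOX classes.

class ToyEnvironment:
    """Chemical information: number of atoms and the box they may be placed in."""
    def __init__(self, n_atoms, box_length):
        self.n_atoms = n_atoms
        self.box_length = box_length

class ToyGenerator:
    """Defines how a new structure is generated (here: uniformly random 2D positions)."""
    def __init__(self, environment, rng):
        self.environment = environment
        self.rng = rng

    def generate(self):
        length = self.environment.box_length
        return [(self.rng.uniform(0, length), self.rng.uniform(0, length))
                for _ in range(self.environment.n_atoms)]

class ToyEvaluator:
    """Scores a structure; a lower energy means a better structure."""
    def evaluate(self, positions):
        energy = 0.0
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                # Clamp very short distances to avoid division by zero
                r = max(math.dist(positions[a], positions[b]), 1e-3)
                energy += 4.0 * (r ** -12 - r ** -6)  # Lennard-Jones pair term
        return energy

class ToyDatabase:
    """Stores every evaluated candidate together with its energy."""
    def __init__(self):
        self.entries = []

    def store(self, positions, energy):
        self.entries.append((energy, positions))

    def best(self):
        return min(self.entries, key=lambda entry: entry[0])

def run_search(n_iterations, seed=0):
    rng = random.Random(seed)
    environment = ToyEnvironment(n_atoms=4, box_length=3.0)
    generator = ToyGenerator(environment, rng)
    evaluator = ToyEvaluator()
    database = ToyDatabase()
    for _ in range(n_iterations):
        candidate = generator.generate()         # Generator: propose a structure
        energy = evaluator.evaluate(candidate)   # Evaluator: score it
        database.store(candidate, energy)        # Database: keep the result
    return database
```

Calling `run_search(20).best()` returns the lowest-energy `(energy, positions)` pair found in 20 iterations of this toy loop.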
Mandatory submodules (not passed directly to the agox.run() function) for an input script are:
- ASE Calculator: an ASE calculator used by the Evaluator
Optional modules for an input script are:
- Sampler: defines which structures are processed by the Generator
- Collector: creates a pool of structures based on the Generator
- Postprocessor: postprocesses the structures generated by the Generator before they are evaluated by the Evaluator
- Acquisitor: selects which structures should be evaluated
Optional submodules for an input script are:
- Model: a machine learning model, such as a neural network or a Gaussian process model
  - It can be used as input for the Acquisitor and the Postprocessor
  - It is normally attached to a Database to get/update training data
  - Training is triggered after the Evaluator and before the update of the Database, via the model.training_observer of the model
  - By default, the model uses all structures in the database for training.
The feature vector is based on the local density of element Z around the central atom i:
$$ \begin{array}{rl} \rho _ i^Z(\lambda) = & \sum_{j \neq i, Z_j=Z} \dfrac{1}{\lambda} \exp(-r_{ij}/\lambda) f_c(r_{ij}) \\ \\ f_c(r) = & \begin{cases} \dfrac{1}{2} \cos \left( \pi \dfrac{r}{r_c} \right) + \dfrac{1}{2}, & r \le r_c \\ 0, & r > r_c \end{cases} \end{array} $$
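where $\lambda$ is a length-scale hyperparameter that can take multiple values (e.g. 0.5 Å, 1 Å, 1.5 Å, ...), $r_{ij}$ is the distance between atoms $i$ and $j$, and $f_c$ is a cosine cutoff function that goes to zero at the cutoff radius $r_c$.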
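
As a rough illustration, the local density and the cutoff function can be written with NumPy as below. This is a minimal sketch and not the AGOX implementation: the function names `cosine_cutoff` and `local_density` are made up for this example, distances are plain Euclidean distances without periodic boundary conditions, and the cutoff radius is a free parameter.

```python
import numpy as np

def cosine_cutoff(r, r_c):
    """f_c(r) = 0.5 * cos(pi * r / r_c) + 0.5 for r <= r_c, and 0 otherwise."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= r_c, 0.5 * np.cos(np.pi * r / r_c) + 0.5, 0.0)

def local_density(positions, numbers, i, Z, lam, r_c):
    """rho_i^Z(lambda): density of element Z around the central atom i.

    Assumption: no periodic boundary conditions are applied.
    """
    positions = np.asarray(positions, dtype=float)
    numbers = np.asarray(numbers)
    mask = numbers == Z      # neighbors j with Z_j = Z
    mask[i] = False          # exclude the central atom (j != i)
    r_ij = np.linalg.norm(positions[mask] - positions[i], axis=1)
    return float(np.sum(np.exp(-r_ij / lam) / lam * cosine_cutoff(r_ij, r_c)))
```

For instance, `local_density(positions, numbers, i=0, Z=8, lam=1.0, r_c=5.0)` sums the contributions of all oxygen atoms within 5 Å of atom 0; repeating the call for each value of $\lambda$ gives one block of the feature vector.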
Taking a silicate (Mg$_2$SiO$_4$)$_x$ system as an example, the whole feature vector is sorted by atomic numbers and is constructed as $$ \textbf{f}_i = [ \underbrace{\rho _ i^O(\lambda_1), \rho _ i^O(\lambda_2), \dots, \rho _ i^O(\lambda_n)} _ \text{Oxygen neighbors}, \underbrace{\rho _ i^{Mg}(\lambda_1), \rho _ i^{Mg}(\lambda_2), \dots, \rho _ i^{Mg}(\lambda_n)} _ \text{Magnesium neighbors}, \underbrace{\rho _ i^{Si}(\lambda_1), \rho _ i^{Si}(\lambda_2), \dots, \rho _ i^{Si}(\lambda_n)} _ \text{Silicon neighbors}, \underbrace{Z_i} _ \text{Atomic number}] $$
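In practice, if the feature_matrix has a shape of (n_atoms, n_species, n_lambs), it can be flattened with feature_matrix.reshape(n_atoms, -1). For example, with 3 species and 5 lambdas, the feature matrix of one atom [O(1) means $\rho^O(\lambda_1)$]

$$ \begin{bmatrix} \text{O(1)} & \text{O(2)} & \text{O(3)} & \text{O(4)} & \text{O(5)} \\ \text{Mg(1)} & \text{Mg(2)} & \text{Mg(3)} & \text{Mg(4)} & \text{Mg(5)} \\ \text{Si(1)} & \text{Si(2)} & \text{Si(3)} & \text{Si(4)} & \text{Si(5)} \end{bmatrix} $$

will be reshaped into the single row

$$ [\text{O(1)}, \text{O(2)}, \text{O(3)}, \text{O(4)}, \text{O(5)}, \text{Mg(1)}, \text{Mg(2)}, \text{Mg(3)}, \text{Mg(4)}, \text{Mg(5)}, \text{Si(1)}, \text{Si(2)}, \text{Si(3)}, \text{Si(4)}, \text{Si(5)}] $$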
Finally, the atomic numbers can be appended as the last column.
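
The flattening and the appended atomic-number column can be sketched with NumPy as follows. The array contents are placeholder numbers rather than real densities, and the two central atoms (O and Si) are chosen only for illustration.

```python
import numpy as np

n_atoms, n_species, n_lambs = 2, 3, 5

# Placeholder densities: entry [a, s, l] stands for the density of
# species s at lambda l around atom a.
feature_matrix = np.arange(n_atoms * n_species * n_lambs, dtype=float).reshape(
    n_atoms, n_species, n_lambs
)

# Row-major flattening keeps all lambdas of one species next to each other:
# [O(1..5), Mg(1..5), Si(1..5)] for every atom.
flat = feature_matrix.reshape(n_atoms, -1)

# Append the atomic number of each central atom as the last column
# (illustrative values: atom 0 is O, atom 1 is Si).
atomic_numbers = np.array([8, 14], dtype=float)
features = np.hstack([flat, atomic_numbers[:, None]])
```

Here `features` has shape `(n_atoms, n_species * n_lambs + 1)`, with one row per atom.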