Merge pull request #317 from mir-group/cjo
Cjo
YuuuXie authored Aug 31, 2022
2 parents 0315248 + af6794d commit 2ec05b3
Showing 1 changed file with 90 additions and 86 deletions.
176 changes: 90 additions & 86 deletions docs/source/faqs.rst
Original file line number Diff line number Diff line change
@@ -1,121 +1,125 @@
Frequently Asked Questions
==========================

This page helps users troubleshoot problems commonly encountered when performing tasks in the FLARE framework.


Installation and Packages
-------------------------

1. Can I accelerate my calculation using parallelization?
See the `Installation <https://flare.readthedocs.io/en/latest/install.html#acceleration-with-multiprocessing-and-mkl>`_ section on acceleration with multiprocessing and MKL.
1. What do I do if I encounter an mkl error following installation?
* Make sure you load the mkl module, or do ``conda install -y mkl_fft``. Then please check ``pip list`` and ``conda list`` to see if you have two versions of numpy installed by ``pip`` and ``conda``. If so, try uninstalling the ``pip`` numpy.
* Otherwise, reverting to version 1.18 might also fix this error.

2. GitHub Action fails due to GitHub Pages?
For developers: if your GitHub Actions build fails in the "Publish the docs" step because of an issue related to ``gh-pages``, you may not have pulled the latest code before pushing. Pull, then push again. If that still does not work, check `this <https://gist.github.com/mandiwise/44d1edce18f2ffb14f63>`_.

Gaussian Processes
------------------


1. I'm confused about how Gaussian Processes work.
1. How do Gaussian Processes work?
Gaussian Processes enjoy a long history of study and there are many excellent resources out there we can recommend.
One such resource (that some of the authors consult quite frequently!) is the textbook
`Gaussian Processes for Machine Learning, by Rasmussen and Williams <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_
, with Chapter 2 in particular being a great help.

2. How should I choose my cutoff(s)?
* The right cutoff depends on the system you're studying: ionic systems often do better with a larger cutoff,
while dense systems like diamond can employ smaller cutoffs.

2. How should I choose my cutoffs?
* The right cutoff depends on the system you're studying: ionic systems often do better with a larger 2-body cutoff,
while dense systems like diamond require smaller cutoffs.
* We recommend that you refer to the radial distribution function of your atomic system first, then try a range of cutoff
values and examine the model error, optimized hyperparameters, and model likelihood as a function of the cutoff(s).

* We recommend you to refer to the radial distribution function of your atomic system first, then try a range of cutoff
values and examine the model error, optimized noise parameter, and model likelihood as a function of the cutoff.
* Keep in mind that the model cutoff is intimately coupled with the radial and angular bases of the model. We therefore recommend
testing the model cutoff(s) with varying ``n_max`` and ``l_max`` for the ACE descriptors. Generally, a larger cutoff requires a larger ``n_max``.

* In the current implementation, the 3-body cutoff needs to be smaller than 2-body cutoff.
* For multi-component systems, a cutoff_matrix can be set with explicit cutoffs for each inter-species interaction (e.g.,
``cutoff_matrix: [[cutoff_11, cutoff_12],[cutoff_21, cutoff_22]]`` for species 1 and 2), otherwise the matrix will populate with the values of the maximum cutoff listed in the input file.
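The layout of such a matrix can be sketched in plain Python. This is only an illustration of the shape described above, not FLARE's own input handling: the helper ``build_cutoff_matrix`` and the numerical values are hypothetical, and making the matrix symmetric is an assumption of the sketch.

```python
import numpy as np

def build_cutoff_matrix(n_species, default_cutoff, overrides=None):
    """Return an (n_species x n_species) matrix of cutoffs in angstrom.

    Every entry starts at ``default_cutoff`` (e.g. the maximum cutoff);
    ``overrides`` maps species index pairs (i, j) to explicit cutoffs.
    """
    matrix = np.full((n_species, n_species), float(default_cutoff))
    for (i, j), value in (overrides or {}).items():
        # Assumption of this sketch: the i-j and j-i cutoffs are equal.
        matrix[i, j] = value
        matrix[j, i] = value
    return matrix

# Two species: explicit 1-1 and 1-2 cutoffs, 2-2 falls back to the default.
cutoff_matrix = build_cutoff_matrix(2, 5.0, {(0, 0): 4.0, (0, 1): 4.5})
print(cutoff_matrix.tolist())
```

The nested list printed at the end has the same ``[[cutoff_11, cutoff_12],[cutoff_21, cutoff_22]]`` layout as the input-file entry.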

3. What is a good strategy for hyperparameter optimization?
The hyperparameter optimization is important for obtaining a good model.
However, the optimization of hyperparameters will get slower when more training data are collected.
There are a few parameters to notice:

In `GaussianProcess`,

* ``maxiter``: maximal number of iterations, usually set to ~10 to prevent training for too long.
OTF (On-the-fly) Active-Learning
--------------------------------

* ``parallel``: if `True`, then parallelization is used in optimization.
The serial version could be very slow.
1. What is a good strategy for hyperparameter optimization?
Hyperparameter optimization is important for obtaining a good model, particularly when the priors were poorly chosen.
However, the optimization slows down as more training data are collected, so some tricks may be needed to obtain a good model while minimizing training time.
A few parameters deserve attention:

* ``output`` (in `train` function): set up an output file for monitoring optimization iterations.
* ``maxiter`` : maximal number of iterations, usually set to ~20 to prevent training for too long. However, if hyperparameter training is unstable,
raising this number can help if the model is not converged within a smaller number of iterations.

* ``grad_tol``, ``x_tol``, ``line_steps`` (in `train` function): can be changed for iterations.
* ``train_hyps`` : range of DFT calls wherein the hyperparameters will be optimized. We recommend setting the initial value to 10 (i.e., the 10th DFT call),
since we have observed that this gives more stable training than optimizing from the initial DFT call.

There are a few tips in the OTF training, see below.

* ``std_tolerance_factor`` : DFT will be called when the predicted uncertainty is above this threshold,
which is defined relative to the mean uncertainty in the system. The default value is 1. In general, we recommend that this value be set relative to the number
of species in the system (e.g., -0.01 for 1 species, -0.05 for 2, -0.1 for 3, etc.). If more DFT calls are desired, you can set it to a lower value.

* ``update_threshold`` : atoms will only be added to the sparse set of the Gaussian Process when their uncertainty surpasses this threshold. We have found that this
value provides a decent number of sparse environment `additions` when set to be 0.1*std_tolerance_factor. This ensures that several atoms are added to the sparse-set of the
Gaussian Process for every DFT call. If this value is set to be closer to the std_tolerance_factor, it may be the case where only 1 atomic environment is added for each DFT call,
which is inefficient depending on the DFT complexity.
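As a rough illustration of the rule of thumb above, the following sketch looks up the quoted ``std_tolerance_factor`` starting values and derives an ``update_threshold`` from them. The helper ``suggest_thresholds`` is hypothetical and not part of FLARE, and applying the 0.1 ratio to the magnitude of the factor is an assumption of this sketch.

```python
# Starting points quoted in the entry above, keyed by number of species.
SUGGESTED_STD_TOLERANCE = {1: -0.01, 2: -0.05, 3: -0.1}

def suggest_thresholds(n_species, ratio=0.1):
    """Return (std_tolerance_factor, update_threshold) starting guesses."""
    if n_species not in SUGGESTED_STD_TOLERANCE:
        raise ValueError("no suggestion quoted for this many species")
    std_tolerance_factor = SUGGESTED_STD_TOLERANCE[n_species]
    # Assumption: the 0.1 ratio is applied to the magnitude of the factor.
    update_threshold = ratio * abs(std_tolerance_factor)
    return std_tolerance_factor, update_threshold

std_tol, update_thr = suggest_thresholds(2)
print(f"std_tolerance_factor: {std_tol}, update_threshold: {update_thr:.4f}")
```

These numbers are only starting guesses; they should be adjusted according to how many DFT calls and sparse-set additions you observe in practice.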

2. How (why) should a small perturbation be included for the initial structure?
If you are starting from a perfect lattice, we recommend adding small random perturbations to the atomic positions,
such that the symmetry of the crystal lattice is broken. This is accomplished using the `jitter` flag in the `yaml` script, in the units of angstrom.
The reason is that the perfect lattice is highly symmetric, thus usually the force on each atom is zero, and the local
environments all look the same. Adding these highly similar environments with close-to-zero forces might raise numerical
stability issues for GP.
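For readers who prepare the structure by hand, the idea can be sketched with NumPy. This is a minimal illustration, not the implementation behind the ``jitter`` flag: the lattice, the jitter magnitude, and the uniform distribution of the displacements are all assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the example is reproducible

# Hypothetical perfect lattice: a 2x2x2 simple-cubic grid with 3 angstrom spacing.
grid = np.arange(2) * 3.0
positions = np.array([[x, y, z] for x in grid for y in grid for z in grid])

jitter = 0.01  # maximum displacement per Cartesian component, in angstrom
# Uniform random displacements in [-jitter, jitter) for every coordinate.
displacements = jitter * (2.0 * rng.random(positions.shape) - 1.0)
perturbed = positions + displacements

# Largest displacement actually applied; it never exceeds the jitter.
print(np.abs(perturbed - positions).max())
```

Each atom moves by at most ``jitter`` along each axis, which is enough to break the lattice symmetry without changing the structure appreciably.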

OTF (On-the-fly) Training
-------------------------
3. Why is the temperature of the simulation unreasonably high?
* This usually signals that a high-energy configuration was used to start active-learning. Try relaxing the structure before initializing the active-learning trajectory so that the atoms of your
initial structure sit in local energy minima. High-energy initial structures can yield high forces, leading to instability in the temperature and velocities of the atoms.
* If you are simulating with a high temperature or light atoms, you can try reducing the MD timestep to enhance stability.

1. What is a good strategy for hyperparameter optimization?
* ``freeze_hyps`` : the hyperparameter will only be optimized for `freeze_hyps` times.
Can be set to a small number if optimization is too expensive with a large data set.

* ``std_tolerance_factor`` : the DFT will be called when the predicted uncertainty is above the threshold,
which is defined as `std_tolerance_factor * gp_hyps_noise`. The default value is 1. In general, you
can set it to O(0.1)-O(1). If more DFT calls are desired, you can set it to a lower value.

2. How to set initial temperatures and rescale temperatures?
* For initial temperature, if you are using NVE ensemble, and starting from a perfect crystal lattice,
please set the initial temperature to be twice the value you want the system to be in equilibrium. E.g.,
if you want the system to equilibrate at 1000K, set the initial temperature to 2000K starting from a
perfect lattice. To do this,
* if you are using our OTF trainer without ASE, you can set the ``prev_positions`` of the initial
structure to be the perfect lattice shifted by one step with the velocity corresponding to the
initial temperature.
* if you are using OTF with ASE + VelocityVerlet, you can set the initial velocity as shown in our
tutorial

* Similarly, if you want to rescale the system from 300K to 500K at step 1000, you should set the rescaling
temperature higher than 500K, e.g.

.. code-block:: python

    rescale_temp = [800]
    rescale_step = [1000]

The reason is that we only rescale the velocities of the atoms at the steps specified in ``rescale_step``.
At steps not in the list, the system evolves by itself. Thus, after step 1000, the system's
temperature will gradually equilibrate at a lower temperature.

3. Include small perturbation for initial structure
If you are starting from a perfect lattice, we recommend adding small random perturbations to the atomic positions,
such that the symmetry of the crystal lattice is broken. E.g.
4. How do I know that my active-learning trajectory is "good"?
It is important to do some analysis of your active-learning trajectories both while they are running and once they are completed. We recommend that you keep an eye on the system parameters,
e.g. temperature, pressure, or the radial distribution function. In addition to these system specific markers, we also recommend keeping an eye on the hyperparameters, and making sure that they
make sense numerically.

5. When should I stop my active-learning trajectory?
Active-learning can be stopped once DFT calls become infrequent as the trajectory proceeds. The MAE values for energy, forces, and stresses can also indicate when a model has reached a given
accuracy threshold. If the number of DFT calls remains low throughout the entire trajectory, try altering the conditions under which the system performs MD (e.g., temperature or pressure) or decreasing
the ``std_tolerance_factor`` so that more DFT calls will be made.
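One way to judge whether DFT calls are tapering off is to count them in fixed windows of MD steps. The sketch below assumes you have already collected the MD step indices at which DFT was called (for example, by parsing your own log output); the function name and the sample data are hypothetical.

```python
import numpy as np

def dft_calls_per_window(dft_call_steps, total_steps, window):
    """Count DFT calls in consecutive windows of ``window`` MD steps."""
    edges = np.arange(0, total_steps + window, window)
    counts, _ = np.histogram(dft_call_steps, bins=edges)
    return counts

# Hypothetical trajectory of 1000 steps with DFT calls at these steps.
dft_call_steps = [5, 20, 40, 90, 130, 260, 410, 700]
counts = dft_calls_per_window(dft_call_steps, total_steps=1000, window=250)
print(counts.tolist())
```

A steadily decreasing count per window suggests that the model is encountering fewer uncertain environments under the current simulation conditions.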

6. What happens if I get ``AssertionError`` from ``assert np.allclose(lmp_energy, gp_energy)``?
This error can appear when using ``PyLAMMPS`` for training on-the-fly with LAMMPS MD. FLARE does a sanity check to make sure LAMMPS energy and GP energy are the same.
This error means their disagreement is not small enough, which might result from unphysical structure, temperature explosion, or unreasonable hyperparameters.
You can try relaxing the initial structure, reducing the timestep, or increasing the lower bound of ``train_hyps``.
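To get a feel for how strict this check is, the sketch below applies ``np.allclose`` with its default tolerances to made-up energies. The energy values are hypothetical, and whether FLARE passes custom tolerances is not stated here, so the defaults are an assumption.

```python
import numpy as np

lmp_energy = -100.0       # hypothetical LAMMPS energy, in eV
gp_energy_ok = -100.0005  # differs by 5e-4 eV
gp_energy_bad = -100.01   # differs by 1e-2 eV

# np.allclose(a, b) tests |a - b| <= atol + rtol * |b|,
# with defaults rtol=1e-5 and atol=1e-8.
tolerance = 1e-8 + 1e-5 * abs(gp_energy_ok)

passes = np.allclose(lmp_energy, gp_energy_ok)
fails = np.allclose(lmp_energy, gp_energy_bad)
print(tolerance, passes, fails)
```

Because the default tolerance is relative, the allowed disagreement grows with the magnitude of the energy: about 1e-3 eV for a total energy near 100 eV in this example.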


Offline-Learning
----------------

.. code-block:: python
1. Why is my offline training selecting so few sparse environments?
When training a final model with offline learning, we have found it helpful to reduce the `std_tolerance_factor` below the value typically used for active-learning.
This is fine, since all of the sparse environments being selected come from DFT-calculated frames. It is also helpful to track the likelihood and hyperparameters when reducing this value
in order to select an appropriate model.

positions = positions + 0.01 * (2 * np.random.rand(len(positions), 3) - 1)
2. How do I know that my offline-trained model is "good"?
Several markers can be used to evaluate the success of your offline training. The most immediate is the error assessed throughout training on the DFT frames being used. Also immediately available
are the hyperparameters, which are expressed in physical units and should make sense numerically (energy, force, and stress noises relative to the actual energy, force, and stress labels). The user can also
generate more in-depth analyses, e.g., parity plots of energies, forces, and stresses.
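A simple error metric to accompany a parity plot can be computed with NumPy. The sketch below assumes you have DFT reference forces and model-predicted forces as arrays of the same shape; the function name and the numbers are made up for illustration.

```python
import numpy as np

def mean_absolute_error(reference, predicted):
    """Mean absolute error between two arrays of identical shape."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if reference.shape != predicted.shape:
        raise ValueError("reference and predicted must have the same shape")
    return float(np.mean(np.abs(reference - predicted)))

# Hypothetical forces on two atoms, in eV/angstrom.
dft_forces = [[0.10, -0.20, 0.00], [-0.05, 0.15, 0.30]]
gp_forces = [[0.12, -0.18, 0.01], [-0.07, 0.15, 0.27]]

force_mae = mean_absolute_error(dft_forces, gp_forces)
print(f"force MAE: {force_mae:.4f} eV/angstrom")
```

Plotting the flattened ``gp_forces`` against ``dft_forces`` with any plotting library gives the corresponding parity plot; the same function applies to energies and stresses.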

The reason is that the perfect lattice is highly symmetric, thus usually the force on each atom is zero, and the local
environments all look the same. Adding these highly similar environments with close-to-zero forces might raise numerical
stability issue for GP.

GPFA

Production MD Simulations using a FLARE ML-FF
---------------------------------------------

1. My models are adding too many atoms from each frame, causing a serious slowdown without much gain in model accuracy.
In order to 'govern' the rate at which the model adds atoms, we suggest using the ``pre_train_atoms_per_element`` and
``train_atoms_per_element`` arguments, which can limit the number of atoms added from each seed frame and training frame respectively.
You can pass in a dictionary like ``{'H':1, 'Cu':2}`` to limit the number of H atoms to 1 and Cu atoms to 2 from any given frame.
You can also use ``max_atoms_per_frame`` for the same functionality.
2. The uncertainty seems low on my force predictions, but the true errors in the forces are high.
This could be happening for a few reasons. One reason could be that your hyperparameters aren't at an optimum (check that the gradient of
the likelihood with respect to the hyperparameters is small). Another is that your model, such as 2-body or 2+3 body, may not be of sufficient
complexity to handle the system (in other words, many-body effects could be important).

MGP
---
1. How does the grid number affect my mapping?
* The lower cutoff is better set to be a bit smaller than the minimal interatomic distance.
* The upper cutoff should be consistent with GP's cutoff.
* For three-body, the grid is 3-D, with lower cutoffs `[a, a, a]` and upper cutoffs `[b, b, b]`.
* You can try different grid numbers and compare the force predictions of MGP and GP
on the same testing structure. Choose a grid number that gives satisfactory efficiency and accuracy.
As a reference, `grid_num=64` should be safe for `a=2.5`, `b=5`.
1. Which MD engines is FLARE compatible with?
We commonly employ our trained FLARE models in LAMMPS and the ASE MD engines.

2. How do I know that my model is performing well?
Without diving into system-specific benchmarks, we recommend using the uncertainty quantification capabilities of FLARE to determine whether your MD simulation is operating within the domain of the
training set. Example scripts for the quantification of uncertainty can be found elsewhere in this repository.

3. Why is my simulation misbehaving?
Several parameters can influence the success of the MD simulations that are run after building your FLARE model. It is important to first check that the species order matches the order in the
LAMMPS coefficient file, and that their masses are assigned appropriately.
- If non-physical environments appear in your simulation (either by visual inspection or via uncertainty analysis), several tricks can be implemented to fix this.
(1) try reducing the timestep. An aggressive timestep can lead to errors in integration and prompt unphysical environments to appear.
(2) toggle the thermostat damping factor (specific to the MD engine being used).
(3) make sure that the initial structure is reasonable and not unreasonably high in energy or does not have high forces. (related to next point)

- If the temperature of the simulation is unreasonably high upon initialization:
(1) try relaxing the structure using built-in methods (e.g., conjugate gradient descent in LAMMPS) so that your initial structure has atoms in local energy minima. High energy initial structures
can yield high forces, leading to temperature increasing drastically.
