Fair and transparent benchmark of machine-learned interatomic potentials (MLIPs), beyond basic error metrics

atomind-ai/mlip-arena

MLIP Arena

Caution

MLIP Arena is currently in pre-alpha. The results are not stable. Please interpret them with care.

Note

Contributions of new tasks are very welcome! If you're interested in joining the effort, please reach out to Yuan at cyrusyc@berkeley.edu. See the project page for some outstanding tasks, or propose a new one in Discussion.

MLIP Arena is a platform for evaluating foundation machine learning interatomic potentials (MLIPs) beyond conventional energy and force error metrics. It focuses on revealing the underlying physics and chemistry learned by these models and assessing their performance in molecular dynamics (MD) simulations. The platform's benchmarks are specifically designed to evaluate the readiness and reliability of open-source, open-weight models in accurately reproducing both qualitative and quantitative behaviors of atomic systems.

Installation

From PyPI (without model running capability)

pip install mlip-arena

From source

git clone https://github.com/atomind-ai/mlip-arena.git
cd mlip-arena
pip install torch==2.2.0
bash scripts/install-pyg.sh
bash scripts/install-dgl.sh
# Quote the extras so that shells such as zsh do not expand the brackets
pip install -e ".[test]"
pip install -e ".[mace]"
# DeePMD
DP_ENABLE_TENSORFLOW=0 pip install -e ".[deepmd]"
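
After installing from source, it can help to confirm which optional backends Python can actually locate. The following is a minimal sketch, not part of MLIP Arena: the `OPTIONAL_BACKENDS` mapping and the `available_backends` helper are hypothetical, and the import names (`torch_geometric`, `dgl`, `mace`, `deepmd`) are assumptions about what the install steps above provide.

```python
from importlib.util import find_spec

# Hypothetical label -> import-name mapping. The import names are
# assumptions about what the install steps provide, not project docs.
OPTIONAL_BACKENDS = {
    "torch": "torch",
    "pyg": "torch_geometric",
    "dgl": "dgl",
    "mace": "mace",
    "deepmd": "deepmd",
}


def available_backends(backends=OPTIONAL_BACKENDS):
    """Report whether each module can be located, without importing it."""
    status = {}
    for label, module_name in backends.items():
        try:
            status[label] = find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Treat a broken or half-installed package as missing.
            status[label] = False
    return status


if __name__ == "__main__":
    for label, found in available_backends().items():
        print(f"{label:8s} {'found' if found else 'missing'}")
```

A backend reported as missing usually means the matching install step was skipped or failed.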

Contribute

MLIP Arena is now in pre-alpha. If you're interested in joining the effort, please reach out to Yuan at cyrusyc@berkeley.edu.

Development

streamlit run serve/app.py

Add new benchmark tasks (WIP)

Note

Please reuse or extend the general tasks defined as Prefect / Atomate2 / Quacc workflows. To add a new task:

  1. Follow the task template to implement the task class, then upload the script and its metadata to MLIP Arena.
  2. Write a benchmark script that evaluates your model on the task. The script should load the model and the dataset, and output the evaluation metrics.
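
The overall shape of such a benchmark script can be sketched as follows. This is an illustrative skeleton only, not the MLIP Arena task template: `run_benchmark`, `mean_absolute_error`, and the toy data are hypothetical, and the energy MAE stands in for whatever metrics the task actually defines.

```python
import json
from statistics import fmean


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length sequences of floats."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("cannot evaluate an empty dataset")
    return fmean(abs(p - r) for p, r in zip(predicted, reference))


def run_benchmark(predict, dataset):
    """Evaluate a model on a dataset and return a metrics dictionary.

    predict: hypothetical callable mapping a structure to an energy.
    dataset: iterable of (structure, reference_energy) pairs.
    """
    predicted, reference = [], []
    for structure, ref_energy in dataset:
        predicted.append(predict(structure))
        reference.append(ref_energy)
    return {
        "n_samples": len(reference),
        "energy_mae": mean_absolute_error(predicted, reference),
    }


if __name__ == "__main__":
    # Toy stand-ins for a real model and reference dataset.
    toy_dataset = [("A", -1.0), ("B", -2.5), ("C", 0.5)]
    toy_model = {"A": -1.1, "B": -2.3, "C": 0.5}
    print(json.dumps(run_benchmark(toy_model.get, toy_dataset), indent=2))
```

Keeping model loading, dataset loading, and metric computation in separate functions makes it easier to swap in a real calculator and a real reference dataset later.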

Add new MLIP models

If you have pretrained MLIP models that you would like to contribute to MLIP Arena and have benchmarked in real time, there are two ways to add them:

External ASE Calculator (easy)

  1. Implement a new ASE Calculator class in mlip_arena/models/externals.
  2. Name the class after your model and add the same name to the registry along with its metadata.

Caution

Remove unnecessary outputs from the results class attribute to avoid errors in MD simulations. Refer to the other class definitions for examples.
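
One way to follow this advice is to filter the raw model outputs before they are stored in the calculator's `results` dictionary. The sketch below assumes the MD tasks need only `energy`, `forces`, and `stress` (standard ASE property names); that selection, the `MD_PROPERTIES` tuple, and the `prune_results` helper are all hypothetical and not part of MLIP Arena.

```python
# Standard ASE property names. Which of them the MD tasks actually
# require is an assumption made for this sketch.
MD_PROPERTIES = ("energy", "forces", "stress")


def prune_results(raw_outputs, keep=MD_PROPERTIES):
    """Return only the entries of raw_outputs whose keys are in keep."""
    return {name: raw_outputs[name] for name in keep if name in raw_outputs}


if __name__ == "__main__":
    # Toy model output with an extra entry that an MD driver would not expect.
    raw = {
        "energy": -3.2,
        "forces": [[0.0, 0.0, 0.1]],
        "node_features": [1, 2, 3],
    }
    # Inside an ASE Calculator subclass this would typically be written as
    #     self.results = prune_results(raw)
    # at the end of calculate().
    print(prune_results(raw))
```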

Hugging Face Model (recommended, difficult)

  1. Have your model class inherit from the Hugging Face ModelHubMixin class. We recommend PyTorchModelHubMixin.
  2. Create a new Hugging Face Model repository and upload the model file using the push_to_hub function.
  3. Follow the template to implement the I/O interface for your model.
  4. Update the model registry with metadata.

Note

CPU benchmarking will be performed automatically. Because GPU compute is limited, if you would like to be considered for GPU benchmarking, please create a pull request that demonstrates the offline performance of your model (a published paper or preprint). We will review and select the models to be benchmarked on GPU.

Add new datasets

The "ultimate" goal is to compile copies of all open data in a unified format for lifelong learning with Hugging Face AutoTrain.

  1. Create a new Hugging Face Dataset repository and upload the reference data (e.g. DFT, AIMD, experimental measurements such as RDF).

Single-point density functional theory calculations

Molecular dynamics calculations