
CHGNet scorer implementation #8

Open

kianpu34593 wants to merge 6 commits into main
Conversation

kianpu34593

Hi,

The ZMQ option for connecting to an external scorer has been a pain point for me, as it is slow and unreliable on our supercomputer cluster. I have implemented a new scorer, CHGNetScorer, in _scorer.py. It is designed for the case where two GPUs are available on the same node, and it also works if the scorer is hosted on CPU. The idea is to set up two devices: CrystaLLM on cuda and CHGNetScorer on cuda:1, with the worker CPU handling the output transfer.
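
To give a concrete picture, a scorer of this shape might look roughly like the sketch below. This is not the exact code in _scorer.py; it assumes the CIFScorer interface exposes a single score(cif) method, and the CHGNet loading API (model name and target device arguments) may differ slightly between CHGNet versions.

# Sketch only: a CHGNet-backed scorer pinned to its own device.
from chgnet.model import CHGNet
from pymatgen.core import Structure


class CHGNetScorer:
    def __init__(self, model_name: str = "0.3.0", device: str = "cuda:1"):
        # load the pretrained CHGNet on the scorer's own device,
        # leaving the default CUDA device free for CrystaLLM
        self.model = CHGNet.load(model_name=model_name, use_device=device)

    def score(self, cif: str) -> float:
        # parse the generated CIF and return CHGNet's predicted energy per atom (eV/atom)
        structure = Structure.from_str(cif, fmt="cif")
        prediction = self.model.predict_structure(structure)
        return float(prediction["e"])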

For more details about CHGNet, please see here. According to Matbench Discovery, CHGNet is a better ML model than ALIGNN, and MACE is better still; I implemented CHGNet first because I'm more familiar with it than MACE. That being said, I will implement a MACEScorer shortly.

I tested this feature using a LiFeF3 example. The truncated output is attached below:

--> python bin/mcts.py --config=template_6.yaml
Using configuration:
out_dir: crystallm_v1_large
temperature: 1.0
start: 'data_Li6Fe6F18

  '
seed: 1337
device: cuda
dtype: float32
compile: false
tree_width: 5
max_depth: 1000
c: 1.0
num_simulations: 1000
bond_length_acceptability_cutoff: 1.0
reward_k: 2.0
mcts_out_dir: Li6Fe6F18_mcts_cifs
scorer: CHGNet
scorer_host: localhost
scorer_port: 5555
use_context_sensitive_tree_builder: true
top_child_weight_cutoff: 0.99
selector: puct
n_space_groups: 0
bypass_only_child: false
n_rollouts: 1
scorer_device: cuda
chgnet_model_name: 0.3.0

number of parameters: 201.74M
CrystaLLM using: cuda
Pytorch Scorer using: cuda:1
CHGNET model name: 0.3.0
CHGNet v0.3.0 initialized with 412,525 parameters
CHGNet will run on cuda:1
performing 1000 simulations...
performing simulation 1...
/jet/home/jpu/projects/softwares/envs/crystalllm/lib/python3.10/site-packages/pymatgen/analysis/local_env.py:4148: UserWarning: No oxidation states specified on sites! For better results, set the site oxidation states in the structure.
  warnings.warn(
/jet/home/jpu/projects/softwares/envs/crystalllm/lib/python3.10/site-packages/pymatgen/analysis/local_env.py:3941: UserWarning: CrystalNN: cannot locate an appropriate radius, covalent or atomic radii will be used, this can lead to non-optimal results.
  warnings.warn(
invoking external scorer...
sending reply: -5.952406406402588
external scorer returned score: -5.952406406402588
computed reward: 0.5
CIF not written to file as it already exists: /jet/home/jpu/projects/projects/crystal_llm/mcts_example/Li6Fe6F18_mcts_cifs/generated_1.cif
performing simulation 2...

template_6.yaml

out_dir: ../crystallm_v1_large # path to the folder containing the model checkpoint file
temperature: 1.0  # 1.0 = no change, < 1.0 = less random, > 1.0 = more random, in predictions
start: "data_Li6Fe6F18\n"  # the prompt; can also specify a file, use as: "FILE:prompt.txt"
seed: 1337
device: cuda  # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.
dtype: float32  # 'float32' or 'bfloat16' or 'float16'
compile: False  # use PyTorch 2.0 to compile the model to be faster
tree_width: 5  # the tree width
max_depth: 1000  # the maximum depth of the tree
c: 1  # the selector constant: c_puct for PUCT, c for UCT, epsilon for greedy
num_simulations: 1000  # the number of simulations to perform during search
bond_length_acceptability_cutoff: 1.0
reward_k: 2.0  # the reward constant
mcts_out_dir: ../mcts_example/Li6Fe6F18_mcts_cifs  # path to the directory where generated CIF files will be stored
scorer: "CHGNet"  # supported values: 'zmq', 'random', "CHGNet"
scorer_host: localhost   # required if `scorer` is 'zmq'
scorer_port: 5555  # required if `scorer` is 'zmq'
use_context_sensitive_tree_builder: True
top_child_weight_cutoff:  0.99
selector: puct  # valid values: 'puct', 'uct', 'greedy'
n_space_groups: 0
bypass_only_child: False
n_rollouts: 1  # the number of rollouts to perform per simulation
scorer_device: "cuda"
chgnet_model_name: "0.3.0"
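
Roughly, the idea is that bin/mcts.py selects the scorer from the new keys at the bottom (scorer, scorer_device, chgnet_model_name). The snippet below is a sketch only, not the exact diff in this PR, and the constructor arguments shown for ZMQScorer and RandomScorer are assumptions.

# Sketch: dispatching on the `scorer` config value when building the scorer.
if config.scorer == "CHGNet":
    scorer = CHGNetScorer(model_name=config.chgnet_model_name,
                          device=config.scorer_device)
elif config.scorer == "zmq":
    scorer = ZMQScorer(host=config.scorer_host, port=config.scorer_port)
else:
    scorer = RandomScorer()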

Oh, I also updated the code to support the latest PyTorch (v2024.3.1).

Please take a look! I'm open to discussion.

Best,
Kian

@lantunes
Owner

Hi,

Thanks for creating this PR. The time and effort you put into developing the CHGNetScorer is greatly appreciated!

I understand your concerns about the current ZMQ-based approach. However, this project intentionally avoids dependencies on specific scorers such as ALIGNN and CHGNet. The core focus of this repository is the development and enhancement of the CrystaLLM model itself, rather than integrations with particular external models. We also want to keep the project simple and avoid the dependency conflicts that incorporating these models could introduce. Instead, we provide an interface for representing the scorer (CIFScorer), and a ZMQ implementation (ZMQScorer) that illustrates how two different processes (with different, even incompatible, environments) can interoperate. We also provide an example script showing how one might use ALIGNN in a separate process. The intention is that users will determine the integration approach that works best for them, as you have done.
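
For anyone reading along, the external process in that ZMQ arrangement is roughly the following sketch. The message format here is an assumption and evaluate_cif is a placeholder for whatever model the user runs in its own environment; the real ALIGNN example lives in the resources folder as alignn_zmq_example.py.

# Sketch: a standalone scorer process that answers CIF-scoring requests over ZMQ.
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5555")  # must match the scorer_port used by the ZMQScorer client

while True:
    cif = socket.recv_string()      # CIF text sent by the MCTS process
    score = evaluate_cif(cif)       # placeholder: any model, in its own environment
    socket.send_string(str(score))  # mirrors the "sending reply: ..." line in the log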

While we might not merge this PR into the main project for the reasons stated above, we encourage you to maintain your fork with the CHGNetScorer implementation. Additionally, we can link to your fork in the documentation, and include an example script in the resources folder, as we did for ALIGNN (alignn_zmq_example.py). This way, users who need this specific functionality and are in similar hardware environments can benefit from your work.

Best regards,
Luis
