❓ [QUESTION] Restart run #343

IZugec · 2023-06-03T16:29:07Z

Hello,

I have a really huge dataset, so large that even with multiprocessing it takes a day and a half to two days to preprocess. Due to an unexpected crash on the node, I would now like to continue training starting from the best_model.pth weights. However, I would really like to avoid processing this huge dataset again.

I tried both initial_model_state / initialize_from_state and load_model_state / load_model_state.

However, when I first started training, the append key was set to false, so now when I set it to false again the error is:

Traceback (most recent call last):
File "/home/user/.conda/envs/nequip_stress/bin/nequip-train", line 8, in <module>
sys.exit(main())
File "/home/user/.conda/envs/nequip_stress/lib/python3.10/site-packages/nequip/scripts/train.py", line 65, in main
raise RuntimeError(
RuntimeError: Training instance exists at /path_to_traning_dir; either set append to True or use a different root or runname

However, when I start it with append set to true, I get the following error:

Traceback (most recent call last):
File "/home/user/.conda/envs/nequip_stress/bin/nequip-train", line 8, in <module>
sys.exit(main())
File "/home/user/.conda/envs/nequip_stress/lib/python3.10/site-packages/nequip/scripts/train.py", line 74, in main
trainer = restart(config)
File "/home/user/.conda/envs/nequip_stress/lib/python3.10/site-packages/nequip/scripts/train.py", line 220, in restart
raise ValueError(
ValueError: Key "append" is different in config and the result trainer.pth file. Please double check

I guess the question is whether there is a way to pass an already processed dataset along with the model state.

Thanks in advance for any advice,
Ivan

Linux-cpp-lisp · 2023-06-05T18:21:27Z

Hi @IZugec ,

> I tried both initial_model_state / initialize_from_state and load_model_state / load_model_state

This will be the easiest way forward, and it will load the cached processed dataset unless something goes wrong. I think there should be a full discussion of how to do this here (you want initialize_from_state and a new run name):

#235
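As a rough illustration of "initialize_from_state and a new run name", here is a minimal Python sketch that derives a restart config from the original one. It is not taken from nequip itself: the helper `make_restart_config` is hypothetical, the key names `run_name` and `model_builders` are assumptions about the config layout, and the example paths and dataset key are placeholders. Only `initialize_from_state`, `initial_model_state`, `append`, and `best_model.pth` come from this thread. Dataset-related keys are deliberately left untouched, on the assumption that this is what allows the cached processed dataset to be found again.

```python
from copy import deepcopy


def make_restart_config(old_config, weights_path, new_run_name):
    """Return a copy of old_config that starts a NEW run from saved weights.

    Hypothetical helper for illustration only; key names are assumptions.
    """
    cfg = deepcopy(old_config)

    # A different run name sidesteps the "Training instance exists" check,
    # so the original value of `append` does not need to change.
    cfg["run_name"] = new_run_name  # assumed key name

    # Add the weight-loading builder once, after the existing builders.
    builders = list(cfg.get("model_builders", []))  # assumed key name
    if "initialize_from_state" not in builders:
        builders.append("initialize_from_state")
    cfg["model_builders"] = builders

    # Point the builder at the saved weights from the crashed run.
    cfg["initial_model_state"] = weights_path
    return cfg


# Placeholder config standing in for the original training config.
old = {
    "root": "results",
    "run_name": "run1",
    "append": False,
    "dataset_file_name": "data.xyz",
    "model_builders": ["EnergyModel", "ForceOutput"],
}

new = make_restart_config(old, "results/run1/best_model.pth", "run1-restart")
for key in ("run_name", "append", "initial_model_state", "model_builders"):
    print(key, "=", new[key])
```

The resulting dictionary would then be written back out in whatever format the training config uses; the original config is not modified, so the first run's directory stays as it was.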

IZugec added the "question" label (Further information is requested) Jun 3, 2023

IZugec closed this as completed Jun 6, 2023

pablo-unzueta mentioned this issue Jun 27, 2023

🐛 [BUG] Cannot restart run with different dataset #349

Open
