Full recovery from a restart point #42

Closed
Benzoin96485 opened this issue Oct 7, 2024 · 0 comments · Fixed by #43

Comments

@Benzoin96485
Owner

We found that a perfect restart from a checkpoint is still not working. For example, loading a checkpoint of a trained model and continuing training results in a significantly higher training loss than the last epoch of the previous run. This may be fixed by a thorough check of which parameters are loaded, adding the optimizer and scheduler state dicts to the checkpoint, and distinguishing "restart" loading from "pretrain-finetune" loading.
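
As a rough sketch of what this would involve (assuming a standard PyTorch training loop; the function names and checkpoint keys below are illustrative, not Enerzyme's actual API): the checkpoint should carry the optimizer and scheduler state dicts alongside the model weights, and loading should branch on the mode so that a restart restores the full training state while a pretrain-finetune run keeps only the weights.

```python
import torch


def save_checkpoint(path, model, optimizer, scheduler, epoch):
    # Persist everything needed for an exact restart, not just the weights.
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "scheduler_state_dict": scheduler.state_dict(),
    }, path)


def load_checkpoint(path, model, optimizer=None, scheduler=None, mode="restart"):
    # mode="restart": resume training exactly where it stopped.
    # mode="finetune": reuse only the pretrained weights.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    if mode == "restart":
        # Restore optimizer moments and the LR schedule so the loss curve
        # continues from the last epoch instead of jumping back up.
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
        scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
        return checkpoint["epoch"] + 1
    # Fine-tuning: start a fresh optimizer and schedule from epoch 0.
    return 0
```

Keeping a single loading entry point with an explicit mode flag would also make it harder to accidentally reuse a stale learning-rate schedule when fine-tuning from pretrained weights.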

@Benzoin96485 Benzoin96485 converted this from a draft issue Oct 7, 2024
@Benzoin96485 Benzoin96485 linked a pull request Oct 10, 2024 that will close this issue
@github-project-automation github-project-automation bot moved this from In progress to Done in Enerzyme! Oct 30, 2024
Labels
None yet
Projects
Status: Done
1 participant