Full recovery from a restart point #42

Closed
Benzoin96485 opened this issue Oct 7, 2024 · 0 comments · Fixed by #43

Comments

@Benzoin96485
Owner

We found that a perfect restart from a checkpoint is still not working. For example, loading a checkpoint of a trained model and continuing training results in a significantly higher training loss than the last epoch of the previous run. This may be fixed by a thorough check of which parameters are loaded, adding the optimizer and scheduler state dicts to the checkpoint, and distinguishing "restart" loading from "pretrain-finetune" loading.
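
As a rough sketch of what this would involve (assuming a standard PyTorch training loop; the function names and checkpoint keys below are illustrative, not Enerzyme's actual API): the checkpoint should carry the optimizer and scheduler state dicts alongside the model weights, and loading should branch on the mode so that a restart restores the full training state while a pretrain-finetune run keeps only the weights.

```python
import torch


def save_checkpoint(path, model, optimizer, scheduler, epoch):
    # Persist everything needed for an exact restart, not just the weights.
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "scheduler_state_dict": scheduler.state_dict(),
    }, path)


def load_checkpoint(path, model, optimizer=None, scheduler=None, mode="restart"):
    # mode="restart": resume training exactly where it stopped.
    # mode="finetune": reuse only the pretrained weights.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    if mode == "restart":
        # Restore optimizer moments and the LR schedule so the loss curve
        # continues from the last epoch instead of jumping back up.
        optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
        scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
        return checkpoint["epoch"] + 1
    # Fine-tuning: start a fresh optimizer and schedule from epoch 0.
    return 0
```

Keeping a single loading entry point with an explicit mode flag would also make it harder to accidentally reuse a stale learning-rate schedule when fine-tuning from pretrained weights.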

@Benzoin96485 Benzoin96485 converted this from a draft issue Oct 7, 2024
@Benzoin96485 Benzoin96485 linked a pull request Oct 10, 2024 that will close this issue
@github-project-automation github-project-automation bot moved this from In progress to Done in Enerzyme! Oct 30, 2024
Labels
None yet
Projects
Status: Done
1 participant