Load best model instead of last one #142

Closed
jsalotti opened this issue Dec 10, 2021 · 0 comments · Fixed by #151
Problem

Models trained with pl_helper.run_pl_training using the --save option are saved by a PyTorch Lightning callback (ModelCheckpoint).
This callback is configured to save:

  • the top_k best models, using the naming convention "{epoch}-{step}.ckpt"
  • the last model, as "last.ckpt"

When a trained model is loaded with pl_helper.load_training using the --run_id option, the checkpoint file used is "last.ckpt". The loaded weights are therefore those obtained after the last training iteration, not those of the best model.

Proposition

It would be good to add a parameter to pl_helper.load_training to choose between the last and the best model.
For this we need to:

  • change the naming convention in ModelCheckpoint so that the val_loss value appears in the checkpoint file name.
  • when loading the model, parse the file names in the checkpoint folder corresponding to the run_id, and keep the one with the best value.
    Important: we need to be sure what we mean by "best value". Is it possible that for some models a greater val_loss is better, or is that impossible with ModelCheckpoint?
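
The two steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not the actual pl_helper code: the helper names `parse_val_loss` and `select_checkpoint`, and the `which` and `mode` parameters, are hypothetical, and the file name pattern assumes ModelCheckpoint is given a template such as "{epoch}-{step}-{val_loss:.4f}", which with its default metric-name insertion produces names like "epoch=3-step=400-val_loss=0.1234.ckpt". On the "best value" question, ModelCheckpoint itself takes a `mode` argument ("min" or "max"), so the sketch mirrors that instead of hard-coding "lower is better".

```python
import re
from pathlib import Path
from typing import Optional

# Assumed file name pattern, e.g. "epoch=3-step=400-val_loss=0.1234.ckpt".
VAL_LOSS_RE = re.compile(r"val_loss=(-?\d+(?:\.\d+)?)")


def parse_val_loss(name: str) -> Optional[float]:
    """Return the val_loss encoded in a checkpoint file name, or None."""
    match = VAL_LOSS_RE.search(name)
    return float(match.group(1)) if match else None


def select_checkpoint(ckpt_dir, which: str = "best", mode: str = "min") -> Path:
    """Pick the checkpoint to load from a run's checkpoint folder (hypothetical helper).

    which: "last" returns last.ckpt, "best" compares val_loss in file names.
    mode:  "min" if a lower val_loss is better, "max" otherwise.
    """
    ckpt_dir = Path(ckpt_dir)
    if which == "last":
        return ckpt_dir / "last.ckpt"
    if which != "best":
        raise ValueError(f"which must be 'last' or 'best', got {which!r}")
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")

    # Keep only files whose name carries a val_loss value (this skips last.ckpt).
    scored = []
    for path in ckpt_dir.glob("*.ckpt"):
        score = parse_val_loss(path.name)
        if score is not None:
            scored.append((score, path))
    if not scored:
        raise FileNotFoundError(f"no checkpoint with val_loss in its name in {ckpt_dir}")

    pick = min if mode == "min" else max
    return pick(scored, key=lambda item: item[0])[1]
```

With this shape, load_training would only need to forward the user's choice (last or best) and then load whatever path is returned. Note that the precision used in the file name template limits how finely two checkpoints can be told apart.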