Problem

Models trained with `pl_helper.run_pl_training` with the option `--save` are saved through a PyTorch Lightning callback (`ModelCheckpoint`).

This callback is configured to save:

- the `top_k` best models, using the naming convention `"{epoch}-{step}.ckpt"`
- the last model: `"last.ckpt"`

When loading a trained model with `pl_helper.load_training` with the option `--run_id`, the checkpoint file used to load the model is `"last.ckpt"`. This means the weights used are those obtained after the last training iteration, not the weights of the best model.
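For context, the two sides presumably look roughly like the sketch below; the actual `pl_helper` code may differ, and `MyLightningModule`, `save_top_k=3`, and the directory layout are assumptions for illustration.

```python
from pathlib import Path

import pytorch_lightning as pl
import torch
from pytorch_lightning.callbacks import ModelCheckpoint


class MyLightningModule(pl.LightningModule):  # hypothetical stand-in model
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)


# Presumed saving-side configuration inside pl_helper.run_pl_training:
checkpoint_cb = ModelCheckpoint(
    save_top_k=3,               # keep the k best checkpoints
    monitor="val_loss",
    filename="{epoch}-{step}",  # rendered as e.g. "epoch=4-step=500.ckpt"
    save_last=True,             # additionally write "last.ckpt"
)

# Presumed loading-side behavior inside pl_helper.load_training: always
# pick "last.ckpt", regardless of which checkpoint scored best.
ckpt_dir = Path("checkpoints") / "some_run_id"  # hypothetical layout
model = MyLightningModule.load_from_checkpoint(ckpt_dir / "last.ckpt")
```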
Proposition
It would be good to add a parameter to `pl_helper.load_training` to select between the last and the best model.

For this we need to:
- change the naming convention in `ModelCheckpoint`, so that the `val_loss` value is included in the checkpoint file name (first sketch below)
- when loading the model: parse the filenames in the checkpoint folder corresponding to the `run_id`, and keep the one with the best value (second sketch below). Important: we need to be sure of what we mean by "best" value (is it possible that for some models a greater `val_loss` is better, or is that impossible with `ModelCheckpoint`?)
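For the first step, a sketch of the proposed naming convention: with PyTorch Lightning's default `auto_insert_metric_name=True`, adding `{val_loss:...}` to the `filename` template embeds the metric value in the file name (the exact template and `save_top_k` value are assumptions, not the helper's current code).

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Embed the monitored metric in the file name so it can be recovered
# later by parsing alone.
checkpoint_cb = ModelCheckpoint(
    save_top_k=3,
    monitor="val_loss",
    mode="min",  # "max" would mean a greater val_loss is better
    # With the default auto_insert_metric_name=True, this renders as e.g.
    # "epoch=4-step=500-val_loss=0.4321.ckpt".
    filename="{epoch}-{step}-{val_loss:.4f}",
    save_last=True,
)
```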
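For the second step, a minimal helper that scans a run's checkpoint folder and picks the best-scoring file; `find_best_checkpoint` is a hypothetical name, not an existing `pl_helper` function.

```python
import re
from pathlib import Path

# Matches the val_loss embedded by the filename template sketched above.
_VAL_LOSS_RE = re.compile(r"val_loss=(-?[0-9.]+)\.ckpt$")


def find_best_checkpoint(ckpt_dir: Path, mode: str = "min") -> Path:
    """Return the checkpoint whose parsed val_loss is best.

    `mode` mirrors ModelCheckpoint's own `mode` argument: "min" means a
    lower val_loss is better, "max" the opposite.
    """
    scored = []
    for path in ckpt_dir.glob("*.ckpt"):
        match = _VAL_LOSS_RE.search(path.name)
        if match:  # skips "last.ckpt", which carries no val_loss
            scored.append((float(match.group(1)), path))
    if not scored:
        raise FileNotFoundError(f"no scored checkpoints found in {ckpt_dir}")
    best_score, best_path = min(scored) if mode == "min" else max(scored)
    return best_path
```

On the question above: `ModelCheckpoint` does accept `mode="max"` (a greater monitored value counts as better), so "greater `val_loss` is better" is possible in principle; the loading side therefore needs to know, or be told, which mode was used at training time.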