Fix/ loading metrics and loss in load_from_checkpoint #1759
Conversation
fix: loss_fn and torch_metrics are properly restored when calling load_from_checkpoint()
Thanks @madtoinou for this, looks good :)
I would be interested to see if we can let PL handle the saving/loading of these parameters by adapting PLForecastingModule.on_save_checkpoint and PLForecastingModule.on_load_checkpoint.
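For context, a rough sketch of what that could look like — the two hook names exist on PLForecastingModule, but the attribute names and checkpoint keys below are illustrative assumptions, not darts' actual internals:

    from typing import Any, Dict

    import pytorch_lightning as pl


    class PLForecastingModule(pl.LightningModule):
        def on_save_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
            # stash the objects excluded from save_hyperparameters() so they
            # survive checkpointing (the "loss_fn"/"torch_metrics" keys and
            # attribute names are assumed for illustration)
            checkpoint["loss_fn"] = self.criterion
            checkpoint["torch_metrics"] = self.torch_metrics

        def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
            # restore them before training resumes
            self.criterion = checkpoint["loss_fn"]
            self.torch_metrics = checkpoint["torch_metrics"]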
Codecov Report

@@            Coverage Diff             @@
##           master    #1759      +/-   ##
==========================================
- Coverage   94.19%   94.06%   -0.13%
==========================================
  Files         125      125
  Lines       11505    11495      -10
==========================================
- Hits        10837    10813      -24
- Misses        668      682      +14

☔ View full report in Codecov by Sentry.
Awesome that it worked with the checkpointing :)
Actually, when you mentioned that we ignore loss_fn and torch_metrics when saving the hyperparameters, I tested whether we can achieve the same thing by removing the ignore, and it works :) I left a comment.
After this change we can merge 🚀
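The change being referred to, as a sketch — save_hyperparameters(ignore=...) is real Lightning API, while the surrounding constructor signature is an assumption:

    import pytorch_lightning as pl


    class PLForecastingModule(pl.LightningModule):
        def __init__(self, loss_fn=None, torch_metrics=None, **kwargs) -> None:
            super().__init__()
            # before: self.save_hyperparameters(ignore=["loss_fn", "torch_metrics"])
            # dropping the ignore lets Lightning pickle both objects into the
            # checkpoint's "hyper_parameters" entry and restore them automatically
            self.save_hyperparameters()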
def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
    # by default our models are initialized as float32. For other dtypes, we need
    # to cast to the correct precision before parameters are loaded by PyTorch-Lightning
    dtype = checkpoint["model_dtype"]
    self.to_dtype(dtype)

    # restoring attributes necessary to resume from training properly
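For reference, the counterpart hook that writes this key would look roughly like the following — only the "model_dtype" key is taken from the snippet above; the body is an assumption:

    def on_save_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
        # record the current parameter dtype so that on_load_checkpoint can cast
        # the freshly constructed (float32) module before the weights are loaded
        checkpoint["model_dtype"] = self.dtype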
Btw, I just saw that we don't load "train_sample_shape" from the checkpoint. I think we should add this here as well, right?
I checked, it's already loaded when calling load_weights_from_checkpoint(). My guess is that since it's one of the constructor arguments and does not require any processing, the de-serialization of the checkpoint by PyTorch Lightning does the job.
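One way to check this, assuming a standard Lightning checkpoint on disk (the path below is hypothetical): constructor arguments captured by save_hyperparameters() are serialized under the checkpoint's "hyper_parameters" key.

    import torch

    # hypothetical path to a trained model's checkpoint
    ckpt = torch.load("darts_logs/model/checkpoints/last.ckpt", map_location="cpu")
    # arguments captured by save_hyperparameters() live here, so Lightning can
    # rebuild the module with the same train_sample_shape without extra handling
    print(ckpt["hyper_parameters"].get("train_sample_shape"))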
Very nice, looks great! Thanks a lot @madtoinou 💯 🚀
* fix: loss_fn and torch_metrics are properly restored when calling load_from_checkpoint()
* fix: moved fix to the PL on_save/on_load methods instead of load_from_checkpoint()
* fix: address reviewer comments, loss and metrics objects are saved in the constructor
* update changelog

Co-authored-by: Dennis Bader <dennis.bader@gmx.ch>
Fixes #1758.
Summary
Since loss_fn and torch_metrics are not saved in PLForecastingModule checkpoints, they must be re-created using the model.model_params values so that training continues with the proper loss (and keeps reporting the desired torch metrics); a sketch of this restoration follows below.

Other Information

Added the corresponding unit tests.
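A minimal sketch of that re-creation, assuming "model" is a darts TorchForecastingModel returned by load_from_checkpoint() and "model.model" is its inner PLForecastingModule; the criterion/torch_metrics attribute names and the MSELoss fallback are illustrative assumptions:

    import torch

    # model_params holds the constructor arguments of the darts model
    params = model.model_params
    # fall back to a default loss when none was passed to the constructor
    loss_fn = params.get("loss_fn") or torch.nn.MSELoss()
    torch_metrics = params.get("torch_metrics")

    # attribute names on the inner Lightning module are assumptions
    model.model.criterion = loss_fn
    model.model.torch_metrics = torch_metrics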