Commit

Docs: Add note about version counter in ModelCheckpoint (#20146)
adosar authored Aug 4, 2024
1 parent 1bc2aad commit 854d166
Showing 2 changed files with 12 additions and 1 deletion.
9 changes: 9 additions & 0 deletions docs/source-pytorch/common/checkpointing_intermediate.rst
@@ -83,6 +83,15 @@ Which
filename="sample-mnist-{epoch:02d}-{global_step}",
)

.. note::

    It is recommended that you pass formatting options to ``filename`` to include the monitored metric like shown
    in the example above. Otherwise, if ``save_top_k >= 2`` and ``enable_version_counter=True`` (default), a
    version is appended to the ``filename`` to prevent filename collisions. You should not rely on the appended
    version to retrieve the top-k model, since there is no relationship between version count and model performance.
    For example, ``filename-v2.ckpt`` doesn't necessarily correspond to the top-2 model.
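
As a rough illustration of the recommendation in this note, the sketch below picks the best checkpoint by reading the monitored metric out of the file name instead of trusting a version suffix. It is plain Python and does not call Lightning. The file names, the ``val_loss`` metric, and the ``best_checkpoint`` helper are all hypothetical; the names assume a template such as ``sample-mnist-{epoch:02d}-{val_loss:.2f}`` with ``mode="min"``.

```python
import re

# Hypothetical checkpoint names, as they might look when the filename template
# includes the monitored metric, e.g. "sample-mnist-{epoch:02d}-{val_loss:.2f}".
checkpoints = [
    "sample-mnist-epoch=02-val_loss=0.32.ckpt",
    "sample-mnist-epoch=05-val_loss=0.21.ckpt",
    "sample-mnist-epoch=07-val_loss=0.27.ckpt",
]

# Matches the metric value embedded in the name (assumed metric: val_loss).
_VAL_LOSS = re.compile(r"val_loss=(\d+(?:\.\d+)?)")


def best_checkpoint(names):
    """Return the name with the lowest val_loss encoded in it (mode='min')."""
    scored = []
    for name in names:
        match = _VAL_LOSS.search(name)
        if match:
            scored.append((float(match.group(1)), name))
    if not scored:
        # Names such as "filename-v2.ckpt" carry no metric, so no ranking is possible.
        raise ValueError("no checkpoint name contains a val_loss value")
    return min(scored)[1]


print(best_checkpoint(checkpoints))
```

A name like ``filename-v2.ckpt`` gives this helper nothing to rank by, which is the situation the note warns about.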


- You can customize the checkpointing behavior to monitor any quantity of your training or validation steps. For example, if you want to update your checkpoints based on your validation loss:

|
4 changes: 3 additions & 1 deletion src/lightning/pytorch/callbacks/model_checkpoint.py
@@ -94,7 +94,9 @@ class ModelCheckpoint(Checkpoint):
Please note that the monitors are checked every ``every_n_epochs`` epochs.
If ``save_top_k >= 2`` and the callback is called multiple times inside an epoch, and the filename remains
unchanged, the name of the saved file will be appended with a version count starting with ``v1`` to avoid
collisions unless ``enable_version_counter`` is set to False.
collisions unless ``enable_version_counter`` is set to False. The version counter is unrelated to the top-k
ranking of the checkpoint, and we recommend formatting the filename to include the monitored metric to avoid
collisions.
mode: one of {min, max}.
If ``save_top_k != 0``, the decision to overwrite the current save file is made
based on either the maximization or the minimization of the monitored quantity.
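
To make the docstring's point concrete, here is a toy model in plain Python of how a version counter that only avoids name collisions ends up tracking save order rather than rank. This is not Lightning's implementation: ``versioned_name`` is a hypothetical helper, the ``-vN`` suffix format is assumed from the ``filename-v2.ckpt`` example above, and the sketch assumes ``save_top_k`` is large enough that nothing is deleted.

```python
def versioned_name(base, existing, enable_version_counter=True):
    """Toy model: append -v1, -v2, ... until the name is unused."""
    name = f"{base}.ckpt"
    version = 1
    while enable_version_counter and name in existing:
        name = f"{base}-v{version}.ckpt"
        version += 1
    return name


# Simulate three saves with an unchanged filename and made-up val_loss values.
saved = {}
for val_loss in [0.40, 0.25, 0.31]:
    name = versioned_name("filename", saved)
    saved[name] = val_loss

for name, val_loss in saved.items():
    print(name, val_loss)

# The best (lowest-loss) checkpoint is whichever save happened to score best,
# not the one with a particular version suffix.
best = min(saved, key=saved.get)
print("best:", best)
```

In this run the lowest loss belongs to the second save, so the suffix reflects when a file was written and says nothing about how it ranks.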