Add LightningDataModule.load_from_checkpoint to load datamodules directly from checkpoint #12550

Merged: 12 commits, May 3, 2022
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Added

- Added `LightningDataModule.load_from_checkpoint` to support loading datamodules directly from checkpoint ([#12550](https://github.com/PyTorchLightning/pytorch-lightning/pull/12550))


- Added a friendly error message when attempting to call `Trainer.save_checkpoint()` without a model attached ([#12772](https://github.com/PyTorchLightning/pytorch-lightning/pull/12772))

1 change: 1 addition & 0 deletions docs/source/common/checkpointing_basic.rst
@@ -36,6 +36,7 @@ Inside a Lightning checkpoint you'll find:
- State of all callbacks (for stateful callbacks)
- State of datamodule (for stateful datamodules)
- The hyperparameters used for that model if passed in as hparams (argparse.Namespace)
- The hyperparameters used for that datamodule if passed in as hparams (argparse.Namespace)
- State of Loops (if using Fault-Tolerant training)

----
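
The datamodule hyperparameter entries listed above can be looked up in a loaded checkpoint dictionary. The following is a minimal sketch, assuming the key names defined by the class attributes added in this PR; the helper `datamodule_hparams_entries` and the example dictionary are hypothetical and stand in for a checkpoint that would normally come from `torch.load(path)`.

```python
from typing import Any, Dict

# Key names taken from the class attributes added in this PR.
DATAMODULE_HPARAMS_KEYS = (
    "datamodule_hyper_parameters",
    "datamodule_hparams_name",
    "datamodule_hparams_type",
)


def datamodule_hparams_entries(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return the datamodule hyperparameter entries present in a checkpoint dict."""
    return {key: checkpoint[key] for key in DATAMODULE_HPARAMS_KEYS if key in checkpoint}


# Hypothetical checkpoint contents; a real one would come from ``torch.load(path)``.
example_checkpoint = {
    "state_dict": {},
    "datamodule_hyper_parameters": {"batch_size": 32},
}
entries = datamodule_hparams_entries(example_checkpoint)
print(entries)  # {'datamodule_hyper_parameters': {'batch_size': 32}}
```

An empty result would suggest the datamodule's hyperparameters were not saved, which is the case the `hparams_file` argument is meant for.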
76 changes: 75 additions & 1 deletion pytorch_lightning/core/datamodule.py
@@ -13,13 +13,15 @@
# limitations under the License.
"""LightningDataModule for loading DataLoaders with ease."""
from argparse import ArgumentParser, Namespace
from typing import Any, Dict, List, Mapping, Optional, Sequence, Tuple, Union
from typing import Any, Dict, IO, List, Mapping, Optional, Sequence, Tuple, Union

from torch.utils.data import DataLoader, Dataset, IterableDataset

from pytorch_lightning.core.hooks import CheckpointHooks, DataHooks
from pytorch_lightning.core.mixins import HyperparametersMixin
from pytorch_lightning.core.saving import _load_from_checkpoint
from pytorch_lightning.utilities.argparse import add_argparse_args, from_argparse_args, get_init_arguments_and_types
from pytorch_lightning.utilities.types import _PATH


class LightningDataModule(CheckpointHooks, DataHooks, HyperparametersMixin):
@@ -52,6 +54,9 @@ def teardown(self):
    """

    name: str = ...
    CHECKPOINT_HYPER_PARAMS_KEY = "datamodule_hyper_parameters"
    CHECKPOINT_HYPER_PARAMS_NAME = "datamodule_hparams_name"
    CHECKPOINT_HYPER_PARAMS_TYPE = "datamodule_hparams_type"

    def __init__(self) -> None:
        super().__init__()
@@ -158,3 +163,72 @@ def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
            state_dict: the datamodule state returned by ``state_dict``.
        """
        pass

    @classmethod
    def load_from_checkpoint(
        cls,
        checkpoint_path: Union[_PATH, IO],
        hparams_file: Optional[_PATH] = None,
        **kwargs,
    ):
        r"""
        Primary way of loading a datamodule from a checkpoint. When Lightning saves a checkpoint
        it stores the arguments passed to ``__init__`` in the checkpoint under ``"datamodule_hyper_parameters"``.

        Any arguments specified through \*\*kwargs will override args stored in ``"datamodule_hyper_parameters"``.

        Args:
            checkpoint_path: Path to checkpoint. This can also be a URL or a file-like object.
            hparams_file: Optional path to a ``.yaml`` or ``.csv`` file with hierarchical structure
                as in this example::

                    dataloader:
                        batch_size: 32

                You most likely won't need this since Lightning will always save the hyperparameters
                to the checkpoint.
                However, if your checkpoint doesn't have the hyperparameters saved,
                use this argument to pass in a ``.yaml`` file with the hparams you'd like to use.
                These will be converted into a :class:`~dict` and passed into your
                :class:`LightningDataModule` for use.

                If your datamodule's ``hparams`` argument is :class:`~argparse.Namespace`
                and the ``.yaml`` file has a hierarchical structure, you need to refactor your datamodule
                to treat ``hparams`` as :class:`~dict`.
            \**kwargs: Any extra keyword args needed to init the datamodule. Can also be used to override saved
                hyperparameter values.

        Return:
            :class:`LightningDataModule` instance initialized with the hyperparameters found in the
            checkpoint (if available).

        Note:
            ``load_from_checkpoint`` is a **class** method. Call it on your :class:`LightningDataModule`
            **class**, not on a :class:`LightningDataModule` instance.

        Example::

            # load the datamodule using the hyperparameters saved in the checkpoint
            datamodule = MyLightningDataModule.load_from_checkpoint('path/to/checkpoint.ckpt')

            # or load the checkpoint and the hyperparameters from separate files
            datamodule = MyLightningDataModule.load_from_checkpoint(
                'path/to/checkpoint.ckpt',
                hparams_file='/path/to/hparams_file.yaml'
            )

            # override some of the params with new values
            datamodule = MyLightningDataModule.load_from_checkpoint(
                PATH,
                batch_size=32,
                num_workers=10,
            )

        """
        return _load_from_checkpoint(
            cls,
            checkpoint_path,
            map_location=None,
            hparams_file=hparams_file,
            strict=None,
            **kwargs,
        )
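
The override behaviour described in the docstring (saved `__init__` arguments are used unless a keyword argument replaces them) can be illustrated without Lightning installed. This is a simplified stand-in written under stated assumptions, not the actual `_load_from_checkpoint` implementation: `ToyDataModule` and `load_from_checkpoint_sketch` are hypothetical names, the checkpoint is a plain in-memory dict rather than a file, and dropping arguments the constructor does not accept is a choice made for this sketch.

```python
import inspect
from typing import Any, Dict

# Same key as LightningDataModule.CHECKPOINT_HYPER_PARAMS_KEY in this PR.
CHECKPOINT_HYPER_PARAMS_KEY = "datamodule_hyper_parameters"


class ToyDataModule:
    """Hypothetical stand-in for a LightningDataModule subclass."""

    def __init__(self, batch_size: int = 8, num_workers: int = 0) -> None:
        self.batch_size = batch_size
        self.num_workers = num_workers


def load_from_checkpoint_sketch(cls, checkpoint: Dict[str, Any], **kwargs: Any):
    """Build ``cls`` from saved hyperparameters, letting ``kwargs`` override them."""
    # Start from the saved __init__ arguments, if the checkpoint has any.
    init_args = dict(checkpoint.get(CHECKPOINT_HYPER_PARAMS_KEY, {}))
    # Explicit keyword arguments take precedence over saved values.
    init_args.update(kwargs)
    # Sketch-only assumption: keep just the arguments the constructor accepts.
    accepted = inspect.signature(cls.__init__).parameters
    init_args = {name: value for name, value in init_args.items() if name in accepted}
    return cls(**init_args)


checkpoint = {CHECKPOINT_HYPER_PARAMS_KEY: {"batch_size": 32, "num_workers": 4}}
restored = load_from_checkpoint_sketch(ToyDataModule, checkpoint)
overridden = load_from_checkpoint_sketch(ToyDataModule, checkpoint, batch_size=64)
print(restored.batch_size, restored.num_workers)      # 32 4
print(overridden.batch_size, overridden.num_workers)  # 64 4
```

With the real API, the equivalent calls would be `MyLightningDataModule.load_from_checkpoint(path)` and `MyLightningDataModule.load_from_checkpoint(path, batch_size=64)`, as in the docstring example.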