Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patches for Lightning have not kept up with backwards-incompatible changes #1249

Closed
a-gardner1 opened this issue Apr 17, 2024 · 5 comments
Closed
Labels
bug Something isn't working

Comments

@a-gardner1
Copy link
Contributor

Describe the bug

While there has clearly been some effort to keep pace with changes to Lightning (see #1033), it has fallen behind since the initial patches were created (64e10b2) and new versions of Lightning were released. Unfortunately, it silently fails to apply patches to model saving and restoration, which can hide the fact that model logging doesn't fully work as expected. One of the two related (and nearly duplicate) patch methods is shown below (linked here)

    @staticmethod
    def _patch_pytorch_lightning_io():
        if PatchPyTorchModelIO.__patched_pytorch_lightning:
            return

        if 'pytorch_lightning' not in sys.modules:
            return

        PatchPyTorchModelIO.__patched_pytorch_lightning = True

        # noinspection PyBroadException
        try:
            import pytorch_lightning  # noqa

            pytorch_lightning.trainer.Trainer.save_checkpoint = _patched_call(
                pytorch_lightning.trainer.Trainer.save_checkpoint, PatchPyTorchModelIO._save)  # noqa

            pytorch_lightning.trainer.Trainer.restore = _patched_call(
                pytorch_lightning.trainer.Trainer.restore, PatchPyTorchModelIO._load_from_obj)  # noqa
        except ImportError:
            pass
        except Exception:
            pass

        # noinspection PyBroadException
        try:
            import pytorch_lightning  # noqa

            # noinspection PyUnresolvedReferences
            pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector.save_checkpoint = \
                _patched_call(
                    pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector.save_checkpoint,
                    PatchPyTorchModelIO._save)  # noqa

            # noinspection PyUnresolvedReferences
            pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector.restore = \
                _patched_call(
                    pytorch_lightning.trainer.connectors.checkpoint_connector.CheckpointConnector.restore,
                    PatchPyTorchModelIO._load_from_obj)  # noqa
        except ImportError:
            pass
        except Exception:
            pass

Three AttributeErrors exist in _patch_pytorch_lightning_io with newer versions of pytorch-lightning:

To reproduce

No reproduction is necessary. There are multiple clear AttributeErrors that get caught by the Exception handler depending on the pytorch-lightning version.

Expected behaviour

The checkpointing mechanism of pytorch-lightning should have been patched to enable automatic logging of models with ClearML.

Environment

  • Server type: self hosted
  • ClearML SDK Version: 1.15.1
  • ClearML Server Version (Only for self hosted): WebApp: 1.11.0-000 • Server: 1.12.0- • API: 2.26
  • Python Version: 3.11
  • OS: Linux
@a-gardner1 a-gardner1 added the bug Something isn't working label Apr 17, 2024
@ainoam
Copy link
Collaborator

ainoam commented Apr 18, 2024

Thanks for letting us know @a-gardner1. We'll take a look and update on fix availability.

a-gardner1 added a commit to a-gardner1/clearml that referenced this issue Apr 22, 2024
@a-gardner1
Copy link
Contributor Author

I can open a PR with a proposed fix if you like. I've already implemented one

@ainoam
Copy link
Collaborator

ainoam commented Apr 24, 2024

Contributions are most welcome @a-gardner1 🙂

@a-gardner1
Copy link
Contributor Author

@ainoam In case you missed it, I did open a PR. Let me know if anything looks off

@ainoam
Copy link
Collaborator

ainoam commented May 8, 2024

Thanks for the friendly nudge @a-gardner1. We'll try to address your PR soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants