
[RFC] Deprecate weights_summary off the Trainer constructor #9043

Closed
ananthsub opened this issue Aug 23, 2021 · 13 comments · Fixed by #9699
Assignees: kaushikb11
Labels: callback, deprecation, feature, help wanted, let's do it!, refactor
Milestone: v1.5

Comments

@ananthsub
Contributor

ananthsub commented Aug 23, 2021

Proposed refactoring or deprecation

Motivation

We are auditing the Lightning components and APIs to assess opportunities for improvements:

This is a followup to #8478 and #9006

Why do we want to remove this from the core trainer logic?

  • We need a way for users to customize more of the inputs to the model summary over time without affecting the Trainer API. Today, changes to the model summary API also require changes in the core trainer (e.g. the addition of max_depth). This gives model summarization more room to grow without cascading changes elsewhere.
  • Users may want to configure this summarization for different points of execution. Right now, this is hardcoded to be run only during fit(). But users could want to call this potentially multiple times during each of trainer.fit(), trainer.validate(), trainer.test() or trainer.predict().
  • Users may want to customize where they save the summary. Right now, it's printed to stdout, but this could also be useful to save to a file or upload to another service for tracking the run.
  • The current implementation runs on global rank 0 only in order to avoid printing out multiple summary tables. However, running this on rank 0 will break for model parallel use cases that require communication across ranks. This can lead to subtle failures if example_input_array is set as a property on the LightningModule. For instance, a model wrapped with FSDP will break because parameters need to be all-gathered across layers across ranks.
  • In case the LightningModule leverages PyTorch LazyModules, users may want to generate this summary only after the first batch is processed in order to get accurate parameter estimations. Estimates of parameter sizes with lazy modules would be misleading.
  • AFAICT, this is the only piece of logic that runs between the on_pretrain_routine_start/end hooks. Would we still need these hooks if the summarization logic were removed from the trainer? Why doesn't this run in on_train_start today? We don't have on_prevalidation_routine_start/end hooks; the necessity of these hooks for training isn't clear to me, and deprecating them as well could bring greater API clarity and simplification.
    https://github.com/PyTorchLightning/pytorch-lightning/blob/8a931732ae5135e3e55d9c7b7031d81837e5798a/pytorch_lightning/trainer/trainer.py#L1103-L1113

Pitch

A callback in Lightning naturally fits this extension purpose. It generalizes well across lightning modules, has great flexibility for when it can be called, and allows users to customize the summarization logic (e.g. integrate other libraries more easily).
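To make the pitch concrete, here is a minimal sketch of what such a callback could look like. All names (ModelSummaryCallback, the output parameter) are assumptions for illustration, not the actual pytorch_lightning API, and a stand-in Callback base class is defined so the example runs standalone:

```python
# Illustrative sketch only: a minimal stand-in for
# pytorch_lightning.callbacks.Callback so the example is self-contained.
class Callback:
    def on_fit_start(self, trainer, pl_module):
        pass

class ModelSummaryCallback(Callback):
    """Hypothetical summary callback with an injectable output sink."""
    def __init__(self, max_depth=1, output=print):
        # output is customizable: stdout, a file writer, or an upload hook.
        self.max_depth = max_depth
        self.output = output

    def on_fit_start(self, trainer, pl_module):
        # A real implementation would call the summarize utility here.
        self.output(f"summary of {type(pl_module).__name__} (max_depth={self.max_depth})")

# Usage: capture the summary instead of printing it.
lines = []

class LitModel:
    pass

cb = ModelSummaryCallback(max_depth=2, output=lines.append)
cb.on_fit_start(trainer=None, pl_module=LitModel())
print(lines[0])  # summary of LitModel (max_depth=2)
```

Because the hook and the output sink are both parameters of the callback rather than the Trainer, users could run the summary at other entry points (validate, test, predict) or redirect it without any Trainer changes.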

Additional context

The model summary is currently enabled by default. Whether it should be opt-in or opt-out is likely the core issue we have to resolve: #8478 (comment)

Seeking @edenafek and @tchaton's input on this.



@ananthsub added the feature, help wanted, refactor, and deprecation labels on Aug 23, 2021
@awaelchli
Contributor

awaelchli commented Aug 23, 2021

@ananthsub the callback idea sounds nice but how do we make it so that a user can disable it? It's the same reason why we have the checkpoint_callback Trainer argument despite checkpointing being fully implemented as a callback. We set "good" defaults but we need a way to disable it too.

@ananthsub
Contributor Author

@ananthsub the callback idea sounds nice but how do we make it so that a user can disable it? It's the same reason why we have the checkpoint_callback Trainer argument despite checkpointing being fully implemented as a callback. We set "good" defaults but we need a way to disable it too.

This assumes the weights-summary is by default enabled. In this scenario, an incremental approach would be:

Or, we can make this opt-in. Remove the argument from the trainer constructor entirely, and enforce that users set this by instantiating a callback and passing it to the callbacks argument.

@awaelchli
Contributor

This assumes the weights-summary is by default enabled.

Yes, this was my assumption, and I hope we can keep it. I will definitely vote for that, but I am biased since I have been working on that summary, so I'd prefer if others could comment @PyTorchLightning/core-contributors . I strongly believe everyone working with ML models should be aware of how many parameters they are training.

@ananthsub
Contributor Author

This assumes the weights-summary is by default enabled.

Yes, this was my assumption, and I hope we can keep it. I will definitely vote for that, but I am biased since I have been working on that summary, so I'd prefer if others could comment @PyTorchLightning/core-contributors . I strongly believe everyone working with ML models should be aware of how many parameters they are training.

What this would likely also introduce is users needing to extend their callback from a particular summary base class, so that the trainer can determine whether it should add the default callback or not. This is the case for ModelCheckpoint today.

However, the model checkpoint callback is not explicitly designed with extensibility in mind. See prior issues for offering a base interface:

This could limit what a custom summary can do if the base class is too restrictive. I'd also like to avoid dependencies on inheritance wherever possible.

@ananthsub
Contributor Author

Would definitely like to hear @kaushikb11's opinion, given his work on the Rich-based model summary.

@kaushikb11
Contributor

kaushikb11 commented Aug 31, 2021

@ananthsub I absolutely agree with you on introducing a new ModelSummary callback. We could easily extract the logic into a callback and make the Trainer cleaner. Another benefit is that users could extend the callback for different use cases; it could be extended for RichProgressBar as well, by updating the summarize functionality. As one can see in #9215, the current design makes it feel hacky to extend.

There are two parts to this issue, as you mentioned. Regarding the second part of the proposal, I strongly believe we shouldn't deprecate the weights_summary argument. It's a good default to have, and users should be aware of their model's parameters.

Also, with the current ModelSummary, we support the following standalone usage as well:

from pytorch_lightning.utilities.model_summary import ModelSummary

model = LitModel()
ModelSummary(model, max_depth=1)

Hence, we should support both the existing ModelSummary class and the newly proposed callback.

@ananthsub
Contributor Author

ananthsub commented Sep 2, 2021

@kaushikb11 @awaelchli - what do you think of this to mirror what was done for checkpoint_callback:

  1. We create a new ModelSummary callback, which uses the summarize functionality recently moved to the utilities

  2. We inject a ModelSummary callback into the callback connector if none is set here, and replace the hardcoded logic here. This way users could extend this callback to control when the summary is generated, what information is included in the summary, or where the summary is output (e.g. stdout, a file, etc.).

  3. On the Trainer, we type weights_summary as Union[str, bool] = True and mark the string values as deprecated, given the upcoming incompatibilities between mode and max_depth. The expectation is that users who wish to customize the summary should pass a customized ModelSummary callback through the callbacks argument.
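Step 2 of the plan above can be sketched roughly as follows. The class and function names are assumptions for illustration, not the actual Trainer internals:

```python
# Hypothetical sketch of default-callback injection (names assumed):
# add a default summary callback only when the user has not provided one.

class ModelSummaryCallback:
    def __init__(self, max_depth=1):
        self.max_depth = max_depth

def configure_callbacks(user_callbacks, weights_summary=True):
    callbacks = list(user_callbacks)
    has_summary = any(isinstance(c, ModelSummaryCallback) for c in callbacks)
    if weights_summary and not has_summary:
        callbacks.append(ModelSummaryCallback())  # sensible default
    return callbacks

print(len(configure_callbacks([])))                                   # 1: default injected
print(len(configure_callbacks([ModelSummaryCallback(max_depth=3)])))  # 1: user's kept, no duplicate
print(len(configure_callbacks([], weights_summary=False)))            # 0: disabled
```

This mirrors the checkpoint_callback pattern: the boolean Trainer flag only toggles the default, while full customization goes through the callbacks argument.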

@awaelchli
Contributor

awaelchli commented Sep 3, 2021

I like that very much. This is the minimal functionality I wish we could keep.

Btw, on a side note, I believe the best path forward would be a model summary class whose only responsibility is collecting the summary data (as it is now), but which does NOT contain the logic for printing and visualization. I think that logic would be best placed in the callback. This way it will also be easier to customize things like rich logging while keeping the actual model summary untouched.
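The split described here could look roughly like this. All names and the demo rows are assumptions; a real summary class would walk the model's modules:

```python
# Hedged sketch: the summary class only collects data; callbacks own rendering.

class ModelSummary:
    """Collects summary rows; knows nothing about presentation."""
    def __init__(self, model, max_depth=1):
        self.max_depth = max_depth
        # Stand-in rows for the demo; a real version would inspect the model.
        self._rows = [("layer1", 100), ("layer2", 250)]

    def get_summary_data(self):
        return self._rows

class PlainSummaryCallback:
    def render(self, summary):
        return "\n".join(f"{name}: {params} params"
                         for name, params in summary.get_summary_data())

class RichSummaryCallback(PlainSummaryCallback):
    def render(self, summary):
        # A rich/table-based renderer differs only in presentation.
        header = " | ".join(name for name, _ in summary.get_summary_data())
        return f"| {header} |"

s = ModelSummary(model=None)
print(PlainSummaryCallback().render(s).splitlines()[0])  # layer1: 100 params
print(RichSummaryCallback().render(s))                   # | layer1 | layer2 |
```

Swapping renderers (plain vs. rich) then never touches the data-collection class.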

I'm not sure if @kaushikb11 was already going in that direction; he might have been.

@ananthsub
Contributor Author

ananthsub commented Sep 3, 2021

@awaelchli I fully agree regarding where the output of the summarization should go. Outputting the summary should live in the callback, not in the utils as it does today.

@tchaton added the "let's do it!" label on Sep 3, 2021
@tchaton
Contributor

tchaton commented Sep 3, 2021

Awesome! Let's do this :)

@kaushikb11 self-assigned this on Sep 3, 2021
@kaushikb11
Contributor

@awaelchli

I believe the best moving forward would be to have a model summary class with the only responsibility of collecting the summary data (like it is now) BUT not contain the logic for printing and visualization.

How would we support the following then?

from pytorch_lightning.utilities.model_summary import ModelSummary

model = LitModel()
ModelSummary(model, max_depth=1)

IMO, we could keep the default string output for the ModelSummary class but move the summary-data aggregation into a get_summary_data method, as I did here.
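A way to support both uses might be the following sketch: ModelSummary keeps a default string form so the standalone usage still works, while get_summary_data exposes structured rows for callbacks. The implementation and demo rows are assumptions, not the library's code:

```python
# Sketch: preserve print(ModelSummary(model)) while exposing raw data.

class ModelSummary:
    def __init__(self, model, max_depth=1):
        self.max_depth = max_depth
        self._rows = [("encoder", 1000), ("decoder", 500)]  # demo rows only

    def get_summary_data(self):
        # Structured data for callbacks (rich tables, files, trackers).
        return self._rows

    def __str__(self):
        # Default plain-text output keeps the standalone API working.
        return "\n".join(f"{name}: {params}" for name, params in self._rows)

summary = ModelSummary(model=None, max_depth=1)
print(str(summary).splitlines()[0])     # encoder: 1000
print(len(summary.get_summary_data()))  # 2
```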

@awaelchli
Contributor

awaelchli commented Sep 26, 2021

@ananthsub would it make sense to drop the "enable_" prefix? It seems redundant because the type is bool anyway.
Pro:

  • shorter
  • I don't need to remember if it was "disable_feature_x" or "enable_feature_x"

#9664 has the same problem imo.

@ananthsub added this to the v1.5 milestone on Sep 30, 2021
@TalhaUsuf

@ananthsub I absolutely agree with you on introducing a new ModelSummary callback. We could easily extract the logic into a callback and make the Trainer cleaner. Another benefit is that users could extend the callback for different use cases; it could be extended for RichProgressBar as well, by updating the summarize functionality. As one can see in #9215, the current design makes it feel hacky to extend.

There are two parts to this issue, as you mentioned. Regarding the second part of the proposal, I strongly believe we shouldn't deprecate the weights_summary argument. It's a good default to have, and users should be aware of their model's parameters.

Also, with the current ModelSummary, we support the following standalone usage as well:

model = LitModel()
ModelSummary(model, max_depth=1)

Hence, we should support both the existing ModelSummary class and the newly proposed callback.

The callback idea is great: we can call it anywhere, and it gives the flexibility of setting the depth parameter.

ModelSummary() 
