
[RFC] Deprecate weights_summary off the Trainer constructor #9043

Closed
ananthsub opened this issue Aug 23, 2021 · 13 comments · Fixed by #9699
Assignees: kaushikb11
Labels: callback, deprecation, feature, help wanted, let's do it!, refactor
Milestone: v1.5

Comments

@ananthsub
Contributor

ananthsub commented Aug 23, 2021

Proposed refactoring or deprecation

Motivation

We are auditing the Lightning components and APIs to assess opportunities for improvements:

This is a followup to #8478 and #9006

Why do we want to remove this from the core trainer logic?

  • We need a way for users to customize more of the inputs to the model summary over time without affecting the Trainer API. Today, changes to the model summary API also require changes in the core trainer (e.g. the addition of max_depth). This gives model summarization more room to grow without cascading changes elsewhere.
  • Users may want to configure this summarization for different points of execution. Right now, this is hardcoded to be run only during fit(). But users could want to call this potentially multiple times during each of trainer.fit(), trainer.validate(), trainer.test() or trainer.predict().
  • Users may want to customize where they save the summary. Right now, it's printed to stdout, but this could also be useful to save to a file or upload to another service for tracking the run.
  • The current implementation runs on global rank 0 only in order to avoid printing out multiple summary tables. However, running this on rank 0 will break for model parallel use cases that require communication across ranks. This can lead to subtle failures if example_input_array is set as a property on the LightningModule. For instance, a model wrapped with FSDP will break because parameters need to be all-gathered across layers across ranks.
  • In case the LightningModule leverages PyTorch LazyModules, users may want to generate this summary only after the first batch is processed in order to get accurate parameter estimations. Estimates of parameter sizes with lazy modules would be misleading.
  • AFAICT, this is the only piece of logic that runs between the on_pretrain_routine_start/end hooks. Would we still need these hooks if the summarization logic were removed from the trainer? Why doesn't this run in on_train_start today? We don't have on_prevalidation_routine_start/end hooks; the necessity of these hooks for training isn't clear to me, and deprecating them as well could bring greater API clarity and simplification.
    https://github.com/PyTorchLightning/pytorch-lightning/blob/8a931732ae5135e3e55d9c7b7031d81837e5798a/pytorch_lightning/trainer/trainer.py#L1103-L1113

Pitch

A callback in Lightning naturally fits this extension purpose. It generalizes well across lightning modules, has great flexibility for when it can be called, and allows users to customize the summarization logic (e.g. integrate other libraries more easily).
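To make the pitch concrete, here is a minimal sketch of what such a callback could look like. All names (ModelSummaryCallback, the output parameter) are assumptions for illustration, not the actual pytorch_lightning API, and a stand-in Callback base class is defined so the example runs standalone:

```python
# Illustrative sketch only: a minimal stand-in for
# pytorch_lightning.callbacks.Callback so the example is self-contained.
class Callback:
    def on_fit_start(self, trainer, pl_module):
        pass

class ModelSummaryCallback(Callback):
    """Hypothetical summary callback with an injectable output sink."""
    def __init__(self, max_depth=1, output=print):
        # output is customizable: stdout, a file writer, or an upload hook.
        self.max_depth = max_depth
        self.output = output

    def on_fit_start(self, trainer, pl_module):
        # A real implementation would call the summarize utility here.
        self.output(f"summary of {type(pl_module).__name__} (max_depth={self.max_depth})")

# Usage: capture the summary instead of printing it.
lines = []

class LitModel:
    pass

cb = ModelSummaryCallback(max_depth=2, output=lines.append)
cb.on_fit_start(trainer=None, pl_module=LitModel())
print(lines[0])  # summary of LitModel (max_depth=2)
```

Because the hook and the output sink are both parameters of the callback rather than the Trainer, users could run the summary at other entry points (validate, test, predict) or redirect it without any Trainer changes.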

Additional context

The model summary is currently enabled by default. Whether it should be opt-in or opt-out is likely the core issue we have to resolve: #8478 (comment)

Seeking @edenafek and @tchaton's input on this.



@ananthsub added the feature, help wanted, refactor, and deprecation labels on Aug 23, 2021
@awaelchli
Contributor

awaelchli commented Aug 23, 2021

@ananthsub the callback idea sounds nice but how do we make it so that a user can disable it? It's the same reason why we have the checkpoint_callback Trainer argument despite checkpointing being fully implemented as a callback. We set "good" defaults but we need a way to disable it too.

@ananthsub
Contributor Author

@ananthsub the callback idea sounds nice but how do we make it so that a user can disable it? It's the same reason why we have the checkpoint_callback Trainer argument despite checkpointing being fully implemented as a callback. We set "good" defaults but we need a way to disable it too.

This assumes the weights-summary is by default enabled. In this scenario, an incremental approach would be:

Or, we can make this opt-in. Remove the argument from the trainer constructor entirely, and enforce that users set this by instantiating a callback and passing it to the callbacks argument.

@awaelchli
Contributor

This assumes the weights-summary is by default enabled.

Yes, this was my assumption, and I hope we can keep it. I will definitely vote for that, but I am biased since I have been working on that summary, so I'd prefer if others could comment @PyTorchLightning/core-contributors . I strongly believe everyone working with ML models should be aware of how many parameters they are training.

@ananthsub
Contributor Author

This assumes the weights-summary is by default enabled.

Yes, this was my assumption, and I hope we can keep it. I will definitely vote for that, but I am biased since I have been working on that summary, so I'd prefer if others could comment @PyTorchLightning/core-contributors . I strongly believe everyone working with ML models should be aware of how many parameters they are training.

What this would likely also introduce is users needing to extend their callback from a particular summary base class, so that the trainer can determine whether it should add the default callback or not. This is the case for ModelCheckpoint today.

However, the model checkpoint callback is not explicitly designed with extensibility in mind. See prior issues for offering a base interface:

This could limit what a custom summary can do if the base class is too restrictive. I'd also like to avoid dependencies on inheritance wherever possible.

@ananthsub
Contributor Author

Would definitely like to hear @kaushikb11's opinion, given his work on the Rich-based model summary.

@kaushikb11
Contributor

kaushikb11 commented Aug 31, 2021

@ananthsub I absolutely agree with you on introducing a new ModelSummary callback. We could easily extract the logic into a callback and make the Trainer cleaner. Another benefit is that users could extend the callback for different use cases; it could be extended for RichProgressBar as well, by updating the summarize functionality. As one can see in #9215, the current design makes it feel hacky to extend.

There are two parts to this issue, as you mentioned. Regarding the second part of the proposal, I strongly believe we shouldn't deprecate the weights_summary argument. It's a good default to have, and users should be aware of their model's parameters.

Also, with the current ModelSummary, we support the following standalone usage as well:

from pytorch_lightning.utilities.model_summary import ModelSummary

model = LitModel()
ModelSummary(model, max_depth=1)

Hence, we should support both the existing ModelSummary class and the newly proposed callback.

@ananthsub
Contributor Author

ananthsub commented Sep 2, 2021

@kaushikb11 @awaelchli - what do you think of this to mirror what was done for checkpoint_callback:

  1. We create a new ModelSummary callback, which uses the summarize functionality recently moved to the utilities

  2. We inject a ModelSummary callback into the callback connector if none is set here, and replace the hardcoded logic here. This way users could extend this callback to control when the summary is generated, what information is included in the summary, or where the summary is output (e.g. stdout, a file, etc.).

  3. On the Trainer, we type weights_summary as Union[str, bool] = True and mark the string values as deprecated, given the upcoming incompatibilities between mode and max_depth. The expectation is that users who wish to customize the summary should pass a customized ModelSummary callback through the callbacks argument.
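Step 2 of the plan above can be sketched roughly as follows. The class and function names are assumptions for illustration, not the actual Trainer internals:

```python
# Hypothetical sketch of default-callback injection (names assumed):
# add a default summary callback only when the user has not provided one.

class ModelSummaryCallback:
    def __init__(self, max_depth=1):
        self.max_depth = max_depth

def configure_callbacks(user_callbacks, weights_summary=True):
    callbacks = list(user_callbacks)
    has_summary = any(isinstance(c, ModelSummaryCallback) for c in callbacks)
    if weights_summary and not has_summary:
        callbacks.append(ModelSummaryCallback())  # sensible default
    return callbacks

print(len(configure_callbacks([])))                                   # 1: default injected
print(len(configure_callbacks([ModelSummaryCallback(max_depth=3)])))  # 1: user's kept, no duplicate
print(len(configure_callbacks([], weights_summary=False)))            # 0: disabled
```

This mirrors the checkpoint_callback pattern: the boolean Trainer flag only toggles the default, while full customization goes through the callbacks argument.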

@awaelchli
Contributor

awaelchli commented Sep 3, 2021

I like that very much. This is the minimal functionality I wish we could keep.

Btw, on a side note, I believe the best path forward would be a model summary class whose only responsibility is collecting the summary data (as it is now), but which does NOT contain the logic for printing and visualization. I think that logic would be best placed in the callback. This way it will also be easier to customize things like rich logging while keeping the actual model summary untouched.
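The split described here could look roughly like this. All names and the demo rows are assumptions; a real summary class would walk the model's modules:

```python
# Hedged sketch: the summary class only collects data; callbacks own rendering.

class ModelSummary:
    """Collects summary rows; knows nothing about presentation."""
    def __init__(self, model, max_depth=1):
        self.max_depth = max_depth
        # Stand-in rows for the demo; a real version would inspect the model.
        self._rows = [("layer1", 100), ("layer2", 250)]

    def get_summary_data(self):
        return self._rows

class PlainSummaryCallback:
    def render(self, summary):
        return "\n".join(f"{name}: {params} params"
                         for name, params in summary.get_summary_data())

class RichSummaryCallback(PlainSummaryCallback):
    def render(self, summary):
        # A rich/table-based renderer differs only in presentation.
        header = " | ".join(name for name, _ in summary.get_summary_data())
        return f"| {header} |"

s = ModelSummary(model=None)
print(PlainSummaryCallback().render(s).splitlines()[0])  # layer1: 100 params
print(RichSummaryCallback().render(s))                   # | layer1 | layer2 |
```

Swapping renderers (plain vs. rich) then never touches the data-collection class.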

I'm not sure if @kaushikb11 was already going in that direction; he might have been.

@ananthsub
Contributor Author

ananthsub commented Sep 3, 2021

@awaelchli I fully agree regarding where the output of the summarization should go. Outputting the summary should live in the callback, not in the utils as it does today.

@tchaton added the "let's do it!" label on Sep 3, 2021
@tchaton
Contributor

tchaton commented Sep 3, 2021

Awesome! Let's do this :)

@kaushikb11 self-assigned this on Sep 3, 2021
@kaushikb11
Contributor

@awaelchli

I believe the best moving forward would be to have a model summary class with the only responsibility of collecting the summary data (like it is now) BUT not contain the logic for printing and visualization.

How would we support the following then?

from pytorch_lightning.utilities.model_summary import ModelSummary

model = LitModel()
ModelSummary(model, max_depth=1)

IMO, we could keep the default string output for the ModelSummary class but move the summary-data aggregation into a get_summary_data method, as I did here.
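A way to support both uses might be the following sketch: ModelSummary keeps a default string form so the standalone usage still works, while get_summary_data exposes structured rows for callbacks. The implementation and demo rows are assumptions, not the library's code:

```python
# Sketch: preserve print(ModelSummary(model)) while exposing raw data.

class ModelSummary:
    def __init__(self, model, max_depth=1):
        self.max_depth = max_depth
        self._rows = [("encoder", 1000), ("decoder", 500)]  # demo rows only

    def get_summary_data(self):
        # Structured data for callbacks (rich tables, files, trackers).
        return self._rows

    def __str__(self):
        # Default plain-text output keeps the standalone API working.
        return "\n".join(f"{name}: {params}" for name, params in self._rows)

summary = ModelSummary(model=None, max_depth=1)
print(str(summary).splitlines()[0])     # encoder: 1000
print(len(summary.get_summary_data()))  # 2
```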

@awaelchli
Contributor

awaelchli commented Sep 26, 2021

@ananthsub would it make sense to drop the "enable_" prefix? It seems redundant because the type is bool anyway.
Pro:

  • shorter
  • I don't need to remember if it was "disable_feature_x" or "enable_feature_x"

#9664 has the same problem imo.

@ananthsub added this to the v1.5 milestone on Sep 30, 2021
@TalhaUsuf

@ananthsub I absolutely agree with you on introducing a new ModelSummary callback. We could easily extract the logic into a callback and make the Trainer cleaner. Another benefit is that users could extend the callback for different use cases; it could be extended for RichProgressBar as well, by updating the summarize functionality. As one can see in #9215, the current design makes it feel hacky to extend.

There are two parts to this issue, as you mentioned. Regarding the second part of the proposal, I strongly believe we shouldn't deprecate the weights_summary argument. It's a good default to have, and users should be aware of their model's parameters.

Also, with the current ModelSummary, we support the following standalone usage as well:

model = LitModel()
ModelSummary(model, max_depth=1)

Hence, we should support both the existing ModelSummary class and the newly proposed callback.

The callback idea is great: we can call it anywhere, and it gives the flexibility of setting the depth parameter.

ModelSummary() 
