
Move progress bar disabling out of the Trainer #10982

Closed
ananthsub opened this issue Dec 7, 2021 · 3 comments · Fixed by #11377

Comments

@ananthsub
Contributor

ananthsub commented Dec 7, 2021

Proposed refactor

Move this logic to individual progress bar callback implementations:
https://github.com/PyTorchLightning/pytorch-lightning/blob/6369e3b77fa3f38613b661517f6361f842f611c9/pytorch_lightning/trainer/trainer.py#L1273-L1275
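
For reference, the Trainer-side check being moved is roughly of the following shape (a paraphrase of the linked lines, not the exact source):

# Rough paraphrase of the linked Trainer logic: the Trainer reaches into the
# callback and silences it on every non-zero rank.
if not self.is_global_zero and self.progress_bar_callback is not None:
    self.progress_bar_callback.disable()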

Motivation

  1. Simplifies the Trainer.

  2. Avoids duplicating this logic between the Trainer and the spawning plugins. For example, it is replicated in the TPU Spawn strategy: https://github.com/PyTorchLightning/pytorch-lightning/blob/6369e3b77fa3f38613b661517f6361f842f611c9/pytorch_lightning/plugins/training_type/tpu_spawn.py#L158-L159

  3. Avoids duplicating this logic across different running stages.

  4. Allows custom progress bars to collect information from all ranks.

We went through a very similar refactor for loggers to remove the rank-zero restrictions:
#8589
#8608
#7740

Pitch

enable and disable can be internal implementation details of the progress bar callback. These flags can be set at any time after the setup hook runs:

def setup(self, trainer, pl_module, stage=None) -> None:
    # the callback decides on its own to stay silent on non-zero ranks
    if not trainer.is_global_zero:
        self.disable()
    ...

This way, the Trainer doesn't have to perform any special checks for progress bars in the middle of the training control flow.
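
As an illustration, a minimal sketch of a progress bar that encapsulates this behavior could look like the following (the class name and the _enabled flag are hypothetical; this assumes the ProgressBarBase callback API available at the time and is a sketch, not the final implementation):

from pytorch_lightning.callbacks import ProgressBarBase


class RankZeroProgressBar(ProgressBarBase):
    """Hypothetical progress bar that owns its own enable/disable state."""

    def __init__(self):
        super().__init__()
        self._enabled = True  # hypothetical internal flag

    def enable(self) -> None:
        self._enabled = True

    def disable(self) -> None:
        self._enabled = False

    def setup(self, trainer, pl_module, stage=None) -> None:
        super().setup(trainer, pl_module, stage)
        # No Trainer involvement: the callback silences its own output on
        # non-zero ranks, while its hooks keep running on every rank.
        if not trainer.is_global_zero:
            self.disable()

Because disabling only affects what gets rendered, the callback's hooks still run on every rank, which is what allows custom progress bars to collect information from all ranks (point 4 above).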

Additional context



cc @Borda @justusschock @awaelchli @akihironitta @SeanNaren @kaushikb11

@daniellepintz
Contributor

Completely agree with this! I can work on it unless you were planning to?

@ananthsub
Contributor Author

#11061 removes redundancy for TPU spawning

@awaelchli
Contributor

awaelchli commented Dec 14, 2021

Ok, I didn't see this issue before, but it sounds good; feel free to go ahead. To add more context to the issue: for historical reasons, the switch had to happen in the Trainer because of the spawning issue, and also because global_rank was, back then, not defined at Trainer init (it was delayed until later). Now that #10896 has landed, the progress bar can encapsulate this behavior completely.
