
Lightning Lite Examples #9987

Merged: 385 commits into master from lite-poc on Nov 2, 2021
Conversation

@awaelchli (Contributor) commented Oct 18, 2021

What does this PR do?

This is V1 of the new Lightning Lite package. It bundles all major changes together; individual PRs (e.g., #10175, #10176) will be split out to get them merged.
Planned to be released as part of 1.5.

Demo

Lightning Lite Demo

TODOs

  • Precision support

    • NativeMixedPrecision
    • ApexMixedPrecision (support delayed to a follow-up)
    • TPUBfloat (verify it works)
    • Double
  • Plugin support

    • DeepSpeed
    • DeepSpeed with multiple models (@tchaton, @awaelchli)
    • TPUSpawn: DDPSpawn changes need to be carried down into this plugin. A minor refactor could be merged to PL master. (@kaushikb11)
    • Sharded (optimizer sharding)
    • FullySharded (model sharding; support delayed to a follow-up)
  • Move data to device automatically

  • Move model to device automatically

  • Allow only one model per setup() call

  • DataLoader setup: currently, no distributed sampler is injected (see the usage sketch after this list)

  • Resolve miscellaneous TODOs in the code base

  • Fix changes that broke Lightning tests

  • Make self.setup() take model and optimizers positionally.

  • Unit testing, parity tests

  • Typing (mypy)
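
For reference, here is a minimal sketch of the usage pattern these TODOs work toward. It is based on the LightningLite API as it shipped in 1.5; exact signatures in this PR may have differed while it was in flight.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from pytorch_lightning.lite import LightningLite


class Lite(LightningLite):
    def run(self):
        model = torch.nn.Linear(32, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        # setup() moves the model to the device and wraps the optimizer;
        # only one model per setup() call, as listed above.
        model, optimizer = self.setup(model, optimizer)

        dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
        # setup_dataloaders() is where a distributed sampler would be injected;
        # the returned loader also moves batches to the device automatically.
        dataloader = self.setup_dataloaders(DataLoader(dataset, batch_size=8))

        model.train()
        for batch, target in dataloader:
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(batch), target)
            # self.backward() replaces loss.backward() so strategies such as
            # DeepSpeed or mixed precision can hook in.
            self.backward(loss)
            optimizer.step()


# The constructor mirrors the new Trainer arguments discussed below.
Lite(accelerator="cpu", devices=1).run()
```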

Discussions

  • LightningLite constructor arguments: We are currently changing the Trainer constructor arguments to support a new pattern: Trainer(accelerator="cpu/tpu/gpu", strategy="ddp/deepspeed/...", devices=X). Should we start promoting this pattern directly in the LightningLite API?

  • Deepspeed API for backward: The user can't call loss.backward(); with DeepSpeed, backward must be called on the model. Which API do we want to offer?

    A) self.backward(loss, model) # model is optional for plugins other than deepspeed
    B) model.backward(loss)

    In either case, the user needs to change their code when switching from one strategy to another.

  • Deepspeed API for optimization step: Plain DeepSpeed requires a call to model.step() as opposed to the usual optimizer.step(). Since we wrap the user's optimizers anyway, we could still offer optimizer.step() and redirect it to model.step(). This would mean fewer code changes when switching between plugins, but it might be confusing for DeepSpeed users! (A sketch of both points follows this list.)
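
For concreteness, here is a hypothetical sketch of option (A) together with the step() redirection. All names below are illustrative, not the implementation merged in this PR (the actual wrappers live in pytorch_lightning/lite/wrappers.py).

```python
from typing import Any, Optional

import torch


class Strategy:
    """Stand-in for a training strategy (single-device, DDP, ...)."""

    def backward(self, loss: torch.Tensor, model: Optional[Any] = None) -> None:
        loss.backward()  # most strategies call backward on the tensor itself

    def optimizer_step(self, optimizer: torch.optim.Optimizer) -> None:
        optimizer.step()  # most strategies step the optimizer directly


class DeepSpeedStrategy(Strategy):
    def __init__(self, engine: Any) -> None:
        # `engine` would be the deepspeed.DeepSpeedEngine wrapping the model.
        self._engine = engine

    def backward(self, loss: torch.Tensor, model: Optional[Any] = None) -> None:
        # Option (A): backward goes through the engine; `model` is only
        # needed for DeepSpeed, so other strategies can ignore the argument.
        self._engine.backward(loss)

    def optimizer_step(self, optimizer: torch.optim.Optimizer) -> None:
        # Plain DeepSpeed calls model.step() instead of optimizer.step();
        # redirecting here keeps user code identical across strategies.
        self._engine.step()


class WrappedOptimizer:
    """What setup() could hand back to the user: step() defers to the strategy."""

    def __init__(self, optimizer: torch.optim.Optimizer, strategy: Strategy) -> None:
        self._optimizer = optimizer
        self._strategy = strategy

    def step(self) -> None:
        self._strategy.optimizer_step(self._optimizer)

    def __getattr__(self, name: str) -> Any:
        # Delegate everything else (zero_grad, state_dict, ...) unchanged.
        return getattr(self._optimizer, name)
```

With this redirection, the trade-off in the last bullet becomes explicit: a DeepSpeed user still writes optimizer.step(), even though the engine's step() is what actually runs.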

Related work:

Part of #1 (it's a lie, this is just here to avoid noisy GitHub bot)

@awaelchli added the feature (Is an improvement or enhancement) and priority: 0 (High priority task) labels on Oct 18, 2021
@awaelchli added this to the v1.5 milestone on Oct 18, 2021
pytorch_lightning/lite/lite.py (outdated review thread, resolved)
pytorch_lightning/lite/wrappers.py (outdated review thread, resolved)
pytorch_lightning/plugins/precision/precision_plugin.py (outdated review thread, resolved)
@ananthsub (Contributor) commented:

The design document isn't visible to people outside of grid.ai, so it's hard to know the context for this.

@awaelchli mentioned this pull request on Oct 19, 2021
@tchaton (Contributor) commented Oct 19, 2021

> The design document isn't visible to people outside of grid.ai, so it's hard to know the context for this.

Hey @ananthsub. Find the document here: https://docs.google.com/document/d/1b10LMNqnv1ellVTAEIlJFV5KvBuxIlFCTnNB3SYIFok/edit#heading=h.jl44rslqge7e

Best,
T.C

awaelchli and others added 7 commits November 1, 2021 17:53
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@awaelchli merged commit 3cd65b5 into master on Nov 2, 2021
@awaelchli deleted the lite-poc branch on November 2, 2021 08:04
Comment on lines +167 to +170
def __len__(self) -> Union[int, float]:
    # Iterable-style dataloaders have no length; report it as infinite.
    if isinstance(self._dataloader, Sized):
        return len(self._dataloader)
    return float("inf")
@awaelchli (Contributor, Author) commented:

This does not belong in this PR. Why did we add this?
Needs to be addressed in #10297

Labels: fabric (lightning.fabric.Fabric), feature (Is an improvement or enhancement), priority: 0 (High priority task), ready (PRs ready to be merged)