
Add support for init_meta_context, materialize_module #9920

Merged: 42 commits into master on Oct 21, 2021

Conversation

@tchaton (Contributor) commented Oct 13, 2021

What does this PR do?

Fixes #9375

This PR builds on top of pytorch/pytorch#66317; the code section copied from that PR will be dropped once it is merged into PyTorch.

The goal is to require no code changes from end users:

import torch.nn as nn

from pytorch_lightning.utilities.meta import init_meta_context, materialize_module

class MLP(nn.Module):
    def __init__(self, num_convs: int):
        super().__init__()
        self.lins = []
        for _ in range(num_convs):
            self.lins.append(nn.Linear(1, 1))
        self.layer = nn.Sequential(*self.lins)

# Inside the context, modules are created on the meta device:
# their parameters carry shape and dtype but allocate no storage.
with init_meta_context():
    m = nn.Linear(in_features=1, out_features=1)
    assert m.weight.device.type == "meta"
    mlp = MLP(4)
    assert mlp.layer[0].weight.device.type == "meta"

    # materialize_module re-creates the module's parameters on a real device.
    materialize_module(mlp)
    assert mlp.layer[0].weight.device.type == "cpu"

# Outside the context, module instantiation behaves as usual...
m = nn.Linear(in_features=1, out_features=1)
assert m.weight.device.type == "cpu"

# ...and the context can be entered again at any time.
with init_meta_context():
    m = nn.Linear(in_features=1, out_features=1)
    assert m.weight.device.type == "meta"

# Leaving the context restores normal behavior.
m = nn.Linear(in_features=1, out_features=1)
assert m.weight.device.type == "cpu"
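For background, a "meta" tensor is a PyTorch tensor that records shape and dtype without allocating any storage, which is what makes it possible to describe arbitrarily large models cheaply before deciding where (and how) to allocate them. A minimal standalone sketch of the concept, independent of this PR:

import torch

# A tensor on the meta device has shape and dtype but no storage,
# so even a very large tensor costs essentially nothing to "create".
w = torch.empty(1024, 1024, device="meta")
assert w.device.type == "meta"
assert w.shape == torch.Size([1024, 1024])

# "Materializing" means re-creating the tensor on a real device. Here the
# values are uninitialized; real code would also re-run the module's
# parameter initialization.
w_cpu = torch.empty_like(w, device="cpu")
assert w_cpu.device.type == "cpu"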

TODO:

  • Verify it works with DeepSpeed on BoringModel, e.g.:

    with init_meta_context():
        model = BoringModel()
    assert model.layer.weight.device.type == "meta"
    trainer = Trainer(
        default_root_dir=tmpdir, plugins=[DeepSpeedPlugin(stage=3)], gpus=2, fast_dev_run=True, precision=16
    )
    trainer.fit(model)
    assert model.layer.weight.device.type == "cpu"


Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@tchaton changed the title from "Add support for meta device" to "Add support for use_meta_device" on Oct 13, 2021
@tchaton added this to the v1.5 milestone on Oct 14, 2021
@tchaton self-assigned this on Oct 14, 2021
@tchaton added the feature label (Is an improvement or enhancement) on Oct 14, 2021
@tchaton marked this pull request as ready for review on October 14, 2021 11:56
@tchaton changed the title from "Add support for use_meta_device" to "Add support for set_device" on Oct 14, 2021
@SeanNaren (Contributor) commented Oct 14, 2021

This is incredible work: users don't have to change their model definition, and hopefully in the future FSDP will support sharding directly from meta-device modules.

(cc @myleott @blefaudeux @anj-s who may have some stuff to say about the API/future integration!)

Should we wait for the code to be merged into PyTorch and available in the nightly build?

EDIT: looping in @jeffra and @tjruwase from the DeepSpeed team as well :)

@tchaton (Contributor, Author) commented Oct 14, 2021

> This is incredible work: users don't have to change their model definition, and hopefully in the future FSDP will support sharding directly from meta-device modules.
>
> (cc @myleott @blefaudeux @anj-s who may have some stuff to say about the API/future integration!)
>
> Should we wait for the code to be merged into PyTorch and available in the nightly build?

Adding @cbalioglu to the conversation.

IMO, we definitely want this for Lightning v1.5, as it is already working with the PyTorch 1.10 nightly.
Hopefully, this feature will be merged into PyTorch 1.10 and released before Lightning v1.5, so we can remove the copied code.

Best,
T.C

@tchaton requested a review from @SeanNaren on October 14, 2021 13:17
@SeanNaren self-requested a review on October 17, 2021 18:33
@SeanNaren (Contributor) commented:

Let's follow up on @zou3519's comment before proceeding further with this PR! We shouldn't merge until we understand the edge cases that are causing instability.

Also, I think it would be beneficial to show a concrete use case of the benefit (which I think @tchaton has with DeepSpeed, though it may need more testing of instantiation times vs. configure_sharded_model) before merging this PR!
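For context, configure_sharded_model is the LightningModule hook that, at the time, let users defer layer construction until the strategy's sharding context was active; the comparison above is between its instantiation cost and the meta-device approach. A minimal sketch of the hook-based style (the model and layer sizes here are illustrative, not from this PR):

import torch.nn as nn
import pytorch_lightning as pl

class ShardedModel(pl.LightningModule):
    def configure_sharded_model(self):
        # Layers built inside this hook are constructed within the sharding
        # context (e.g. DeepSpeed ZeRO stage 3), so the full model never has
        # to be materialized on a single device. The trade-off is that the
        # model definition itself must change, which is exactly what
        # init_meta_context aims to avoid.
        self.block = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2))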

@cbalioglu commented:

Looks good to me as an experimental API. Please consider me your point of contact for any feedback or issues specific to the parts you copied over from the PyTorch PR #66317.

@tchaton merged commit 454e93b into master on Oct 21, 2021
@tchaton deleted the set_meta_device branch on October 21, 2021 14:48
awaelchli added a commit that referenced this pull request Oct 22, 2021
Labels: feature (Is an improvement or enhancement), ready (PRs ready to be merged)

Successfully merging this pull request may close: [RFC] Simplify sharding API instantiation