
Make Transformer tolerate missing layers for PP #322

Merged
wconstab merged 1 commit into gh/wconstab/14/base on May 13, 2024

Conversation

wconstab (Contributor) commented on May 10, 2024

Stack from ghstack (oldest at bottom):

A few small changes here let the manual PP frontend 'reconfigure' a whole
transformer model down to a stage's portion simply by setting undesired
layers to None (for top-level layers) or deleting them from the
ModuleDict (for 'layers.*').

These changes don't impact the FQNs of the remaining layers, which is
critical for checkpoint load/save compatibility.
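For illustration, a minimal sketch of the kind of stage carving this enables. The helper name and the top-level module names (tok_embeddings, norm, output) are assumptions about the model, not part of this diff:

# Hypothetical helper: carve a full Transformer down to one PP stage's portion.
# Assumes top-level modules `tok_embeddings`, `norm`, `output` and an
# nn.ModuleDict `layers` keyed by str(layer_id), as in this PR.
def prune_to_stage(model, start: int, stop: int, is_first: bool, is_last: bool):
    if not is_first:
        model.tok_embeddings = None          # top-level layer: set to None
    if not is_last:
        model.norm = None
        model.output = None
    for key in list(model.layers.keys()):
        if not (start <= int(key) < stop):
            del model.layers[key]            # 'layers.*': delete from the ModuleDict
    # Surviving modules keep their original FQNs (e.g. "layers.5.attention.wq.weight"),
    # so full-model checkpoints still load into each stage without key remapping.
    return model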

facebook-github-bot added the CLA Signed label on May 10, 2024
wconstab added a commit that referenced this pull request May 10, 2024
ghstack-source-id: 48a7aafc89d86c3168f905560a4cd6bf4b5b9a12
Pull Request resolved: #322
wconstab requested review from tianyu-l and wanchaol on May 10, 2024

- for layer in self.layers:
+ for layer in self.layers.values():
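For context, a sketch of how a forward pass can tolerate the pruned model under these changes; the signature and module names are assumptions based on a typical Llama-style Transformer, not the exact torchtitan code:

def forward(self, tokens, freqs_cis):
    # On non-first stages tok_embeddings may be None and `tokens` is already
    # a hidden-state tensor produced by the previous stage.
    h = self.tok_embeddings(tokens) if self.tok_embeddings is not None else tokens
    # Iterate over whichever blocks this stage kept; a stage that deleted
    # layers from the ModuleDict simply runs fewer blocks.
    for layer in self.layers.values():
        h = layer(h, freqs_cis)
    h = self.norm(h) if self.norm is not None else h
    return self.output(h) if self.output is not None else h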
tianyu-l (Contributor) commented on May 11, 2024


Is order still respected after switching to dict? If not, we need to sort the layers based on int(key).
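For reference, nn.ModuleDict is an ordered container, so iteration follows insertion order rather than key order; as long as blocks are inserted in ascending layer_id order, no int(key) sort is needed. A quick sanity check with stand-in modules (not the real TransformerBlock):

import torch.nn as nn

layers = nn.ModuleDict()
for layer_id in range(12):
    layers[str(layer_id)] = nn.Linear(8, 8)  # stand-in for TransformerBlock

# Order comes from insertion, not from sorting the string keys,
# so "10" and "11" still come after "9".
assert list(layers.keys()) == [str(i) for i in range(12)]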

fegin (Contributor) commented on May 13, 2024

Nice. But it is less intuitive than I originally thought, especially the int/str conversion part. I'm not sure whether this is the best UX for pippy, or whether a customized PipelineModuleList would be easier for users.

wanchaol (Collaborator) left a comment:

lgtm!

wconstab merged commit 3fd7d4a into gh/wconstab/14/base on May 13, 2024
4 checks passed
wconstab added a commit that referenced this pull request May 13, 2024
ghstack-source-id: 48a7aafc89d86c3168f905560a4cd6bf4b5b9a12
Pull Request resolved: #322
wconstab deleted the gh/wconstab/14/head branch on May 13, 2024
  for layer_id in range(model_args.n_layers):
-     self.layers.append(TransformerBlock(layer_id, model_args))
+     self.layers[str(layer_id)] = TransformerBlock(layer_id, model_args)
A contributor commented:
Curious: why do the dict keys have to be str (as opposed to int directly)?
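For what it's worth, a small sketch of how the string keys relate to checkpoints: nn.ModuleDict registers each value as a named submodule, and submodule names (and therefore state-dict FQNs) are strings, which is what keeps the FQNs identical to the old ModuleList layout. The toy module below is illustrative only:

import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleDict()
        for layer_id in range(2):
            self.layers[str(layer_id)] = nn.Linear(4, 4)

m = Tiny()
print(list(m.state_dict().keys()))
# ['layers.0.weight', 'layers.0.bias', 'layers.1.weight', 'layers.1.bias']

# Deleting a block removes its keys but leaves the remaining FQNs untouched.
del m.layers["0"]
print(list(m.state_dict().keys()))  # ['layers.1.weight', 'layers.1.bias']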

awgu (Contributor) commented on Jun 29, 2024

One downside to using ModuleDict is that the model print no longer collapses identical TransformerBlocks together, which makes the printed model very long.

tianyu-l pushed a commit to tianyu-l/torchtitan_intern24 that referenced this pull request Aug 16, 2024
ghstack-source-id: 48a7aafc89d86c3168f905560a4cd6bf4b5b9a12
Pull Request resolved: pytorch#322
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request Aug 17, 2024
ghstack-source-id: 48a7aafc89d86c3168f905560a4cd6bf4b5b9a12
Pull Request resolved: pytorch#322