Make Transformer tolerate missing layers for PP #322
Conversation
- for layer in self.layers:
+ for layer in self.layers.values():
Is order still respected after switching to a dict? If not, we need to sort the layers based on int(key).
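For reference, a quick standalone check (not part of this PR) of the ordering behavior: nn.ModuleDict documents that it preserves insertion order, so iterating over .values() visits layers in the order they were added; sorting by int(key) would only be needed if entries could ever be inserted out of order.

```python
import torch.nn as nn

# Standalone check: nn.ModuleDict iterates in insertion order.
layers = nn.ModuleDict()
for layer_id in range(4):
    layers[str(layer_id)] = nn.Linear(8, 8)

print(list(layers.keys()))  # ['0', '1', '2', '3']

# Defensive alternative if insertion order can't be assumed: sort by int(key).
for _, layer in sorted(layers.items(), key=lambda kv: int(kv[0])):
    ...  # run the layer
```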
Nice. But it is less intuitive than I originally thought, especially the int/str conversion part. Not sure whether that's the best UX for pippy, or whether a customized PipelineModuleList would be easier for users.
lgtm!
  for layer_id in range(model_args.n_layers):
-     self.layers.append(TransformerBlock(layer_id, model_args))
+     self.layers[str(layer_id)] = TransformerBlock(layer_id, model_args)
Curious: why do the dict keys have to be str (as opposed to int directly)?
One downside to using
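For context, a small sketch (not from this PR) of why the keys end up as strings: nn.ModuleDict registers each value through the module's regular submodule mechanism, and submodule names must be str because they become components of parameter FQNs; non-string keys are rejected.

```python
import torch.nn as nn

# Sketch: ModuleDict keys become submodule names, hence FQN components, so they must be str.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleDict({"0": nn.Linear(4, 4)})

m = Tiny()
print([name for name, _ in m.named_parameters()])
# ['layers.0.weight', 'layers.0.bias']  <- the string key shows up in the FQN

try:
    m.layers[1] = nn.Linear(4, 4)  # int key
except TypeError:
    print("ModuleDict only accepts string keys")
```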
Stack from ghstack (oldest at bottom):
A few small changes here let the manual PP frontend 'reconfigure' a whole
transformer model down to one stage's portion, simply by setting undesired
layers to None (for top-level layers) or deleting them from the
ModuleDict (for 'layers.*').
These changes don't impact the FQNs of the remaining layers, which is
critical for checkpoint load/save compatibility.
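A minimal sketch of the pruning pattern described above, using a toy stand-in model (ToyTransformer below is illustrative, not torchtitan's actual Transformer or PP frontend):

```python
import torch.nn as nn

class ToyTransformer(nn.Module):
    # Illustrative stand-in with the same top-level shape: embeddings, a
    # ModuleDict of blocks keyed by str(layer_id), then norm and output.
    def __init__(self, n_layers: int = 4):
        super().__init__()
        self.tok_embeddings = nn.Embedding(100, 16)
        self.layers = nn.ModuleDict({str(i): nn.Linear(16, 16) for i in range(n_layers)})
        self.norm = nn.LayerNorm(16)
        self.output = nn.Linear(16, 100)

# Reconfigure for a hypothetical last stage of 2: keep layers 2-3, norm, and output.
model = ToyTransformer()
model.tok_embeddings = None        # drop a top-level layer by setting it to None
for i in (0, 1):
    del model.layers[str(i)]       # drop unwanted blocks from the ModuleDict

# FQNs of the surviving modules are unchanged, so checkpoint load/save still lines up.
print([name for name, _ in model.named_parameters()])
# ['layers.2.weight', 'layers.2.bias', 'layers.3.weight', 'layers.3.bias',
#  'norm.weight', 'norm.bias', 'output.weight', 'output.bias']
```

The forward pass then needs to skip whichever components are None on a given stage, which is the "tolerate missing layers" part of this change.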