This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Swappable Transformer Components #3567

Merged: 23 commits merged into master from transformer-manifest-mvp on Apr 29, 2021

Conversation

@spencerp (Contributor) commented Apr 1, 2021

Problem

Right now it's cumbersome to override a specific part of a transformer. If you want to significantly modify the encoder self-attention, for example, you either need to add a bunch of conditional branching to MultiHeadAttention, or subclass MultiHeadAttention, TransformerGeneratorModel, TransformerEncoder, and TransformerEncoderLayer.

Purpose of this PR

Allow swapping out any nn.Module in TransformerGeneratorModel while maintaining backwards compatibility with existing call sites.

NOT the Purpose of this PR

This PR is not about reducing subclassing to improve readability, nor about making architectures infinitely composable.

Overview

The main files to look at are:

  • parlai/agents/examples/transformer_variant.py: demonstrates swapping out various components in TransformerGeneratorModel
  • parlai/agents/transformer/modules/modular.py: defines and explains the new classes introduced

A brief summary of the pattern introduced:

  • ModularComponent: Any component with subcomponents that can be swapped out.
  • Subcomponents: A catalog of all swappable nn.Modules in a ModularComponent. Replace any class in the subcomponents with one that shares the same __init__ and forward signatures and it should just work.
  • To modify some business logic in a model: subclass (e.g. modifying TransformerEncoder.forward_embedding).
  • To replace an entire component: use the Subcomponents (e.g. replacing MultiHeadAttention in TransformerDecoderLayer). Both of these strategies are sketched after this list.
  • ModularComponentSpec: A convenience object for specifying a ModularComponent along with a Subcomponents for building its subcomponents.
  • Together these can be thought of as a graph/tree that fully defines an architecture. If you find yourself adding new components to the template, you might want to create a new Module/ModularComponent instead.
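
A minimal, hypothetical sketch of those two strategies (class names prefixed with My are invented for illustration; the real swap API is defined in modular.py):

from parlai.agents.transformer.modules import MultiHeadAttention, TransformerEncoder

# Strategy 1: subclass to change business logic, keeping the parent's signatures.
class MyEncoder(TransformerEncoder):
    def forward_embedding(self, *args, **kwargs):
        # assumes forward_embedding returns (tensor, mask) like the stock encoder
        tensor, mask = super().forward_embedding(*args, **kwargs)
        # ...custom post-processing of the embeddings would go here...
        return tensor, mask

# Strategy 2: a drop-in replacement with matching __init__/forward signatures,
# registered through the Subcomponents catalog rather than by subclassing callers.
class MyMultiHeadAttention(MultiHeadAttention):
    def forward(self, *args, **kwargs):
        # ...custom attention logic would go here...
        return super().forward(*args, **kwargs)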

Some downsides:

  • It does not explicitly allow for fancier, dynamic architecture changes, like using a different layer class for each layer. But this could still be accomplished by passing in a custom layer class that behaves differently depending on the layer (sketched below). If we often find ourselves wanting highly dynamic, composable architectures, we can revisit this.
  • The code that initializes each component gets a little harder to read. TransformerDecoderLayer is easier to understand than swappables.layer.
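
A rough, purely illustrative sketch of that workaround. It assumes the layer constructor is not told its own index, so a class-level counter records build order (a real version would also need to reset the counter between model builds):

from parlai.agents.transformer.modules import TransformerEncoderLayer

class DepthAwareEncoderLayer(TransformerEncoderLayer):
    # Hypothetical: layers are built in order, so count instances to infer depth.
    _next_index = 0

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.layer_index = DepthAwareEncoderLayer._next_index
        DepthAwareEncoderLayer._next_index += 1

    def forward(self, tensor, mask):
        if self.layer_index % 2 == 1:
            # ...odd-numbered layers could behave differently here...
            pass
        return super().forward(tensor, mask)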

Testing

Added a print statement in MyCustomEncoder.forward and verified that it showed up in stdout (a sketch of such an override follows the log output below):

λ parlai train_model --model examples/transformer_variant --task convai2 --model-file /tmp/testtransformer --beam-size 5 --batchsize 16
...
10:42:49 | training...                                                                                                         
12:54:54 | Custom encoder called!                                                                                              
12:54:55 | Custom attention called!                                                                                                                                                                                                                           
12:54:55 | Custom attention called!                                                                                            
12:54:55 | Custom encoder called!
...
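
For reference, an override producing the log lines above might look roughly like this (a hedged sketch; the actual MyCustomEncoder lives in transformer_variant.py, and the timestamped output suggests ParlAI's logger rather than a bare print):

import parlai.utils.logging as logging
from parlai.agents.transformer.modules import TransformerEncoder

class MyCustomEncoder(TransformerEncoder):
    def forward(self, *args, **kwargs):
        # log on every call so the swapped-in component is visible during training
        logging.info('Custom encoder called!')
        return super().forward(*args, **kwargs)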


@spencerp changed the title from Transformer Manifest MVP to Swappable Transformer Components on Apr 1, 2021
@jxmsML (Contributor) commented Apr 6, 2021

LGTM!!! I tried swapping in my custom encoder and it works perfectly 💯 🚀 Great to have this!!

@jxmsML self-requested a review on April 6, 2021 at 21:57
@klshuster (Contributor) left a comment

for some reason i'm not 100% grasping the interplay between Manifests and ComponentSpecs, but here's my current interpretation:

  • A Manifest defines the components of a TComponent. So like the encoder manifest defines the layers.
  • A ComponentSpec defines the module class itself, and the respective manifest for the class.

I like how it's quite customizable if you know what you're doing. It still seems to suffer from the downsides of a hierarchical setup (you need to traverse quite a few things to find out the different manifests, etc.), but it might be a better tradeoff than what we have already

Resolved review threads on: parlai/agents/examples/transformer_variant.py, parlai/agents/transformer/modules/decoder.py (two threads), parlai/agents/transformer/modules/encoder.py
Forward pass.
"""
residual = tensor
if self.variant == 'prelayernorm':
Contributor:

one could also view the different variants as custom variations, right?

Contributor Author (spencerp):

For sure! I didn't do anything like that as part of this PR to keep it simple, but I could imagine a pretty simple way to implement that without this if statement:

class PreLayerNormTransformerEncoderLayer(TransformerEncoderLayer):
  def forward(self, tensor: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    residual = tensor
    tensor = self.norm1(tensor)
    attended_tensor = self.attention(tensor, mask=mask)[0]
    tensor = residual + self.dropout(attended_tensor)
    residual = tensor
    tensor = self.norm2(tensor)
    tensor = residual + self.dropout(self.ffn(tensor))
    tensor *= mask.unsqueeze(-1).type_as(tensor)
    return tensor

manifest.encoder_layer = ComponentSpec(
  PreLayerNormTransformerEncoderLayer,
  TransformerEncoderLayer.Manifest(),
)

To me the code feels simple enough as is for now, but if we end up with a ton of variations then maybe we'll want to split them out into different implementations like this.

Contributor:

yes agreed, not in the scope of this current PR but could be an interesting application from it

@spencerp (Contributor Author) commented Apr 7, 2021

Thank you both so much for your comments!!

for some reason i'm not 100% grasping the interplay between Manifests and ComponentSpecs, but here's my current interpretation:

Your summary is accurate. A TComponent is like a template, a Manifest is like a form to fill out with all of the slots in that template, and a ComponentSpec combines these two to fully define a component.

It still seems to suffer from the downsides of a hierarchical setup (you need to traverse quite a few things to find out the different manifests, etc.), but it might be a better tradeoff than what we have already

This PR is just meant to solve the problem of "how do I swap out a piece of the Transformer without copying and pasting the whole dang file??". I think getting rid of the hierarchy traversal altogether would require a radical refactor such that none of our nn.Modules initialize new nn.Modules in their __init__.

Maybe another option to get around the traversal issue without a radical refactor could be making the Manifest immutable and final (not subclassable). That would enforce that anyone who customizes a component has to write out the whole manifest, not just their incremental change. That would be quite verbose, though.

@klshuster (Contributor):

This PR is just meant to solve the problem of "how do I swap out a piece of the Transformer without copying and pasting the whole dang file??". I think getting rid of the hierarchy traversal altogether would require a radical refactor such that none of our nn.Modules initialize new nn.Modules in their __init__.

Agreed, let's deal with that separately. I do think this is more flexible than before

Maybe another option to get around the traversal issue without a radical refactor could be making the Manifest immutable and final (not subclassable). That would enforce that anyone who customizes a component has to write out the whole manifest, not just their incremental change. That would be quite verbose, though.

Verbosity, if it improves clarity, may not be the worst thing, but if it requires more work from the user it may not be super desirable

@spencerp (Contributor Author) commented Apr 9, 2021

Verbosity, if it improves clarity, may not be the worst thing, but if it requires more work from the user it may not be super desirable

Take a look at examples/transformer_variant.py now. I tried a more verbose way of specifying the components, and I think it makes it easier to get a holistic view of the architecture of the model.

@klshuster (Contributor):

Verbosity, if it improves clarity, may not be the worst thing, but if it requires more work from the user it may not be super desirable

Take a look at examples/transformer_variant.py now. I tried a more verbose way of specifying the components, and I think it makes it easier to get a holistic view of the architecture of the model.

Yes, I like that a lot actually; obviously it may get a bit more complicated with more complicated models, but that might be a good thing, since you have everything laid out in one specification

@@ -141,7 +218,7 @@ def _default(val, default):
         self.layers = nn.ModuleList()
         for _ in range(self.n_layers):
             self.layers.append(
-                TransformerEncoderLayer(
+                template.layer.build(
Contributor:

this is really hard to read.. can't it be called encoderlayer_class or something (assuming that's what it is)? template is so generic...

Contributor Author (spencerp):

That's pretty much what it was called in the previous commit lol. The more I stare at and play with the code the less sure I am about what's intuitive and what's confusing.

def __init__(
    self, klass: Type[MC], template: Optional[ModularComponent.Template] = None
) -> None:
    self._klass = klass
Contributor:

what is 'klass' (sorry)?

Contributor Author (spencerp):

It's type-annotated in __init__ for reference, but it's a class. Spelled with a k so it doesn't clash with the python reserved word class. So TransformerEncoderLayer, for example.

self.pad_idx = dictionary[dictionary.null_token]
self.start_idx = dictionary[dictionary.start_token]
self.end_idx = dictionary[dictionary.end_token]
super().__init__(self.pad_idx, self.start_idx, self.end_idx)
template = template or self.Template()
Contributor:

ditto


def build_model(self, states=None):
    wrapped_class = TransformerGeneratorModel.with_components(
        encoder=TransformerEncoder.with_components(
Contributor:

yes this is the one

Contributor:

agreed, this looks nice
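
For context, a hedged reconstruction of what the full specification might look like, extrapolating from the truncated snippet above. The component keyword names (layer, self_attention, encoder_attention, feedforward), the final constructor call, and the MyCustomAttention class are assumptions, not taken verbatim from the PR:

def build_model(self, states=None):
    wrapped_class = TransformerGeneratorModel.with_components(
        encoder=TransformerEncoder.with_components(
            layer=TransformerEncoderLayer.with_components(
                self_attention=MyCustomAttention,  # illustrative custom class
                feedforward=TransformerFFN,
            )
        ),
        decoder=TransformerDecoder.with_components(
            layer=TransformerDecoderLayer.with_components(
                self_attention=MultiHeadAttention,
                encoder_attention=MultiHeadAttention,
                feedforward=TransformerFFN,
            )
        ),
    )
    return wrapped_class(self.opt, self.dict)  # assumed constructor arguments

Laying the whole tree out in one place like this is what gives the holistic view discussed earlier in the thread.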

@stephenroller (Contributor) left a comment

Needs a tutorial in website docs showing essentially the same as examples, but with more prose.

Resolved review threads on: parlai/agents/transformer/modules/interfaces.py, parlai/agents/examples/transformer_variant.py
@stephenroller (Contributor):

(I like this final form a lot)

@@ -0,0 +1,60 @@
# Swapping Out Model Subcomponents
Contributor:

say Transformer explicitly, since this only works for Transformers

Contributor Author (spencerp):

It technically can be added to any model (or any class at all, really). But I'll take your advice to make it easier to think about for now.

@spencerp merged commit 6040079 into master on Apr 29, 2021
@spencerp deleted the transformer-manifest-mvp branch on April 29, 2021 at 14:08
6 participants