
Breaking transformer modules.py into smaller files #3466

Merged
merged 9 commits into master from tf-components on Mar 23, 2021

Conversation

spencerp
Contributor

@spencerp spencerp commented Feb 24, 2021

Patch description
This is one small step in a larger effort to refactor the transformer code to make it easier to modify and extend. There shouldn't be any functional changes as part of this diff, just organizational.

Breaking modules.py into smaller files is motivated by a couple of things:

  • It's currently 1490 lines long. With smaller files, I hope to make each component easier to read, and to make it more obvious which components are independent of, and which depend on, which others.
  • It contains a mixture of composable components that are used throughout the code (TransformerEncoder, MultiHeadAttention, etc.) and complete models that usually stand on their own (TransformerMemNetModel and TransformerGeneratorModel). By making the independence of these subcomponents more explicit, I hope to encourage their reuse (sketched below).
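A sketch of the intended split; the exact file names and paths below are illustrative assumptions, not the final layout:

# Composable building blocks, each in its own small file (hypothetical paths):
from parlai.agents.transformer.modules.attention import MultiHeadAttention
from parlai.agents.transformer.modules.encoder import TransformerEncoder

# Complete models that stand on their own:
from parlai.agents.transformer.modules.generator import TransformerGeneratorModel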

If this seems more confusing instead of less confusing to anyone, please let me know!

Testing steps
CircleCI.
Sanity check with a canonical example:

parlai train_model --task twitter --model-file /tmp/tr_twitter --model transformer/ranker --batchsize 16 --validation-every-n-secs 3600 --candidates batch --eval-candidates batch --data-parallel True

Copy link
Contributor

@klshuster klshuster left a comment


I'm certainly not opposed to this; it seems like a cosmetically pleasing change!

I would just hope that nothing breaks, of course. The tests are currently failing, but they seemed flaky, so I re-ran them.

LayerNorm,
normalize,
MultiHeadAttention,
TransformerFFN,
Contributor


What we really need, though, is this: if you wanted to, say, change the FFN without copying all the code, this shouldn't be a hard import here; otherwise, when trying out new architectures you still have to copy this whole file just to change this one line. So I'm not sure separating into separate files is actually helping that much.

@stephenroller ?

Contributor Author


Ah yeah, this isn't the whole refactor. I'm thinking about it from two angles:

  1. (the simpler change that this PR is moving toward) It should be easier to reuse/reorder components of the transformer. So if you want to build a new architecture you could mostly build it from components instead of copying and pasting all the code into an empty file.
  2. (the harder change that you're referring to) It should be easier to swap out a specific component while keeping the rest of the architecture intact.

For 2, I've been leaning toward a builder + protocol pattern (a rough sketch follows below). But I'm hoping to get feedback on that part in more detail at our next ParlAI meeting.

Honestly, we could accomplish 1 without this change, but as I was playing with the code, this reorganization felt easier to understand. And I'm hoping that keeping components in separate files encourages the code to stay as clean as it is now. But if this feels more confusing or cumbersome for any reason, we don't need to do this.
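For illustration, here is a minimal sketch of what a protocol + builder pattern could look like in this context. Every name below (FeedForward, TransformerLayerBuilder, default_ffn) is a hypothetical stand-in, not the actual ParlAI design:

from typing import Callable, Protocol

import torch
import torch.nn as nn


class FeedForward(Protocol):
    # Structural typing: any module with this call signature conforms;
    # no inheritance from a shared base class is required.
    def __call__(self, x: torch.Tensor) -> torch.Tensor: ...


def default_ffn(dim: int) -> FeedForward:
    # Stock two-layer FFN used when nothing custom is supplied.
    return nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))


class TransformerLayerBuilder:
    # Assembles a layer from swappable parts; callers override only
    # the component they care about.
    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.ffn_factory: Callable[[int], FeedForward] = default_ffn

    def with_ffn(self, factory: Callable[[int], FeedForward]) -> "TransformerLayerBuilder":
        self.ffn_factory = factory
        return self

    def build(self) -> nn.Module:
        # A real builder would also wire in attention, dropout, etc.
        return nn.Sequential(nn.LayerNorm(self.dim), self.ffn_factory(self.dim))


# Usage, e.g.: TransformerLayerBuilder(dim=512).with_ffn(my_gated_ffn_factory).build()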

Contributor

@jaseweston jaseweston Feb 26, 2021


Got it.

Hmm, maybe the laziest way of doing (2) with very little code change would be to import the modules from modules.py, a file in the same directory, and have that file point to the actual implementations, such as modules/ffn.py (but not contain any other code).
This way, in a new agent directory you only need that small file pointing to your reimplementations (plus the top-level model.py, which inherits from the transformer).

would that work?
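A rough sketch of that layout; the agent name, file paths, and the my_ffn module below are illustrative assumptions:

# parlai/agents/my_custom_transformer/modules.py
# A thin indirection file with no logic of its own: it only points at the
# implementations this agent wants to use.

# Reuse the stock components unchanged...
from parlai.agents.transformer.modules.attention import MultiHeadAttention  # noqa: F401

# ...and point at a local reimplementation for just the piece being changed.
from .my_ffn import MyTransformerFFN as TransformerFFN  # noqa: F401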

Contributor Author

@spencerp spencerp Feb 26, 2021


Interesting! It sounds similar to what I'm thinking, but it exploits the directory structure more, which has some advantages (a big one being less code, like you mentioned).

With that pattern, how would you tell the TransformerGeneratorModel that it should look in .modules relative to the agent directory? Are you thinking we'd just look for a standard class name (like how we look for a DefaultTeacher)?

I'm curious how unittest.mock.patch works; maybe your pattern could use whatever that does. I think it might just reassign the module attribute after importing (a sketch of that idea follows below).
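For reference, mock.patch does essentially that: it reassigns an attribute on an already-imported module and restores it on exit. A minimal illustration, where MyCustomFFN is a hypothetical replacement:

from unittest import mock

import torch.nn as nn

import parlai.agents.transformer.modules as modules


class MyCustomFFN(nn.Module):  # hypothetical replacement component
    pass


with mock.patch.object(modules, "TransformerFFN", MyCustomFFN):
    # While the block is active, modules.TransformerFFN is MyCustomFFN; any
    # code that looks the name up through the module object sees the
    # replacement. Caveat: code that ran `from ... import TransformerFFN`
    # at import time already holds the original and is NOT affected.
    assert modules.TransformerFFN is MyCustomFFN
# On exit, the original attribute is restored automatically.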

One thing I'd like to move away from is inheritance (though I'm sure people will have opinions about this). Personally, I've found that following the movement of data through the code gets very difficult when I have to keep jumping between classes, and it can get particularly hairy when subclass logic modifies properties used by superclass logic. So moving toward protocols, as part of the protocol + builder pattern I mentioned, accomplishes multiple things from my perspective. But if it turns out that I'm the only one and everyone else loves inheritance, then that pattern becomes less enticing.

Contributor


something like:

from .ffn import TransformerFFN

already looks in the current agent's directory.
If you want to use the classic implementation, you'd use

from parlai.agents.transformer.modules.ffn import TransformerFFN

I guess?

The good thing about this is that it's a pretty small code change from where we are now, so we could just try it out...

Contributor


I will +1 the confusion that comes with inheritance; I'm curious to see what this protocol/builder pattern is.

Contributor


You could maybe do Jason's proposal with something like:

try:
    # Prefer an agent-local override if one exists in this directory
    from .local_modules import Whatever
except ImportError:
    # Otherwise fall back to the stock implementation
    from parlai.agents.transformer.modules.whatever import Whatever

I'm not reaaaaally sure I like that tho.

Contributor Author

@spencerp spencerp Feb 26, 2021


Hm, so does this proposal still require copying TransformerGeneratorModel into your new agent?

The way I'm interpreting what you guys are saying is that you'd do something like:

$ cd ~/ParlAI/parlai
$ cp agents/transformer/generator.py agents/my_custom_transformer/my_custom_transformer.py
$ touch agents/my_custom_transformer/modules.py

Then you'd implement your own TransformerFFN in agents/my_custom_transformer/modules.py.

This would require that

try:
    from .modules import TransformerFFN
except ImportError:
    from parlai.agents.transformer.modules import TransformerFFN

and all other component imports live in whichever file you've copied into your agent directory (in this example, generator.py). Which I think brings us back to a monolithic file?

A way to accomplish this without all the imports living in a single file could be monkey patching. I think it would look something like this in agents/my_custom_transformer/my_custom_transformer.py:

import torch.nn as nn

import parlai.agents.transformer.generator as transformer_generator
from parlai.core.torch_generator_agent import TorchGeneratorAgent

class MyCustomMHA(nn.Module):
    ...

# Monkey patch: reassign the attention class on the imported module so that
# TransformerGeneratorModel picks up the custom implementation when built.
transformer_generator.modules.MultiHeadAttention = MyCustomMHA

class MyCustomTransformerAgent(TorchGeneratorAgent):
    def build_model(self, states=None):
        return transformer_generator.TransformerGeneratorModel(self.opt, self.dict)

This provides a lot of flexibility and doesn't require much code, but it comes with a lot of dangers (the Wikipedia article has a nice list under "Pitfalls": https://en.wikipedia.org/wiki/Monkey_patch).

Contributor


Hmm, my intuition probably would have been either to do monkey patching or to register subcomponents, but I admit the limitations of monkey patching, so I'll defer to others who have thought more about this. Personally, though, I'll say that I've never minded inheritance-based solutions; I use an IDE (PyCharm) that tracks the values of all methods/attributes in arbitrary subclasses. But that's just me...

Contributor


This whole debate just sounds like a poor man's protocol pattern to me...

Contributor

@EricMichaelSmith EricMichaelSmith left a comment


I think this is good to have! I'm curious about your builder + protocol pattern; I think I'd need to visualize more what the immediate use cases would be for swapping out certain components for others.

@spencerp spencerp changed the title [RFC] Breaking transformer modules.py into smaller files Breaking transformer modules.py into smaller files Mar 23, 2021
@spencerp spencerp merged commit 7171d7c into master Mar 23, 2021
@spencerp spencerp deleted the tf-components branch March 23, 2021 18:00