-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Breaking transformer modules.py into smaller files #3466
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm certainly not opposed to this, as it seems to be a cosmetically pleasing change!
would just hope that nothing breaks of course. looks like tests currently failing but seemed flaky so i re-ran them
LayerNorm, | ||
normalize, | ||
MultiHeadAttention, | ||
TransformerFFN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what we really need is though, if you wanted to say change the FFN but not copy all the code this shouldnt be a hard import in here, otherwise when trying out new architectures you still have to copy this file just to change this one line.. so i'm not sure separating into separate files is actually helping that much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah this isn't the whole refactor. How I'm thinking about it is from two angles:
- (the simpler change that this PR is moving toward) It should be easier to reuse/reorder components of the transformer. So if you want to build a new architecture you could mostly build it from components instead of copying and pasting all the code into an empty file.
- (the harder change that you're referring to) It should be easier to swap out a specific component while keeping the rest of the architecture intact.
For 2, I've been leaning towards a builder + protocol pattern. But I'm hoping to get feedback on that part in more detail in our next ParlAI meeting.
Honestly we could accomplish 1 without this change, but as I was playing with the code this reorganization felt easier to understand. And I'm hoping that keeping components in separate files encourages the code to stay as clean as it is now. But if this feels more confusing or cumbersome for any reason we don't need to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
hmm maybe the laziest way of doing (2) with very little code change would be to import the modules from modules.py which is a file in the same directory.. and that file points to the actual implementations such as modules/ffn.py (but doesn't have any other code)
This way in a new agent directory you only have to have that small file pointing to your reimplementations (+ the top level model.py which inherits transformer)
would that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! It sounds similar to what I'm thinking, but exploits the directory structure more which has some advantages (one big one being less code like you mentioned).
With that pattern, how would you tell the TransformerGeneratorModel that it should look in .modules
relative to the agent directory? Are you thinking we'd just look for a standard class name (like how we look for a DefaultTeacher)?
I'm curious how unittest.mock.patch
works. Maybe your pattern could whatever that does. I think it might just reassign the module attribute after importing.
One thing I'd like to move away from is inheritance (though I'm sure people will have opinions about this). Personally I've found that trying to follow the movement of data through the code gets very difficult when I have to keep moving between classes. And it can get particularly hairy when subclass logic modifies properties used by superclass logic. So moving towards protocols as part of the protocol + builder pattern I mentioned is kind of accomplishing multiple things from my perspective. But if it turns out that I'm the only one and everyone else loves inheritance then that pattern becomes less enticing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something like:
from .ffn import TransformerFFN
already looks in the current agent's directory...
if you want to use the classic you use
from parlai.agents.transformer.modules.fnn import TransformerFFN
i guess?
the good thing about this it's a pretty small code change from where we are now, so we could just try it out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will +1 the confusion that comes with inheritance; I'm curious to see what this protocol/builder pattern is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You maybe could do Jason's proposal with something like:
try:
from .local_modules import Whatever
except ImportError:
from parlai.agents.transformer.modules.whatever import Whatever
I'm not reaaaaally sure i like that tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so does this proposal still require copying TransformerGeneratorModel
into your new agent?
The way I'm interpreting what you guys are saying is that you'd do something like:
$ cd ~/ParlAI/parlai
$ cp agents/transformer/generator.py agents/my_custom_transformer/my_custom_transformer.py
$ touch agents/my_custom_transformer/modules.py
Then you'd implement your own TransformerFFN
in agents/my_custom_transformer/modules.py
.
This would require that
try:
from .modules import TransformerFFN
except ImportError:
from parlai.agents.transformer.modules import TransformerFFN
and all other component imports live in whichever file you've copied into your agent directory (in this example, generator.py). Which I think brings us back to a monolithic file?
A way to accomplish this without all the imports living in a single file could be monkey patching. I think it would something like this in agents/my_custom_transformer/my_custom_transformer.py
:
import parlai.agents.transformer.generator as transformer_generator
from parlai.core.torch_generator_agent import TorchGeneratorAgent
class MyCustomMHA(nn.Module):
...
transformer_generator.modules.MultiHeadAttention = MyCustomMHA
class MyCustomTransformerAgent(TorchGeneratorAgent):
def build_model(self, states=None):
return transformer_generator.TransformerGeneratorModel(self.opt, self.dict)
This provides a lot of flexibility and doesn't require much code, but comes with a lot of dangers (the wikipedia article has a nice list under "Pitfalls": https://en.wikipedia.org/wiki/Monkey_patch)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, my intuition probably would have been either to do monkey patching or registering subcomponents, but I admit the limitations with monkey patching so I'll defer to others who have thought more about this. But personally I'll say that I've never minded inheritance-based solutions - I use an IDE (PyCharm) that tracks the value of all methods/attributes in arbitrary subclasses. But that's just me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole debate just sounds like a poor man's protocol pattern to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to have! I'm curious about your builder + protocol pattern - I think I'd need to visualize more what the immediate usecases will be of swapping out certain components for others
LayerNorm, | ||
normalize, | ||
MultiHeadAttention, | ||
TransformerFFN, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, my intuition probably would have been either to do monkey patching or registering subcomponents, but I admit the limitations with monkey patching so I'll defer to others who have thought more about this. But personally I'll say that I've never minded inheritance-based solutions - I use an IDE (PyCharm) that tracks the value of all methods/attributes in arbitrary subclasses. But that's just me...
Patch description
This is one small step in a larger effort to refactor the transformer code to make it easier to modify and extend. There shouldn't be any functional changes as part of this diff, just organizational.
Breaking modules.py into smaller files is motivated by a couple things:
If this seems more confusing instead of less confusing to anyone, please let me know!
Testing steps
Circle CI.
Sanity check with a canonical example: