Initializing segment embeddings on Transformer Encoder #3679

Closed
wade3han opened this issue May 28, 2021 · 1 comment
@wade3han
Contributor

wade3han commented May 28, 2021

Hello! I noticed that the code does not initialize the segment embeddings in the Transformer encoder: https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules/encoder.py#L201-L202

I found that initializing the segment embeddings makes the model converge faster when fine-tuning a seq2seq model (such as Blender) with segment embeddings enabled. (Note that the pre-trained model did not use segment embeddings.)
I think adding initialization code like the one below would help.

        if self.n_segments >= 1:
            self.segment_embeddings = nn.Embedding(self.n_segments, self.dim)
            # match the token embedding init: N(0, dim ** -0.5)
            nn.init.normal_(self.segment_embeddings.weight, 0, self.dim ** -0.5)

Anyway, if this was an intentional choice, is there a reason the initialization was not applied to the segment embeddings?
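
For reference, PyTorch's nn.Embedding draws its weights from N(0, 1) by default, while the snippet above scales the standard deviation to dim ** -0.5, matching the scale used for the token embeddings. A minimal standalone sketch (the dim and n_segments values are hypothetical, for illustration only) showing the difference in scale:

    import torch.nn as nn

    dim, n_segments = 512, 2  # hypothetical sizes, for illustration only

    # Default nn.Embedding init: weights ~ N(0, 1)
    default_seg = nn.Embedding(n_segments, dim)

    # Proposed init: weights ~ N(0, dim ** -0.5), like the token embeddings
    scaled_seg = nn.Embedding(n_segments, dim)
    nn.init.normal_(scaled_seg.weight, 0, dim ** -0.5)

    print(default_seg.weight.std().item())  # ~1.0
    print(scaled_seg.weight.std().item())   # ~0.044 (= 512 ** -0.5)

Without the rescaling, the segment embeddings start out roughly dim ** 0.5 times larger than the token embeddings they are added to, which can dominate the summed input early in fine-tuning.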

@stephenroller
Contributor

Nah that was overlooked. That's a good point. I'd happily accept a patch.
