Hello! I noticed that the code does not initialize the segment embeddings in the Transformer encoder: https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/transformer/modules/encoder.py#L201-L202
I found that initializing the segment embeddings makes the model converge faster when fine-tuning a seq2seq model (such as Blender) that uses segment embeddings. (Note that the pre-trained model did not use segment embeddings.)
I think adding initialization code along the lines of the snippet below would help.
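For example, something like this (a sketch, not a confirmed fix; the `std = self.dim ** -0.5` scale is my assumption, mirroring the scaled-normal init ParlAI applies to learned position embeddings in the same file):

```python
# parlai/agents/transformer/modules/encoder.py, around L201-L202.
# The existing code builds the table but leaves PyTorch's default N(0, 1) init:
if self.n_segments >= 1:
    self.segment_embeddings = nn.Embedding(self.n_segments, self.dim)
    # Proposed addition: small zero-mean init, so a fine-tuned model that
    # enables segment embeddings starts close to the pre-trained behavior
    # instead of adding large random offsets to every token representation.
    nn.init.normal_(self.segment_embeddings.weight, 0, self.dim ** -0.5)
```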
That said, this may be intentional; if so, is there a reason the initialization was not applied to the segment embeddings?