How to use a different Transformer model for each sentence embedding #328
I am trying to use a different Transformer model for each sentence passed into the SBERT code,
e.g. for sent-A, I am using Transformer1 (say, Bert-base-uncased);
for sent-B, I want to use a different Transformer2 (say, Bert-base-NLI).
The rest of the process is the same: it generates embeddings of sent-A and sent-B using the different Transformers, adds a pooling operation, and concatenates them.
Any help will be appreciated!
Thank you.

Comments
Hi @umairspn
I think what works well (and simplifies a lot of things) is to prepend your input with special tokens, e.g. your input A looks like `[SENT_A] your sentence` and your input B looks like `[SENT_B] your sentence`. All the inputs are fed to the same transformer network. The transformer network applies self-attention across all layers, so it will be able to learn what the differences are between inputs from A and inputs from B. Also, the information whether this is an input for A or an input for B will be available at every token. This has several advantages.
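A minimal sketch of what that could look like with sentence-transformers (the model name and the `[SENT_A]`/`[SENT_B]` marker tokens are illustrative, and the tokenizer has to know about them first, see the snippet at the end of this thread):

```python
# Sketch: prepend a marker token so the shared transformer can tell
# A-inputs and B-inputs apart. The token names here are illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")

sent_a = "[SENT_A] A man is eating food."
sent_b = "[SENT_B] A man is eating a piece of bread."

# Both sentences pass through the *same* network; self-attention sees
# the marker token at every layer and can condition on it.
emb_a, emb_b = model.encode([sent_a, sent_b])
```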
Best
Hi @nreimers,
Hi @umairspn
You do this quite often in machine translation, where you have one transformer network that can translate in different directions, e.g. English to Spanish and French to Chinese. This is achieved by just adding special tokens for your input / target language. Training this single transformer network with special tokens greatly outperforms the setup where you have independent transformer networks for your different languages. So even if sent-B is of a completely different nature, I don't see why it shouldn't work.
Back to your question: you would need another models class that has two transformer networks. Based on the sentence feature input, it could then route the input to either model A or model B. Your starting point would be to create a new class similar to models.Transformer, along the lines of the sketch below.
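A rough sketch of such a class, assuming plain Hugging Face transformers underneath; the class name and the `source` key in the feature dictionary are made up for illustration:

```python
import torch.nn as nn
from transformers import AutoModel

class DualTransformer(nn.Module):
    """Holds two transformer networks and routes each batch to one of
    them, based on a flag in the feature dictionary (illustrative)."""

    def __init__(self, model_name_a: str, model_name_b: str):
        super().__init__()
        self.model_a = AutoModel.from_pretrained(model_name_a)
        self.model_b = AutoModel.from_pretrained(model_name_b)

    def forward(self, features: dict) -> dict:
        # Assumed convention: 'source' == "A" routes to model A,
        # anything else routes to model B.
        model = self.model_a if features["source"] == "A" else self.model_b
        out = model(input_ids=features["input_ids"],
                    attention_mask=features["attention_mask"])
        features["token_embeddings"] = out.last_hidden_state
        return features
```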
@nreimers I get the overall idea and it makes much more sense now. I'll go with the prepending method and work my way up to the separate-transformers method if required.
For completeness, here are two papers that use this special-token trick:
mBART: https://arxiv.org/abs/2001.08210 - for machine translation
Here is a code example of how you can add new special tokens to the BERT tokenizer.
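A minimal version of such a snippet, assuming the Hugging Face transformers API (the `[SENT_A]`/`[SENT_B]` token names are illustrative):

```python
# Add new special tokens to a BERT tokenizer and resize the model's
# embedding matrix so the new tokens get their own (trainable) embeddings.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[SENT_A]", "[SENT_B]"]}
)
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix

print(tokenizer.tokenize("[SENT_A] A man is eating food."))
# ['[SENT_A]', 'a', 'man', 'is', 'eating', 'food', '.']
```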