Simple way of producing two independent embeddings #238
I found some previous answers that were relevant, and even though they do not give all the details, I managed to get something working. I added a last layer to the transformer with several Dense instances. Then, depending on the value of self.condition, one instance is chosen:
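For what it is worth, here is a minimal sketch of what such a switching layer might look like, assuming a sentence-transformers-style module that receives a features dict (the class name ConditionalDense and the exact wiring are my own, not code from the library):

```python
from torch import nn

class ConditionalDense(nn.Module):
    """Final module holding one Linear layer per input type; the layer
    actually applied is selected via self.condition."""

    def __init__(self, in_features, out_features, num_conditions=2):
        super().__init__()
        self.dense = nn.ModuleList(
            [nn.Linear(in_features, out_features) for _ in range(num_conditions)]
        )
        self.condition = 0  # set externally before calling model.encode(...)

    def forward(self, features):
        # After the Pooling module, the features dict contains the key
        # 'sentence_embedding'; apply the Linear layer chosen by self.condition.
        emb = self.dense[self.condition](features["sentence_embedding"])
        features["sentence_embedding"] = emb
        return features
```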
So when I encode input 1, I set the condition first. I had to make other changes in [...] as well.
Hi @fjhheras, I think the best way is to integrate the information on the condition into the dataloader. This information is passed to all intermediate modules and could be read from there. Best
Thank you for your answer, @nreimers. How would you send information to all modules? For example, if I add the key [...]
I can bypass the first module by creating a forward method in [...], but I am not sure how general or desirable this would be (and I am still not sure how to do the equivalent for training)...
Hi @fjhheras, then in the dense layer you can check for features['condition'] and either pass it through an identity layer or through a non-linear dense layer. But I'm not sure yet how to set up the data reader so that it can add features to the input text that are preserved within the sequential pipeline.
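A rough sketch of that idea, assuming a 'condition' key could somehow be injected into the features dict (which, as discussed above, is exactly the part that is still unclear); the module name is just a placeholder:

```python
from torch import nn

class ConditionalProjection(nn.Module):
    """Pass the sentence embedding through a non-linear dense layer when
    features['condition'] == 1, otherwise act as an identity."""

    def __init__(self, dim):
        super().__init__()
        self.dense = nn.Linear(dim, dim)
        self.activation = nn.Tanh()

    def forward(self, features):
        # 'condition' is assumed to have been added upstream by the data reader
        if features.get("condition", 0) == 1:
            features["sentence_embedding"] = self.activation(
                self.dense(features["sentence_embedding"])
            )
        return features
```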
Yes, I understood your suggestion. But the first module does not seem to accept an extra key in the features dictionary, at least in the way it is called in [...].
I am stuck in the same situation, where I want to train two independent transformer models (one for input 1, the other for input 2). Any help will be appreciated. Thank you!
Just for completeness: see #328, where I describe an easy method to create two independent embeddings for different inputs without needing any code changes.
I would like to fine-tune BERT (or similar) models for an asymmetric task using two different embeddings. There will be two inputs (1 and 2), and I would use one embedding for 1 and another embedding for 2 to build meaningful distances between 1 and 2. But I cannot use a common embedding, because sentences in 1 are of a very different nature from sentences in 2 (it is not exactly like that, but you can think of questions and answers).
I have thought about several options:

Input 1 >> transformer 1 >> Pooling >> Output 1
Input 2 >> transformer 2 >> Pooling >> Output 2

or

Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> Pooling >> extra layer >> Output 2

or

Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> extra layer >> Pooling >> Output 2
Do you think there is an easy way to do this by adapting one of the training scripts? I would appreciate some guidance on which code I could try to adapt for my use case, so I can make the most of the code that is already in this repo!
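In case it helps to make the options concrete, here is a minimal sketch of how the first two could be assembled from sentence-transformers modules (bert-base-uncased is just a placeholder model, and how to train the two encoders jointly with the existing fit() loop is exactly what I am unsure about):

```python
from sentence_transformers import SentenceTransformer, models

# Option 1: two fully independent encoders, one per input type
word_emb_1 = models.Transformer("bert-base-uncased")
pool_1 = models.Pooling(word_emb_1.get_word_embedding_dimension())
encoder_1 = SentenceTransformer(modules=[word_emb_1, pool_1])

word_emb_2 = models.Transformer("bert-base-uncased")
pool_2 = models.Pooling(word_emb_2.get_word_embedding_dimension())
encoder_2 = SentenceTransformer(modules=[word_emb_2, pool_2])

# Option 2: a shared encoder, with an extra Dense layer applied only to
# embeddings of input type 2 (both models reuse the same module instances,
# so the transformer and pooling weights are shared)
shared = models.Transformer("bert-base-uncased")
pool = models.Pooling(shared.get_word_embedding_dimension())
dim = pool.get_sentence_embedding_dimension()
extra = models.Dense(in_features=dim, out_features=dim)

encoder_for_1 = SentenceTransformer(modules=[shared, pool])
encoder_for_2 = SentenceTransformer(modules=[shared, pool, extra])
```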