Simple way of producing two independent embeddings #238

Open
fjhheras opened this issue May 20, 2020 · 8 comments

fjhheras commented May 20, 2020

I would like to fine-tune BERT (or a similar model) for an asymmetric task using two different embeddings. There are two inputs (1 and 2), and I would use one embedding for input 1 and another for input 2 to build meaningful distances between them. I cannot use a common embedding, because sentences in 1 are of a very different nature from sentences in 2 (my case is not exactly this, but you can think of questions and answers).

I have thought about several options:

  • Training two independent transformer models (one for 1, the other for 2)

Input 1 >> transformer 1 >> Pooling >> Output 1
Input 2 >> transformer 2 >> Pooling >> Output 2

  • Training the same transformer model, but adding extra layers for one of the inputs. For example:

Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> Pooling >> extra layer >> Output 2

or

Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> extra layer >> Pooling >> Output 2

Do you think there is an easy way to do this by adapting one of the training scripts? I would appreciate some guidance on which code in this repo I could adapt for my use case, so I can make the most of what is already here!
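For concreteness, option 1 would look roughly like this with the models API (a rough sketch, assuming a recent sentence-transformers version; the model name and the sentences are placeholders):

    from sentence_transformers import SentenceTransformer, models

    def build_model(model_name='bert-base-uncased'):
        # transformer followed by a (mean) pooling layer
        word_embedding_model = models.Transformer(model_name)
        pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
        return SentenceTransformer(modules=[word_embedding_model, pooling_model])

    model_1 = build_model()  # encodes inputs of type 1
    model_2 = build_model()  # encodes inputs of type 2

    embeddings_1 = model_1.encode(["a sentence of type 1"])
    embeddings_2 = model_2.encode(["a sentence of type 2"])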

fjhheras commented May 21, 2020

I found some previous answers that were relevant, and even though they do not give all the details, I managed to get something working. I added a final layer to the model that holds several Dense instances; depending on the value of self.condition, one of them is chosen:

    import torch.nn as nn
    from sentence_transformers.models import Dense

    class ConditionalDense(nn.Module):  # illustrative class name
        def __init__(self, in_features, out_features, bias=True,
                     activation_function=nn.Tanh(), conditions=None):
            super().__init__()  # (other initialisation elided)
            self.conditions = conditions
            # one Dense instance per condition
            dense_dict = {key: Dense(in_features, out_features, bias=bias,
                                     activation_function=activation_function)
                          for key in self.conditions}
            self.dense_dict = nn.ModuleDict(dense_dict)

        def forward(self, features):
            # self.condition is set from outside before each encode call
            return self.dense_dict[self.condition].forward(features)

So when I encode inputs of type 1, I first set list(module.children())[-1].condition = '1', and so on. It is not beautiful (monkey patching), but it works. If I write a PR to add a layer like this, would you be interested?

I also had to make changes in CosineSimilarityLoss.py and EmbeddingSimilarityEvaluator.py (to change the condition before each call to encode).
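The switching before each encode call then looks roughly like this (a sketch; model stands for the SentenceTransformer instance, sentences_1 and sentences_2 are placeholders, and the conditional layer is the last module):

    # set the condition on the last module, then encode (monkey patching)
    conditional_layer = list(model.children())[-1]

    conditional_layer.condition = '1'
    embeddings_1 = model.encode(sentences_1)

    conditional_layer.condition = '2'
    embeddings_2 = model.encode(sentences_2)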

@nreimers
Member

Hi @fjhheras
Yes, a nice and clean integration of that would be quite cool.

I think the best way is to integrate the information about the condition into the dataloader. This information would then be passed to all intermediate modules and could be read from there.

Best
Nils

fjhheras commented May 22, 2020

Thank you for your answer, @nreimers

How would you send information to all modules?

For example, SentenceTransformer.encode calls self.forward(features). This forward is inherited from nn.Sequential, so it sends all the arguments to the first module (in the case I am testing, modules/BERT), which calls self.bert(**features), where self.bert is a huggingface transformer.

If I add the key text_type to the features dictionary, it fails with an error because the huggingface transformer does not accept that keyword argument. So even if the last module expects that key, it never gets the chance to use it.
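A minimal reproduction of what happens inside the first module (a sketch, assuming a recent transformers version):

    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    bert = AutoModel.from_pretrained('bert-base-uncased')

    features = dict(tokenizer("a sentence", return_tensors='pt'))
    features['text_type'] = '1'   # the extra key the later modules would need

    bert(**features)
    # TypeError: forward() got an unexpected keyword argument 'text_type' (or similar)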

fjhheras commented May 22, 2020

I can bypass the first module by adding a forward method to SentenceTransformer:

    def forward(self, features, intermediate_features=None):
        for i, module in enumerate(self):
            # merge the extra features in only after the first module
            # (the transformer), which cannot handle unknown keys
            if i == 1 and intermediate_features is not None:
                features.update(intermediate_features)
            features = module(features)
        return features

Not sure how general or desirable this would be (and I am still not sure how to do the equivalent for training)...
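For illustration, a call could then look like this (hypothetical; features stands for the tokenized batch that SentenceTransformer.encode builds internally):

    # the extra keys only reach the modules after the transformer
    output = model.forward(features, intermediate_features={'text_type': '2'})
    embeddings = output['sentence_embedding']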

@nreimers
Member

Hi @fjhheras
My idea was more to inject a new key into the features dictionary, like:
features['condition'] = 1

Then, in the dense layer, you can check features['condition'] and either pass the input through an identity layer or through a non-linear dense layer.

But I'm not sure yet how to change the data reader so that it can add features to the input text that are preserved within the sequential pipeline.
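Roughly, such a module could look like this (an untested sketch; the class name and the 'condition' values are placeholders):

    import torch.nn as nn

    class ConditionOrIdentity(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.dense = nn.Sequential(nn.Linear(in_features, out_features), nn.Tanh())

        def forward(self, features):
            # 'condition' would have to be injected by the data reader and
            # survive the sequential pipeline up to this module
            if features.get('condition') == 1:
                return features  # identity for input type 1
            features['sentence_embedding'] = self.dense(features['sentence_embedding'])
            return features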

fjhheras commented May 22, 2020

Yes, I understood your suggestion. But the first module does not seem to accept an extra key in the features dictionary, at least in the way it is called in SentenceTransformer.encode.

@umairspn

I am stuck in the same situation, where I want to train two independent transformer models (one for 1, the other for 2):

Input 1 >> transformer 1 >> Pooling >> Output 1
Input 2 >> transformer 2 >> Pooling >> Output 2

Any help would be appreciated. Thank you!

@nreimers
Member

Just for completeness: see #328, where I describe an easy method to create two independent embeddings for different inputs without needing any code changes.
