Simple way of producing two independent embeddings #238

Open
fjhheras opened this issue May 20, 2020 · 8 comments

fjhheras commented May 20, 2020

I would like to fine-tune BERT (or a similar model) for an asymmetric task using two different embeddings. There are two inputs (1 and 2), and I would use one embedding for input 1 and another for input 2 to build meaningful distances between them. I cannot use a common embedding, because sentences in 1 are of a very different nature from sentences in 2 (my case is not exactly this, but you can think of questions and answers).

I have thought about several options:

  • Training two independent transformer models (one for 1, the other for 2)

Input 1 >> transformer 1 >> Pooling >> Output 1
Input 2 >> transformer 2 >> Pooling >> Output 2

  • Training the same transformer model, but adding extra layers for one of the inputs. For example:

Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> Pooling >> extra layer >> Output 2

or

Input 1 >> transformer >> Pooling >> Output 1
Input 2 >> transformer >> extra layer >> Pooling >> Output 2

Do you think there is an easy way to do this by adapting one of the training scripts? I would appreciate some guidance on which code in this repo I could adapt for my use case, so I can make the most of what is already here!
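For concreteness, option 1 would look roughly like this with the models API (a rough sketch, assuming a recent sentence-transformers version; the model name and the sentences are placeholders):

    from sentence_transformers import SentenceTransformer, models

    def build_model(model_name='bert-base-uncased'):
        # transformer followed by a (mean) pooling layer
        word_embedding_model = models.Transformer(model_name)
        pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
        return SentenceTransformer(modules=[word_embedding_model, pooling_model])

    model_1 = build_model()  # encodes inputs of type 1
    model_2 = build_model()  # encodes inputs of type 2

    embeddings_1 = model_1.encode(["a sentence of type 1"])
    embeddings_2 = model_2.encode(["a sentence of type 2"])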

fjhheras commented May 21, 2020

I found some previous answers that were relevant, and even though they do not give all the details, I managed to get something working. I added a final layer to the model that holds several Dense instances; depending on the value of self.condition, one of them is chosen:

    import torch.nn as nn
    from sentence_transformers.models import Dense

    class ConditionalDense(nn.Module):  # illustrative class name
        def __init__(self, in_features, out_features, bias=True,
                     activation_function=nn.Tanh(), conditions=None):
            super().__init__()  # (other initialisation elided)
            self.conditions = conditions
            # one Dense instance per condition
            dense_dict = {key: Dense(in_features, out_features, bias=bias,
                                     activation_function=activation_function)
                          for key in self.conditions}
            self.dense_dict = nn.ModuleDict(dense_dict)

        def forward(self, features):
            # self.condition is set from outside before each encode call
            return self.dense_dict[self.condition].forward(features)

So when I encode inputs of type 1, I first set list(module.children())[-1].condition = '1', and so on. It is not beautiful (monkey patching), but it works. If I write a PR to add a layer like this, would you be interested?

I also had to make changes in CosineSimilarityLoss.py and EmbeddingSimilarityEvaluator.py (to change the condition before each call to encode).
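The switching before each encode call then looks roughly like this (a sketch; model stands for the SentenceTransformer instance, sentences_1 and sentences_2 are placeholders, and the conditional layer is the last module):

    # set the condition on the last module, then encode (monkey patching)
    conditional_layer = list(model.children())[-1]

    conditional_layer.condition = '1'
    embeddings_1 = model.encode(sentences_1)

    conditional_layer.condition = '2'
    embeddings_2 = model.encode(sentences_2)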

@nreimers
Member

Hi @fjhheras
Yes, a nice and clean integration of that would be quite cool.

I think the best way is to integrate the information about the condition into the dataloader. This information would then be passed to all intermediate modules and could be read from there.

Best
Nils

fjhheras commented May 22, 2020

Thank you for your answer, @nreimers

How would you send information to all modules?

For example, SentenceTransformer.encode calls self.forward(features). This forward is inherited from nn.Sequential, so it sends all the arguments to the first module (in the case I am testing, modules/BERT), which calls self.bert(**features), where self.bert is a huggingface transformer.

If I add the key text_type to the features dictionary, it fails with an error because the huggingface transformer does not accept that keyword argument. So even if the last module expects that key, it never gets the chance to use it.
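A minimal reproduction of what happens inside the first module (a sketch, assuming a recent transformers version):

    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    bert = AutoModel.from_pretrained('bert-base-uncased')

    features = dict(tokenizer("a sentence", return_tensors='pt'))
    features['text_type'] = '1'   # the extra key the later modules would need

    bert(**features)
    # TypeError: forward() got an unexpected keyword argument 'text_type' (or similar)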

fjhheras commented May 22, 2020

I can bypass the first module by adding a forward method to SentenceTransformer:

    def forward(self, features, intermediate_features=None):
        for i, module in enumerate(self):
            # merge the extra features in only after the first module
            # (the transformer), which cannot handle unknown keys
            if i == 1 and intermediate_features is not None:
                features.update(intermediate_features)
            features = module(features)
        return features

Not sure how general or desirable this would be (and I am still not sure how to do the equivalent for training)...
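For illustration, a call could then look like this (hypothetical; features stands for the tokenized batch that SentenceTransformer.encode builds internally):

    # the extra keys only reach the modules after the transformer
    output = model.forward(features, intermediate_features={'text_type': '2'})
    embeddings = output['sentence_embedding']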

@nreimers
Member

Hi @fjhheras
My idea was more to inject a new key into the features dictionary, like:
features['condition'] = 1

Then, in the dense layer, you can check features['condition'] and either pass the input through an identity layer or through a non-linear dense layer.

But I'm not sure yet how to change the data reader so that it can add features to the input text that are preserved within the sequential pipeline.
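Roughly, such a module could look like this (an untested sketch; the class name and the 'condition' values are placeholders):

    import torch.nn as nn

    class ConditionOrIdentity(nn.Module):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.dense = nn.Sequential(nn.Linear(in_features, out_features), nn.Tanh())

        def forward(self, features):
            # 'condition' would have to be injected by the data reader and
            # survive the sequential pipeline up to this module
            if features.get('condition') == 1:
                return features  # identity for input type 1
            features['sentence_embedding'] = self.dense(features['sentence_embedding'])
            return features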

fjhheras commented May 22, 2020

Yes, I understood your suggestion. But the first module does not seem to accept an extra key in the features dictionary, at least in the way it is called in SentenceTransformer.encode.

@umairspn

I am stuck in the same situation, where I want to train two independent transformer models (one for 1, the other for 2):

Input 1 >> transformer 1 >> Pooling >> Output 1
Input 2 >> transformer 2 >> Pooling >> Output 2

Any help would be appreciated. Thank you!

@nreimers
Member

Just for completeness: see #328, where I describe an easy method to create two independent embeddings for different inputs without needing any code changes.
