
Language modeling head for flexible head classes #53

Closed
sosuperic opened this issue Sep 1, 2020 · 6 comments · Fixed by #210
Labels
enhancement New feature or request

Comments

@sosuperic

sosuperic commented Sep 1, 2020

Hi! I have many language adapters, each trained with masked language modeling on different (English) datasets.

I want to be able to load 1 BERT model, load each of the adapters, and then decide which adapter to use on any given forward pass. This would save on memory and loading time, as opposed to loading a separate BERT for each adapter. This seems possible -- here's what I'm doing:

# ADAPTER1 and ADAPTER2 are the paths/identifiers of my saved adapters
from transformers import BertForMaskedLM  # adapter-transformers

model_name = 'bert-base-cased'
model = BertForMaskedLM.from_pretrained(model_name)
model.load_adapter(ADAPTER1)
model.load_adapter(ADAPTER2)
# In practice, I have many more adapters

output1 = model(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER1])
output2 = model(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER2])

However, I get different outputs from this setup than when I load a separate model per adapter: output1 above differs from output1 below, and likewise for output2.

model_name = 'bert-base-cased'

model1 = BertForMaskedLM.from_pretrained(model_name)
model1.load_adapter(ADAPTER1)
output1 = model1(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER1])

model2 = BertForMaskedLM.from_pretrained(model_name)
model2.load_adapter(ADAPTER2)
output2 = model2(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER2])

I'm trying to read https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_model_mixin.py#L339 to see why this would be the case. Is this expected behavior? Am I doing something wrong?

Thanks!

@calpt
Member

calpt commented Sep 3, 2020

Hi @sosuperic!

The code you provided looks right and the behaviour you describe is expected (although certainly not optimal):

As you use a derived model with a prediction head (BertForMaskedLM), all adapters are trained, saved & loaded together with the corresponding prediction layer (the LM head). However, while we can add an arbitrary number of adapters in parallel, every standard model class has at most one head. So if you load a second adapter into the same model, the prediction head loaded with the first adapter is overwritten, and so on.

To enable the exact scenario you described, we added new model classes with "multiple heads" in addition to the standard classes already provided by HuggingFace. Unfortunately, these classes currently don't provide an LM head out of the box.

The only option I can currently think of to make your setup work with your existing adapters would be to adapt the BertForMaskedLM class, e.g. to share a base BertModel instance between multiple models with LM heads.
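
For illustration, here's a rough, untested sketch of that idea. It assumes the loaded LM head lives at model.cls (as in HuggingFace's BertForMaskedLM), that load_adapter returns the name of the loaded adapter, and that the adapter_names argument from your snippet is also accepted by the base model.bert:

import copy
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-cased')

# Snapshot each adapter's LM head right after loading it, before the next
# load_adapter call overwrites the model's single head slot.
heads = {}
for adapter in [ADAPTER1, ADAPTER2]:
    name = model.load_adapter(adapter)
    heads[name] = copy.deepcopy(model.cls)

def mlm_forward(input_ids, token_type_ids, attention_mask, adapter_name):
    # Run the shared encoder with only the requested adapter active ...
    sequence_output = model.bert(
        input_ids=input_ids,
        token_type_ids=token_type_ids,
        attention_mask=attention_mask,
        adapter_names=[adapter_name],
    )[0]
    # ... then apply the LM head that was saved with that adapter.
    return heads[adapter_name](sequence_output)

This way the large encoder weights exist only once in memory, while each adapter keeps its own LM head.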

Hope this helped a bit!

@calpt calpt added the question Further information is requested label Sep 8, 2020
@sosuperic
Author

Hey @calpt! Thanks for the info. If I have time, I might try to add an LM head and make a pull request, if you/someone could give feedback. I assume it'd be adding a function similar to add_classification_head in BertModelHeadsMixin?

@calpt
Member

calpt commented Sep 14, 2020

@sosuperic That would be great; I would be happy to help / give feedback.

> I assume it'd be adding a function similar to add_classification_head in BertModelHeadsMixin?

Yes, adding a similar function and adding a case in the forward_head() method.
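
For orientation, the LM head itself is a small module. A hypothetical add_masked_lm_head would essentially have to register something like the following sketch (modelled on BERT's own prediction head, not on the library's internal head-config format) and route the encoder output through it in forward_head():

import torch.nn as nn

class MaskedLMHead(nn.Module):
    """Minimal BERT-style MLM head: transform each token state, then project to the vocabulary."""

    def __init__(self, hidden_size, vocab_size, layer_norm_eps=1e-12):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.layer_norm = nn.LayerNorm(hidden_size, eps=layer_norm_eps)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.layer_norm(hidden_states)
        # Returns (batch, seq_len, vocab_size) logits over the vocabulary.
        return self.decoder(hidden_states)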

@rdenaux

rdenaux commented Jan 12, 2021

Hi @sosuperic , did you manage to try out your changes? I'm just getting started with the adapters library and have a similar use-case, so this feature would be very useful.
@calpt , can you confirm this issue is still open?

@calpt
Member

calpt commented Feb 3, 2021

@rdenaux Yes, this issue is still open, switching between multiple loaded adapters is currently not possible with an LM head.

@rdenaux

rdenaux commented Feb 3, 2021

Thanks for the confirmation, @calpt. In the end, though, our use case could be handled with the multiple-heads model classes you mentioned above.
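
For anyone with a similar classification-style use case, a minimal sketch of that approach (assuming the flexible-head class BertModelWithHeads; ADAPTER1/ADAPTER2 and num_labels=2 are placeholders, and the exact head-activation API may differ between library versions):

from transformers import BertModelWithHeads

model = BertModelWithHeads.from_pretrained('bert-base-cased')

# Load the previously trained adapters and give each one its own
# classification head, named after the adapter.
name1 = model.load_adapter(ADAPTER1)
name2 = model.load_adapter(ADAPTER2)
model.add_classification_head(name1, num_labels=2)
model.add_classification_head(name2, num_labels=2)

# Select the adapter and its matching head per forward pass.
model.active_head = name1
output1 = model(input_ids, adapter_names=[name1])
model.active_head = name2
output2 = model(input_ids, adapter_names=[name2])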

@calpt calpt added enhancement New feature or request and removed question Further information is requested labels Mar 8, 2021
@calpt calpt changed the title Switching adapters at inference time Language modeling head for flexible head classes Jul 23, 2021
@calpt calpt self-assigned this Jul 23, 2021