
Language modeling head for flexible head classes #53

Closed
sosuperic opened this issue Sep 1, 2020 · 6 comments · Fixed by #210
Labels
enhancement New feature or request

Comments

@sosuperic

sosuperic commented Sep 1, 2020

Hi! I have many language adapters, each trained with masked language modeling on different (English) datasets.

I want to be able to load 1 BERT model, load each of the adapters, and then decide which adapter to use on any given forward pass. This would save on memory and loading time, as opposed to loading a separate BERT for each adapter. This seems possible -- here's what I'm doing:

# ADAPTER1 and ADAPTER2 are the paths/identifiers of my saved adapters
from transformers import BertForMaskedLM  # adapter-transformers

model_name = 'bert-base-cased'
model = BertForMaskedLM.from_pretrained(model_name)
model.load_adapter(ADAPTER1)
model.load_adapter(ADAPTER2)
# In practice, I have many more adapters

output1 = model(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER1])
output2 = model(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER2])

However, I get different outputs from this setup than when I load a separate model per adapter: output1 above differs from output1 below, and likewise for output2.

model_name = 'bert-base-cased'

model1 = BertForMaskedLM.from_pretrained(model_name)
model1.load_adapter(ADAPTER1)
output1 = model1(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER1])

model2 = BertForMaskedLM.from_pretrained(model_name)
model2.load_adapter(ADAPTER2)
output2 = model2(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask, adapter_names=[ADAPTER2])

I'm trying to read https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_model_mixin.py#L339 to see why this would be the case. Is this expected behavior? Am I doing something wrong?

Thanks!

@calpt
Member

calpt commented Sep 3, 2020

Hi @sosuperic!

The code you provided looks right and the behaviour you describe is expected (although certainly not optimal):

As you use a derived model with a prediction head (BertForMaskedLM), all adapters are trained, saved & loaded together with the corresponding prediction layer (the LM head). However, while we can add an arbitrary number of adapters in parallel, every standard model class has at most one head. So if you load a second adapter into the same model, the prediction head loaded with the first adapter is overwritten, and so on.

To enable the exact scenario you described, we added new model classes with "multiple heads" in addition to the standard classes already provided by HuggingFace. Unfortunately, these classes currently don't provide an LM head out of the box.

The only option I can currently think of to make your setup work with your existing adapters would be to adapt the BertForMaskedLM class, e.g. to share a base BertModel instance between multiple models with LM heads.
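
For illustration, here's a rough, untested sketch of that idea. It assumes the loaded LM head lives at model.cls (as in HuggingFace's BertForMaskedLM), that load_adapter returns the name of the loaded adapter, and that the adapter_names argument from your snippet is also accepted by the base model.bert:

import copy
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-cased')

# Snapshot each adapter's LM head right after loading it, before the next
# load_adapter call overwrites the model's single head slot.
heads = {}
for adapter in [ADAPTER1, ADAPTER2]:
    name = model.load_adapter(adapter)
    heads[name] = copy.deepcopy(model.cls)

def mlm_forward(input_ids, token_type_ids, attention_mask, adapter_name):
    # Run the shared encoder with only the requested adapter active ...
    sequence_output = model.bert(
        input_ids=input_ids,
        token_type_ids=token_type_ids,
        attention_mask=attention_mask,
        adapter_names=[adapter_name],
    )[0]
    # ... then apply the LM head that was saved with that adapter.
    return heads[adapter_name](sequence_output)

This way the large encoder weights exist only once in memory, while each adapter keeps its own LM head.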

Hope this helped a bit!

@calpt calpt added the question Further information is requested label Sep 8, 2020
@sosuperic
Author

Hey @calpt! Thanks for the info. If I have time, I might try to add an LM head and make a pull request, if you/someone could give feedback. I assume it'd be adding a function similar to add_classification_head in BertModelHeadsMixin?

@calpt
Member

calpt commented Sep 14, 2020

@sosuperic That would be great; I would be happy to help / give feedback.

> I assume it'd be adding a function similar to add_classification_head in BertModelHeadsMixin?

Yes, adding a similar function and adding a case in the forward_head() method.
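
For orientation, the LM head itself is a small module. A hypothetical add_masked_lm_head would essentially have to register something like the following sketch (modelled on BERT's own prediction head, not on the library's internal head-config format) and route the encoder output through it in forward_head():

import torch.nn as nn

class MaskedLMHead(nn.Module):
    """Minimal BERT-style MLM head: transform each token state, then project to the vocabulary."""

    def __init__(self, hidden_size, vocab_size, layer_norm_eps=1e-12):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.layer_norm = nn.LayerNorm(hidden_size, eps=layer_norm_eps)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.layer_norm(hidden_states)
        # Returns (batch, seq_len, vocab_size) logits over the vocabulary.
        return self.decoder(hidden_states)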

@rdenaux

rdenaux commented Jan 12, 2021

Hi @sosuperic , did you manage to try out your changes? I'm just getting started with the adapters library and have a similar use-case, so this feature would be very useful.
@calpt , can you confirm this issue is still open?

@calpt
Member

calpt commented Feb 3, 2021

@rdenaux Yes, this issue is still open, switching between multiple loaded adapters is currently not possible with an LM head.

@rdenaux

rdenaux commented Feb 3, 2021

Thanks for the confirmation, @calpt. In the end, though, our use case could be handled with the multiple-heads model classes you mentioned above.
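
For anyone with a similar classification-style use case, a minimal sketch of that approach (assuming the flexible-head class BertModelWithHeads; ADAPTER1/ADAPTER2 and num_labels=2 are placeholders, and the exact head-activation API may differ between library versions):

from transformers import BertModelWithHeads

model = BertModelWithHeads.from_pretrained('bert-base-cased')

# Load the previously trained adapters and give each one its own
# classification head, named after the adapter.
name1 = model.load_adapter(ADAPTER1)
name2 = model.load_adapter(ADAPTER2)
model.add_classification_head(name1, num_labels=2)
model.add_classification_head(name2, num_labels=2)

# Select the adapter and its matching head per forward pass.
model.active_head = name1
output1 = model(input_ids, adapter_names=[name1])
model.active_head = name2
output2 = model(input_ids, adapter_names=[name2])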

@calpt calpt added enhancement New feature or request and removed question Further information is requested labels Mar 8, 2021
@calpt calpt changed the title Switching adapters at inference time Language modeling head for flexible head classes Jul 23, 2021
@calpt calpt self-assigned this Jul 23, 2021