Language modeling head for flexible head classes #53
Hi @sosuperic! The code you provided looks right and the behaviour you describe is expected (although certainly not optimal), as you are using a derived model with a prediction head (…).

To enable the exact scenario you described, we added new model classes with "multiple heads" in addition to the standard classes already provided by HuggingFace. Unfortunately, these classes currently don't provide an LM head out of the box. The only option I can currently think of to make your setup work with your existing adapters would be to adapt the …

Hope this helped a bit!
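For illustration, here is a minimal sketch of how such a "multiple heads" model class is typically used with several adapters on one shared encoder. This is an assumption-laden example rather than code from the thread: class and method names such as `BertModelWithHeads`, `add_classification_head`, and `set_active_adapters` follow the adapter-transformers API but may differ between library versions, and all adapter paths and names are placeholders.

```python
from transformers import BertModelWithHeads, BertTokenizer  # adapter-transformers fork of transformers

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer("An example sentence.", return_tensors="pt")

# One shared BERT encoder that can hold several adapters and prediction heads.
model = BertModelWithHeads.from_pretrained("bert-base-uncased")

# Load two previously trained adapters (placeholder paths and names).
model.load_adapter("path/to/adapter_a", load_as="task_a")
model.load_adapter("path/to/adapter_b", load_as="task_b")

# The flexible head classes ship with heads such as classification heads,
# but (at the time of this issue) no LM head.
model.add_classification_head("task_a", num_labels=2)
model.add_classification_head("task_b", num_labels=2)

# Pick which adapter is active for a given forward pass; depending on the
# library version, the matching head may also need to be selected explicitly.
model.set_active_adapters("task_a")
outputs_a = model(**batch)

model.set_active_adapters("task_b")
outputs_b = model(**batch)
```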
Hey @calpt! Thanks for the info. If I have time, I might try to add an LM head and make a pull request, if you or someone could give feedback. I assume it'd be adding a function similar to …
@sosuperic That would be great, I would be happy to help / give feedback.

Yes, adding a similar function and adding a case in the …
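As background for what such an LM head function would need to construct, here is a rough sketch of a BERT-style masked-LM prediction head in plain PyTorch. This is not the library's implementation; the module is hypothetical, and in BERT the decoder weights are usually tied to the input embedding matrix.

```python
import torch.nn as nn

class MaskedLMHead(nn.Module):
    """Sketch of a BERT-style masked-LM prediction head: transform + vocab decoder."""

    def __init__(self, hidden_size: int, vocab_size: int, layer_norm_eps: float = 1e-12):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.GELU()
        self.layer_norm = nn.LayerNorm(hidden_size, eps=layer_norm_eps)
        # Maps hidden states back to vocabulary logits; in BERT this weight is
        # typically tied to the input embeddings.
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.activation(hidden_states)
        hidden_states = self.layer_norm(hidden_states)
        return self.decoder(hidden_states)  # (batch, seq_len, vocab_size)
```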
Hi @sosuperic, did you manage to try out your changes? I'm just getting started with the adapters library and have a similar use case, so this feature would be very useful.
@rdenaux Yes, this issue is still open; switching between multiple loaded adapters is currently not possible with an LM head.
Thanks for the confirmation, @calpt. However, in the end our use cases could be handled with the model classes with multiple heads you mentioned above.
Hi! I have many language adapters, each trained with masked language modeling on different (English) datasets.
I want to be able to load 1 BERT model, load each of the adapters, and then decide which adapter to use on any given forward pass. This would save on memory and loading time, as opposed to loading a separate BERT for each adapter. This seems possible -- here's what I'm doing:
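The original snippet is not reproduced in this thread, so the following is only a hedged sketch of this kind of setup. It assumes the adapter-transformers API (method names such as `load_adapter` and `set_active_adapters` may differ between versions), a `BertForMaskedLM`-style model, and placeholder adapter paths and names.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer  # adapter-transformers fork

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Load several pre-trained language adapters into the same encoder
# (paths and names are placeholders).
model.load_adapter("path/to/adapter1", load_as="lang1")
model.load_adapter("path/to/adapter2", load_as="lang2")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")

# Activate one adapter at a time and run a forward pass with it.
model.set_active_adapters("lang1")
with torch.no_grad():
    output1 = model(**inputs)

model.set_active_adapters("lang2")
with torch.no_grad():
    output2 = model(**inputs)
```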
However, I am getting different outputs than having one model per adapter. For example, `output1` above is different from `output1` below, and `output2` above is different from `output2` below.
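Again, the original comparison snippet is not shown here; what follows is a hedged sketch of the one-model-per-adapter baseline being compared against, under the same assumptions and reusing the imports and `inputs` from the sketch above.

```python
# One separate BertForMaskedLM instance per adapter (placeholder paths again).
model1 = BertForMaskedLM.from_pretrained("bert-base-uncased")
model1.load_adapter("path/to/adapter1", load_as="lang1")
model1.set_active_adapters("lang1")
with torch.no_grad():
    output1 = model1(**inputs)

model2 = BertForMaskedLM.from_pretrained("bert-base-uncased")
model2.load_adapter("path/to/adapter2", load_as="lang2")
model2.set_active_adapters("lang2")
with torch.no_grad():
    output2 = model2(**inputs)
```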
I'm trying to read https://github.com/Adapter-Hub/adapter-transformers/blob/master/src/transformers/adapter_model_mixin.py#L339 to see why this would be the case. Is this expected behavior? Am I doing something wrong?

Thanks!