
Adding MT5 support #629

Merged
merged 11 commits into adapter-hub:main on Jan 28, 2024

Conversation

sotwi (Contributor) commented Jan 5, 2024

Pull request to address #568.

I followed the updated guide for adding adapters to a model and did a very quick port. I took the same approach as the mBART implementation (it reused the BART mixins; I reused the T5 mixins), so the changes were minimal.

I hope it works.

calpt linked an issue on Jan 5, 2024 that may be closed by this pull request
sotwi (Contributor, Author) commented Jan 5, 2024

There appears to be an issue when loading the public mt5 weights into the AdapterModel.
I am not sure what causes it, but I suspect it is because those weights already include an lm_head.weight entry in their state dict. That seems to break the initialization of the flexible heads.

sotwi (Contributor, Author) commented Jan 8, 2024

When I try to load public mt5 weights (say mt5-small) with either AutoAdapterModel or MT5AdapterModel I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wsotomar/anaconda3/envs/dev_adapters/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/home/wsotomar/anaconda3/envs/dev_adapters/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3480, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/wsotomar/Code/adapters/src/adapters/heads/base.py", line 969, in _load_pretrained_model
    return super()._load_pretrained_model(
  File "/home/wsotomar/anaconda3/envs/dev_adapters/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3752, in _load_pretrained_model
    raise ValueError(
ValueError: The state dictionary of the model you are trying to load is corrupted. Are you sure it was properly saved?

I then tried loading the weights with the transformers MT5Model class, which drops the lm_head layer that comes prepackaged with the weights. I saved that model locally and then loaded the saved copy with both AutoAdapterModel and MT5AdapterModel without any issues.

I really think that the inclusion of the lm_head in the original weights interferes with the loading process of ModelWithFlexibleHeadsAdaptersMixin, and that is why it fails. But I can't quite pin down the exact cause, nor how to fix it.
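To make the suspected failure mode concrete, here is a minimal, purely illustrative sketch of the key mismatch being described. This is not the actual adapters/transformers loading code, and all key names besides lm_head.weight are hypothetical; it only shows why a checkpoint that ships an lm_head entry can trip a strict state-dict check in a model that registers its heads under different names, and why re-saving through MT5Model (which drops that entry) works around it:

```python
# Hypothetical sketch of the suspected state-dict mismatch.
# NOT the real adapters/transformers logic; key names are illustrative.

def diff_state_dicts(checkpoint_keys, expected_keys):
    """Return (unexpected, missing) keys relative to what the model declares."""
    unexpected = sorted(set(checkpoint_keys) - set(expected_keys))
    missing = sorted(set(expected_keys) - set(checkpoint_keys))
    return unexpected, missing

# The public mt5 checkpoint ships a prepackaged lm_head entry ...
checkpoint_keys = [
    "shared.weight",
    "encoder.block.0.layer.0.SelfAttention.q.weight",
    "lm_head.weight",
]
# ... while a flexible-heads model registers its prediction heads under
# different names, so "lm_head.weight" is not among its expected keys.
expected_keys = [
    "shared.weight",
    "encoder.block.0.layer.0.SelfAttention.q.weight",
]

unexpected, missing = diff_state_dicts(checkpoint_keys, expected_keys)
print(unexpected)  # ['lm_head.weight']

# The round trip through MT5Model effectively strips that entry,
# which is why the re-saved checkpoint loads cleanly afterwards.
cleaned = [k for k in checkpoint_keys if k not in unexpected]
print(cleaned == expected_keys)  # True
```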

If anyone wants to have a look at it that would be great.

calpt (Member) commented Jan 17, 2024

@sotwi thanks so much for your work on this so far! Will look into the issue you mentioned shortly.

calpt (Member) commented Jan 26, 2024

Thanks again for working on this. I've looked into the issue and am working on fixing it separately in #640 (covering both the failing tests and the lm_head error). Once the fix there is ready, this PR is good to merge from my side!

calpt merged commit 5f91178 into adapter-hub:main on Jan 28, 2024 (3 checks passed)
sotwi (Contributor, Author) commented Jan 29, 2024

Thank you for your help @calpt! I am glad it is working now!

Successfully merging this pull request may close these issues.

Add mt5 support