[Umt5] Add google's umt5 to transformers #24477
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hi @ArthurZucker, thanks for updating this! As far as we can tell, it is not just mT5, because of the joined/separate key-value in attention. Was this problem solved in the latest conversion script of this PR? 🤔 /cc @agemagician
The conversion went well; the outputs are still a bit gibberish, but there was no problem with mismatched shapes.
So far, I can see you made similar changes as we did before, which led to gibberish output. I believe the issue still exists because of the way we reshape and convert the q, k, and v for the attention, as @stefan-it mentioned.
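For context, here is a minimal sketch of the kind of reshaping being discussed. The shapes and the fused key/value layout below are assumptions for illustration, not the actual checkpoint format used by the conversion script:

```python
import numpy as np

# Hypothetical shapes: assume the JAX checkpoint stores attention kernels as
# (d_model, n_heads, d_head), while torch.nn.Linear expects (out_features, in_features).
d_model, n_heads, d_head = 512, 6, 64

def kernel_to_torch(kernel: np.ndarray) -> np.ndarray:
    """Flatten the per-head axes and transpose to the PyTorch (out, in) layout."""
    return kernel.reshape(d_model, n_heads * d_head).T

# If a checkpoint ships a fused key/value kernel, e.g. (d_model, 2, n_heads, d_head),
# it has to be split before reshaping:
fused_kv = np.random.randn(d_model, 2, n_heads, d_head).astype(np.float32)
k_kernel, v_kernel = fused_kv[:, 0], fused_kv[:, 1]

k_torch = kernel_to_torch(k_kernel)  # shape (n_heads * d_head, d_model)
v_torch = kernel_to_torch(v_kernel)
print(k_torch.shape, v_torch.shape)
```

Getting the reshape order wrong still produces tensors of the right shape, which is why the result can load cleanly and yet generate gibberish.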
There is also a different logic for
Regarding the split / merge, I don't really see a problem with the code. The checkpoints are split, and the actual code is similar to mt5 with the difference being
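For what it's worth, the main architectural difference we are aware of is that a UMT5-style stack computes its own relative position bias in every attention layer, whereas an mT5-style stack only owns the bias table in the first layer and reuses its output elsewhere. A minimal PyTorch sketch of that idea (names and sizes are illustrative, not the actual transformers code):

```python
import torch.nn as nn

class SketchAttention(nn.Module):
    """Illustrative only, not the actual transformers implementation."""

    def __init__(self, n_heads: int, num_buckets: int, has_relative_bias: bool):
        super().__init__()
        # mT5-style: only the first layer owns the bias table, the others reuse its output.
        # UMT5-style: every layer instantiates its own table.
        self.relative_attention_bias = (
            nn.Embedding(num_buckets, n_heads) if has_relative_bias else None
        )

mt5_like = nn.ModuleList([SketchAttention(6, 32, has_relative_bias=(i == 0)) for i in range(8)])
umt5_like = nn.ModuleList([SketchAttention(6, 32, has_relative_bias=True) for i in range(8)])
```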
Co-authored-by: agemagician <ahmed.elnaggar@tum.de>
Co-authored-by: stefan-it <>
Update: the outputs match 🔥 The issue was: the tokenizer
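For anyone following along, a quick tokenizer sanity check once the files are converted could look like this. The repo id `google/umt5-small` is an assumption; point it at whatever tokenizer files the conversion actually produces:

```python
from transformers import AutoTokenizer

# Assumed repo id; replace with the converted checkpoint's location.
tokenizer = AutoTokenizer.from_pretrained("google/umt5-small")

text = "The quick brown fox."
ids = tokenizer(text).input_ids
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))
# Decoding the ids should give the input back (modulo special tokens like </s>).
print(tokenizer.decode(ids, skip_special_tokens=True))
```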
Thanks for adding the modeling file. Have a couple more nits.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Currently setting up an instance to convert and upload the
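If it helps, uploading the converted weights could look roughly like this; the repo id and local path below are placeholders, not the actual destinations:

```python
from huggingface_hub import HfApi

api = HfApi()
# Placeholder repo id and local path for the converted checkpoint.
repo_id = "your-org/umt5-small"
api.create_repo(repo_id, exist_ok=True)
api.upload_folder(folder_path="./converted/umt5-small", repo_id=repo_id)
```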
What does this PR do?
Supersedes #22626, which has been stale for quite some time.
A Kaggle notebook for reproducing and running the original model:
https://www.kaggle.com/arthurzucker/umt5-inference
84 tokens are free to use apparently.
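Once the conversion is done, the model should be usable like any other transformers seq2seq model. A minimal sketch, assuming the converted checkpoint ends up under the repo id `google/umt5-small`:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

ckpt = "google/umt5-small"  # assumed repo id for the converted checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

inputs = tokenizer("A <extra_id_0> walks into a bar.", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(generated, skip_special_tokens=False))
```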
For a first conversion I'll be using this: