Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[seq2seq] correctly handle mt5 #9879

Merged
merged 1 commit into from
Jan 29, 2021
Merged

[seq2seq] correctly handle mt5 #9879

merged 1 commit into from
Jan 29, 2021

Conversation

stas00
Copy link
Contributor

@stas00 stas00 commented Jan 29, 2021

This PR fixes seq2seq/utils.py to handle mt5 like it does t5.

Ideally there should be a test, which would require creating a tiny model for mt5, but I'm being told this code is going away anyway, so there is no point investing energy into it.

Fixes: #9865

@patil-suraj, @sgugger

@stas00 stas00 changed the title correctly handle mt5 [seq2seq] correctly handle mt5 Jan 29, 2021
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a quick fix. @patil-suraj when you port this to the new run_seq2seq, it would be great to try to find a way to make this not use any special code for a given model (but that's out of scope of this PR).

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks Stas for fixing this!

@patil-suraj
Copy link
Contributor

when you port this to the new run_seq2seq, it would be great to try to find a way to make this not use any special code for a given model

I'm working on it in #9844, it's not finished though. We might need to add get_input_embeddings and get_pos_embeddings methods to every s2s model, to avoid special cases.

@sgugger
Copy link
Collaborator

sgugger commented Jan 29, 2021

If we need to add some methods to deal with the special cases, I would prefer it (otherwise the script might fail with new seq2seq models).

@stas00 stas00 merged commit 6bf94bc into huggingface:master Jan 29, 2021
@stas00 stas00 deleted the mt5 branch January 29, 2021 16:11
Qbiwan pushed a commit to Qbiwan/transformers that referenced this pull request Jan 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[trainer] seq2seq doesn't handle mt5 correctly
3 participants