Inference for TFMarianMTModel (en to Romance language translation) is slow and inaccurate #18149
Comments
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Sorry for missing this! Could you take a look at this, @gante, @Rocketknight1, @ydshieh?
Let me take a look at the quality issue. And possibly @gante or @Rocketknight1 for the speed issue, let's discuss it :-)
Actually, the performance issue comes from the quality issue: the TF version doesn't stop generating until it hits 512 tokens. The generated ids look like

[[65000 25 2092 7 179 15 276 185 7 227 32 9 2 2538 15 5716 2 2538 15 5716 2 2538 15 15 15 15 15 15 15 15 15 15 15 15 ... 0]], shape=(1, 512), dtype=int32
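A quick way to see this behaviour (my own sketch, not code from the original comment; it assumes the same "Helsinki-NLP/opus-mt-en-ROMANCE" checkpoint used throughout this thread) is to inspect the shape of the generated ids and count how often the EOS token actually appears:

from transformers import MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = TFMarianMTModel.from_pretrained(model_name)

batch = tokenizer(['>>fr<< hello'], return_tensors='tf', padding=True)
out = model.generate(**batch)

# If generation never stops early, this prints (1, 512) instead of a short sequence.
print(out.shape)
# The id generation is supposed to stop at, and how many times it shows up in the output.
print(model.config.eos_token_id)
print(int((out.numpy() == model.config.eos_token_id).sum()))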
I believe the current PT / TF checkpoints for "Helsinki-NLP/opus-mt-en-ROMANCE" don't contain the same weights: with the TF checkpoint I could get one translation, while the PyTorch version gives a different one. So the two checkpoints appear to be out of sync.
After a double check (see the code below): @gante, would you like to have a look too, upload a new TF checkpoint, and see why the current one differs?
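For reference, a minimal sketch of the kind of double check described above (mine, not the code from the original comment): generate with the TF weights published on the Hub and with TF weights cross-loaded from the PT checkpoint via from_pt=True, then compare the results.

import numpy as np
from transformers import MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
batch = tokenizer(['>>fr<< hello'], return_tensors='tf', padding=True)

# TF weights as published on the Hub vs. TF weights converted on the fly from PT.
tf_native = TFMarianMTModel.from_pretrained(model_name)
tf_from_pt = TFMarianMTModel.from_pretrained(model_name, from_pt=True)

ids_native = tf_native.generate(**batch)
ids_from_pt = tf_from_pt.generate(**batch)

print(tokenizer.batch_decode(ids_native, skip_special_tokens=True))
print(tokenizer.batch_decode(ids_from_pt, skip_special_tokens=True))
# False here would point at a stale TF checkpoint on the Hub.
print(np.array_equal(ids_native.numpy(), ids_from_pt.numpy()))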
Hi there @ydshieh @danielenricocahall 👋 None of the Marian models can be successfully converted to TF -- they all fail when validating the hidden layers and outputs of the models. This is a shame, since there are a ton of Marian models for translation :( It means there is something wrong with either the model architecture or with the weight cross-loading. I haven't looked into it, other than noticing the issue when attempting to convert the weights.
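The validation mentioned above can be approximated with a short script (a sketch under my own assumptions, not the actual conversion tool): run the same input through the PT model and a TF model cross-loaded from the PT weights, and compare the output logits within a tolerance.

import numpy as np
import torch
from transformers import MarianMTModel, MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)

pt_model = MarianMTModel.from_pretrained(model_name)
tf_model = TFMarianMTModel.from_pretrained(model_name, from_pt=True)

pt_batch = tokenizer(['>>fr<< hello'], return_tensors='pt', padding=True)
tf_batch = tokenizer(['>>fr<< hello'], return_tensors='tf', padding=True)

# Feed the encoder input ids as decoder input ids too; this is only meant to
# compare the two implementations on identical inputs, not to translate.
with torch.no_grad():
    pt_logits = pt_model(**pt_batch, decoder_input_ids=pt_batch['input_ids']).logits.numpy()
tf_logits = tf_model(**tf_batch, decoder_input_ids=tf_batch['input_ids']).logits.numpy()

# A large maximum difference points at the architecture or the weight cross-loading.
print(np.abs(pt_logits - tf_logits).max())
print(np.allclose(pt_logits, tf_logits, atol=1e-4))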
@danielenricocahall a fix was merged and new weights were pushed -- you should pick them up if you run from the latest main branch.
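In case it helps anyone landing here, a minimal sketch of how one might pick up both the library fix and the refreshed weights (my assumption about the intended follow-up, not instructions from the maintainers):

# Install transformers from the main branch so the merged fix is included:
#   pip install git+https://github.com/huggingface/transformers
from transformers import TFMarianMTModel

# force_download=True ignores any stale weights sitting in the local cache.
model = TFMarianMTModel.from_pretrained(
    "Helsinki-NLP/opus-mt-en-ROMANCE",
    force_download=True,
)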
cc @gante We still have the generation issue:

from transformers import MarianMTModel, MarianTokenizer, TFMarianMTModel
model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
text_in = ['>>fr<< hello']
# PT generates a few tokens then stops early -> very fast
model = MarianMTModel.from_pretrained(model_name)
batch = tokenizer(text_in, return_tensors='pt', padding=True)
translated = model.generate(**batch)
o = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(translated)
print(o)
# TF generates 512 tokens, although the decoded version gives the same result as PT -> very slow
model = TFMarianMTModel.from_pretrained(model_name, from_pt=False)
batch = tokenizer(text_in, return_tensors='tf', padding=True)
translated = model.generate(**batch)
o = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(translated)
print(o)
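To put numbers on the speed gap (a sketch of my own, not part of the report; timings will vary by machine), both generate() calls can be wrapped with a timer:

import time
from transformers import MarianMTModel, MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
text_in = ['>>fr<< hello']

pt_model = MarianMTModel.from_pretrained(model_name)
pt_batch = tokenizer(text_in, return_tensors='pt', padding=True)
start = time.perf_counter()
pt_ids = pt_model.generate(**pt_batch)
print(f"PT: {time.perf_counter() - start:.1f}s for {pt_ids.shape[1]} tokens")

tf_model = TFMarianMTModel.from_pretrained(model_name)
tf_batch = tokenizer(text_in, return_tensors='tf', padding=True)
start = time.perf_counter()
tf_ids = tf_model.generate(**tf_batch)
print(f"TF: {time.perf_counter() - start:.1f}s for {tf_ids.shape[1]} tokens")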
@ydshieh Hi, I am experiencing the same issue. I expected the TF version to be faster than the PT version.
System Info
System: macOS Monterey 12.2.1
Who can help?
No response
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Output:
Expected behavior
I would expect similar performance to the PyTorch model.
Inference requires about 120s on my machine and outputs an incorrect translation. In contrast, the PyTorch model (replacing TFMarianMTModel with MarianMTModel and changing return_tensors to 'pt' in the code snippet) returns the correct translation ("Bonjour") and inference requires about 6s on my machine.