Inference for TFMarianMTModel (en to Romance language translation) is slow and inaccurate #18149
Comments
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Sorry for missing this! Could you take a look at this, @gante, @Rocketknight1, @ydshieh?
Let me take a look at the quality issue. And possibly @gante or @Rocketknight1 for the speed issue, let's discuss it :-)
Actually, the performance issue comes from the quality issue: the TF version doesn't stop generating until it hits 512 tokens. The generated ids look like

[[65000 25 2092 7 179 15 276 185 7 227 32 9 2 2538 15 5716 2 2538 15 5716 2 2538 15 15 15 15 15 15 15 15 15 15 15 15 ... 0]], shape=(1, 512), dtype=int32
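A quick way to see this behaviour (my own sketch, not code from the original comment; it assumes the same "Helsinki-NLP/opus-mt-en-ROMANCE" checkpoint used throughout this thread) is to inspect the shape of the generated ids and count how often the EOS token actually appears:

from transformers import MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = TFMarianMTModel.from_pretrained(model_name)

batch = tokenizer(['>>fr<< hello'], return_tensors='tf', padding=True)
out = model.generate(**batch)

# If generation never stops early, this prints (1, 512) instead of a short sequence.
print(out.shape)
# The id generation is supposed to stop at, and how many times it shows up in the output.
print(model.config.eos_token_id)
print(int((out.numpy() == model.config.eos_token_id).sum()))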
I believe the current PT / TF checkpoints for "Helsinki-NLP/opus-mt-en-ROMANCE" don't contain the same weights: with the TF checkpoint I could get one translation, while the PyTorch version gives a different one. So the two checkpoints appear to be out of sync.
After a double check (see the code below): @gante, would you like to have a look too, upload a new TF checkpoint, and see why the current one differs?
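For reference, a minimal sketch of the kind of double check described above (mine, not the code from the original comment): generate with the TF weights published on the Hub and with TF weights cross-loaded from the PT checkpoint via from_pt=True, then compare the results.

import numpy as np
from transformers import MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
batch = tokenizer(['>>fr<< hello'], return_tensors='tf', padding=True)

# TF weights as published on the Hub vs. TF weights converted on the fly from PT.
tf_native = TFMarianMTModel.from_pretrained(model_name)
tf_from_pt = TFMarianMTModel.from_pretrained(model_name, from_pt=True)

ids_native = tf_native.generate(**batch)
ids_from_pt = tf_from_pt.generate(**batch)

print(tokenizer.batch_decode(ids_native, skip_special_tokens=True))
print(tokenizer.batch_decode(ids_from_pt, skip_special_tokens=True))
# False here would point at a stale TF checkpoint on the Hub.
print(np.array_equal(ids_native.numpy(), ids_from_pt.numpy()))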
Hi there @ydshieh @danielenricocahall 👋 None of the Marian models can be successfully converted to TF -- they all fail when validating the hidden layers and outputs of the models. This is a shame, since there are a ton of Marian models for translation :( It means there is something wrong with either the model architecture or with the weight cross-loading. I haven't looked into it, other than noticing the issue when attempting to convert the weights.
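The validation mentioned above can be approximated with a short script (a sketch under my own assumptions, not the actual conversion tool): run the same input through the PT model and a TF model cross-loaded from the PT weights, and compare the output logits within a tolerance.

import numpy as np
import torch
from transformers import MarianMTModel, MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)

pt_model = MarianMTModel.from_pretrained(model_name)
tf_model = TFMarianMTModel.from_pretrained(model_name, from_pt=True)

pt_batch = tokenizer(['>>fr<< hello'], return_tensors='pt', padding=True)
tf_batch = tokenizer(['>>fr<< hello'], return_tensors='tf', padding=True)

# Feed the encoder input ids as decoder input ids too; this is only meant to
# compare the two implementations on identical inputs, not to translate.
with torch.no_grad():
    pt_logits = pt_model(**pt_batch, decoder_input_ids=pt_batch['input_ids']).logits.numpy()
tf_logits = tf_model(**tf_batch, decoder_input_ids=tf_batch['input_ids']).logits.numpy()

# A large maximum difference points at the architecture or the weight cross-loading.
print(np.abs(pt_logits - tf_logits).max())
print(np.allclose(pt_logits, tf_logits, atol=1e-4))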
@danielenricocahall a fix was merged and new weights were pushed -- you should pick them up if you run from the latest main branch.
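In case it helps anyone landing here, a minimal sketch of how one might pick up both the library fix and the refreshed weights (my assumption about the intended follow-up, not instructions from the maintainers):

# Install transformers from the main branch so the merged fix is included:
#   pip install git+https://github.com/huggingface/transformers
from transformers import TFMarianMTModel

# force_download=True ignores any stale weights sitting in the local cache.
model = TFMarianMTModel.from_pretrained(
    "Helsinki-NLP/opus-mt-en-ROMANCE",
    force_download=True,
)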
cc @gante We still have the generation issue:

from transformers import MarianMTModel, MarianTokenizer, TFMarianMTModel
model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
text_in = ['>>fr<< hello']
# PT generates a few tokens then stops early -> very fast
model = MarianMTModel.from_pretrained(model_name)
batch = tokenizer(text_in, return_tensors='pt', padding=True)
translated = model.generate(**batch)
o = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(translated)
print(o)
# TF generates 512 tokens, although the decoded version gives the same result as PT -> very slow
model = TFMarianMTModel.from_pretrained(model_name, from_pt=False)
batch = tokenizer(text_in, return_tensors='tf', padding=True)
translated = model.generate(**batch)
o = tokenizer.batch_decode(translated, skip_special_tokens=True)
print(translated)
print(o)
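To put numbers on the speed gap (a sketch of my own, not part of the report; timings will vary by machine), both generate() calls can be wrapped with a timer:

import time
from transformers import MarianMTModel, MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)
text_in = ['>>fr<< hello']

pt_model = MarianMTModel.from_pretrained(model_name)
pt_batch = tokenizer(text_in, return_tensors='pt', padding=True)
start = time.perf_counter()
pt_ids = pt_model.generate(**pt_batch)
print(f"PT: {time.perf_counter() - start:.1f}s for {pt_ids.shape[1]} tokens")

tf_model = TFMarianMTModel.from_pretrained(model_name)
tf_batch = tokenizer(text_in, return_tensors='tf', padding=True)
start = time.perf_counter()
tf_ids = tf_model.generate(**tf_batch)
print(f"TF: {time.perf_counter() - start:.1f}s for {tf_ids.shape[1]} tokens")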
@ydshieh Hi, I am experiencing the same issue. I expected the TF version to be faster than the PT version.
System Info
System: macOS Monterey 12.2.1
Who can help?
No response
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Output:
Expected behavior
I would expect similar performance to the PyTorch model.
Inference requires about 120s on my machine and outputs an incorrect translation. In contrast, the PyTorch model (replacing TFMarianMTModel with MarianMTModel and changing return_tensors to 'pt' in the code snippet) returns the correct translation ("Bonjour") and inference requires about 6s on my machine.