
[TODO] Investigate equivalence tests #16497

Open
ydshieh opened this issue Mar 30, 2022 · 5 comments
Labels
WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress

Comments

@ydshieh
Collaborator

ydshieh commented Mar 30, 2022

(I added a lot of assignees just to keep you informed and updated going forward. Don't hesitate to remove yourself if you think it's irrelevant.)

Currently the PT/TF/Flax equivalence tests use 1e-5 as the tolerance for the absolute differences of outputs.

We see these tests fail with a non-negligible (although not carefully measured) frequency.

Creating this issue to track a list of models to investigate.
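(For context, a minimal sketch of the kind of check these tests perform; this is an illustration of comparing outputs against an absolute tolerance, not the actual test code in `transformers`.)

```python
import numpy as np

def outputs_close(pt_logits, tf_logits, atol=1e-5):
    """Return True if the largest element-wise absolute difference
    between the two output arrays is below the tolerance."""
    pt = np.asarray(pt_logits, dtype=np.float64)
    tf = np.asarray(tf_logits, dtype=np.float64)
    return bool(np.max(np.abs(pt - tf)) < atol)

print(outputs_close([1.0, 2.0], [1.0, 2.0 + 5e-6]))  # True: diff 5e-6 < 1e-5
print(outputs_close([1.0, 2.0], [1.0, 2.0 + 2e-5]))  # False: diff 2e-5 >= 1e-5
```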

@gante
Member

gante commented Apr 12, 2022

Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I've been getting a failure in this one every other day -- example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265

@ydshieh
Collaborator Author

ydshieh commented Apr 12, 2022

> Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I've been getting a failure in this one every other day -- example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265

Thanks. @stas00 also reported this. I will take a look~

@ydshieh
Collaborator Author

ydshieh commented Apr 13, 2022

> Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I've been getting a failure in this one every other day -- example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265

(just for the record) Among 500 runs:

  • 34 runs had a max absolute difference in FunnelForMaskedLM.output.logits around 1e-5 ~ 2e-5: so a ~6.8% chance of failure 😢
  • 66 runs around 9e-6
  • 38 runs around 8e-6

(so >25% of runs get close to the 1e-5 tolerance)
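(A hedged sketch of how such statistics could be tallied: given a list of max absolute differences collected from repeated test runs, count how many fall in each range. The `diffs` values below are hypothetical placeholders, not the actual measurements from the 500 runs.)

```python
import numpy as np

def failure_rate(diffs, tol=1e-5):
    """Fraction of recorded max-diffs at or above the test tolerance."""
    return float(np.mean(np.asarray(diffs) >= tol))

def near_tolerance_rate(diffs, lower=8e-6, tol=1e-5):
    """Fraction of runs that came within the 'danger zone' below tol,
    plus those that exceeded it."""
    d = np.asarray(diffs)
    return float(np.mean(d >= lower))

# Hypothetical sample of per-run max absolute differences
diffs = [9e-6, 8e-6, 1.5e-5, 3e-6]
print(failure_rate(diffs))         # 0.25: one of four runs exceeded 1e-5
print(near_tolerance_rate(diffs))  # 0.75: three of four runs were >= 8e-6
```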

@huggingface huggingface deleted a comment from github-actions bot May 9, 2022
@huggingface huggingface deleted a comment from github-actions bot Jun 2, 2022
@gante
Member

gante commented Jun 2, 2022

@ydshieh I believe you can add the WIP label to stop the bot :)

@gante gante added the WIP label Jun 2, 2022
@ydshieh
Collaborator Author

ydshieh commented Jun 2, 2022

I am afraid I will completely forget about this issue. But if the bot bothers you guys, that's OK with me. Thanks for the tip, I didn't know about it.
