
[TODO] Investigate equivalence tests #16497

Open
ydshieh opened this issue Mar 30, 2022 · 5 comments
Labels
WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress

Comments

@ydshieh
Collaborator

ydshieh commented Mar 30, 2022

(I added a lot of assignees just to keep you informed and updated going forward. Don't hesitate to remove yourself if you think it's irrelevant.)

Currently the PT/TF/Flax equivalence tests use 1e-5 as the tolerance for the absolute differences of outputs.

We see these tests fail with a non-negligible (although not carefully measured) frequency.

Creating this issue to track a list of models to investigate.
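(For context, a minimal sketch of the kind of check these tests perform; this is an illustration of comparing outputs against an absolute tolerance, not the actual test code in `transformers`.)

```python
import numpy as np

def outputs_close(pt_logits, tf_logits, atol=1e-5):
    """Return True if the largest element-wise absolute difference
    between the two output arrays is below the tolerance."""
    pt = np.asarray(pt_logits, dtype=np.float64)
    tf = np.asarray(tf_logits, dtype=np.float64)
    return bool(np.max(np.abs(pt - tf)) < atol)

print(outputs_close([1.0, 2.0], [1.0, 2.0 + 5e-6]))  # True: diff 5e-6 < 1e-5
print(outputs_close([1.0, 2.0], [1.0, 2.0 + 2e-5]))  # False: diff 2e-5 >= 1e-5
```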

@gante
Member

gante commented Apr 12, 2022

Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I've been getting a failure in this one every other day -- example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265

@ydshieh
Collaborator Author

ydshieh commented Apr 12, 2022

> Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I've been getting a failure in this one every other day -- example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265

Thanks. @stas00 also reported this. I will take a look~

@ydshieh
Collaborator Author

ydshieh commented Apr 13, 2022

> Another one to add to this list: tests/funnel/test_modeling_funnel.py::FunnelModelTest::test_pt_tf_model_equivalence. I've been getting a failure in this one every other day -- example: https://app.circleci.com/pipelines/github/huggingface/transformers/38007/workflows/2a98b7b1-5ad0-4b80-a702-1887c620193f/jobs/421265

(just for the record) Among 500 runs:

  • 34 runs had a max absolute difference in FunnelForMaskedLM.output.logits around 1e-5 ~ 2e-5: so a ~6.8% chance of failure 😢
  • 66 runs around 9e-6
  • 38 runs around 8e-6

(so >25% of runs get close to the 1e-5 tolerance)
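(A hedged sketch of how such statistics could be tallied: given a list of max absolute differences collected from repeated test runs, count how many fall in each range. The `diffs` values below are hypothetical placeholders, not the actual measurements from the 500 runs.)

```python
import numpy as np

def failure_rate(diffs, tol=1e-5):
    """Fraction of recorded max-diffs at or above the test tolerance."""
    return float(np.mean(np.asarray(diffs) >= tol))

def near_tolerance_rate(diffs, lower=8e-6, tol=1e-5):
    """Fraction of runs that came within the 'danger zone' below tol,
    plus those that exceeded it."""
    d = np.asarray(diffs)
    return float(np.mean(d >= lower))

# Hypothetical sample of per-run max absolute differences
diffs = [9e-6, 8e-6, 1.5e-5, 3e-6]
print(failure_rate(diffs))         # 0.25: one of four runs exceeded 1e-5
print(near_tolerance_rate(diffs))  # 0.75: three of four runs were >= 8e-6
```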

@huggingface huggingface deleted a comment from github-actions bot May 9, 2022
@huggingface huggingface deleted a comment from github-actions bot Jun 2, 2022
@gante
Member

gante commented Jun 2, 2022

@ydshieh I believe you can add the WIP label to stop the bot :)

@gante gante added the WIP label Jun 2, 2022
@ydshieh
Collaborator Author

ydshieh commented Jun 2, 2022

I am afraid I will completely forget about this issue. But if the bot bothers you guys, that's OK with me. Thanks for the tip, I didn't know about it.
