
Correctly count the number of failing tests (not failing test suites) in PyTorch builds #2794

Merged (10 commits) on Oct 11, 2022

Conversation

casparvl
Contributor

@casparvl commented Sep 29, 2022

(created using eb --new-pr)

Previously, the PyTorch EasyBlock would count the number of tests that were run (around 86k) and the number of test suites that failed (typically 10 or 15 or so). It presented the latter as if it were the number of failing tests, which made failures seem absolutely negligible. That is not a fair comparison, however: each test suite contains many tests, and a failing test suite usually contains multiple failing tests.

This PR introduces more specific regex patterns to search for in the output, so that the actual number of individual failing tests is reported (see the sketch after the example output below). With e.g. this run, that now amounts to around 200 failures/errors. Compared to 86k total tests that's still acceptable, and doesn't point to any fundamental issues with the PyTorch build - it merely reflects broken tests.

The printed warning is now also clearer, to help the end user decide whether this number of errors is acceptable. E.g. 200 test failures distributed evenly over 15 or so test suites might be fine, but 200 test failures in a single test suite might point to one particular piece of PyTorch functionality being completely broken.

To make this concrete, with this PR, the standard output will look as follows:

...
== ... (took 1 hour 3 mins 54 secs)
== testing...

WARNING: 43 test failures, 151 test errors (out of 86678):
distributions/test_constraints (2 failed, 128 passed, 2 skipped, 2 warnings)
distributions/test_distributions (219 total tests, failures=1)
test_fx (924 total tests, errors=10, skipped=190, expected failures=6)
test_jit (2661 total tests, failures=12, errors=7, skipped=127, expected failures=7)
test_jit_cuda_fuser (147 total tests, errors=1, skipped=18)
test_jit_legacy (2661 total tests, failures=12, errors=8, skipped=125, expected failures=7)
test_jit_profiling (2661 total tests, failures=12, errors=7, skipped=127, expected failures=7)
test_package (131 total tests, errors=46, skipped=23)
test_quantization (877 total tests, failures=3, errors=40, skipped=47)
test_reductions (2895 total tests, errors=5, skipped=104, expected failures=49)
test_sort_and_select (91 total tests, errors=1, skipped=13)
test_sparse (1268 total tests, errors=1, skipped=129)
test_tensor_creation_ops (546 total tests, errors=25, skipped=60)
test_torch (853 total tests, failures=1, skipped=65)


The PyTorch test suite is known to include some flaky tests, which may fail depending on the specifics of the system or the context in which they are run. For this PyTorch installation, EasyBuild allows up to 400 tests to fail. We recommend to double check that the failing tests listed above are known to be flaky, or do not affect your intended usage of PyTorch. In case of doubt, reach out to the EasyBuild community (via GitHub, Slack, or mailing list).

== ... (took 4 hours 19 mins 1 secs)
== installing...
...
== COMPLETED: Installation ended successfully (took 5 hours 29 mins 36 secs)
...
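
As a rough sketch of how this counting can work (illustrative patterns only, not the actual easyblock code), covering the two summary formats visible in the output above:

import re

# Illustrative patterns for the two summary formats shown above; the actual
# easyblock uses its own (more elaborate) regexes.
# pytest-style: "distributions/test_constraints (2 failed, 128 passed, ...)"
PYTEST_SUMMARY = re.compile(r'^\S+ \((?P<failed>\d+) failed', re.M)
# unittest-style: "test_jit (2661 total tests, failures=12, errors=7, ...)"
UNITTEST_SUMMARY = re.compile(r'^\S+ \(\d+ total tests.*\)$', re.M)
# 'expected failures=N' must not be counted as failures, hence the lookbehind
FAILURES = re.compile(r'(?<!expected )failures=(\d+)')
ERRORS = re.compile(r'errors=(\d+)')

def count_failed_tests(output):
    """Count individual failing tests and errors, not failing test suites."""
    failures = sum(int(m.group('failed')) for m in PYTEST_SUMMARY.finditer(output))
    errors = 0
    for m in UNITTEST_SUMMARY.finditer(output):
        line = m.group(0)
        failures += sum(int(n) for n in FAILURES.findall(line))
        errors += sum(int(n) for n in ERRORS.findall(line))
    return failures, errors

Summing the per-suite numbers in the example output above indeed reproduces the totals in the WARNING line: 2+1+12+12+12+3+1 = 43 failures, and the errors= values add up to 151.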

… Improve error reporting to provide more insight into which test suites fail
@casparvl changed the title from "Correctly count the number of failing tests, not failing test suites. Improve error reporting to provide more insight into which test suites fail" to "Correctly count the number of failing tests, not failing test suites" Sep 29, 2022
@casparvl changed the title from "Correctly count the number of failing tests, not failing test suites" to "Correctly count the number of failing tests (not failing test suites) in PyTorch builds" Sep 29, 2022
@casparvl changed the title from "Correctly count the number of failing tests (not failing test suites) in PyTorch builds" to "Correctly count the number of failing tests (not failing test suites) in PyTorch builds [WIP]" Sep 29, 2022
casparvl pushed a commit to sara-nl/easybuild-easyconfigs that referenced this pull request Sep 29, 2022
…sybuild-easyblocks#2794 we'll actually start counting failing tests, instead of failing test suites. Thus, much higher numbers can be expected, since many test suites have multiple failing tests
@casparvl
Contributor Author

casparvl commented Oct 5, 2022

Two test reports for this PR can be found in this easyconfig PR, here and here.

@casparvl changed the title from "Correctly count the number of failing tests (not failing test suites) in PyTorch builds [WIP]" to "Correctly count the number of failing tests (not failing test suites) in PyTorch builds" Oct 5, 2022
@boegel
Member

boegel commented Oct 7, 2022

@casparvl We will likely need to increase max_failed_tests in PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb before we merge this?
Can you upload a test report in this PR for PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb from one or more of your GPU systems, so we have an idea of how many tests are failing for you? I'll do the same for our V100 and A100 GPU systems...
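
For reference, a minimal sketch of what that looks like in the easyconfig (easyconfigs are plain Python; the value 400 here matches the warning message quoted above, but the right threshold is exactly what needs to be decided):

# in e.g. PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb: make the installation
# fail if more than this many individual tests fail
max_failed_tests = 400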

@boegel
Member

boegel commented Oct 7, 2022

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.85.02, Python 3.6.8
See https://gist.github.com/3a56324b90bcdcd9fe95af8938019afd for a full test report.

edit:

== 2022-10-07 19:24:22,320 pytorch.py:344 WARNING 3 test failures, 0 test error (out of 89226):
distributed/fsdp/test_fsdp_input (2 total tests, failures=2)
test_autograd (464 total tests, failures=1, skipped=52, expected failures=1)

@boegel
Member

boegel commented Oct 7, 2022

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3304.joltik.os - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.85.02, Python 3.6.8
See https://gist.github.com/674ba95e438b310ab248a913912f79b6 for a full test report.

edit:

== 2022-10-07 21:13:12,666 pytorch.py:344 WARNING 3 test failures, 0 test error (out of 88959):
distributed/fsdp/test_fsdp_input (2 total tests, failures=2)
test_autograd (464 total tests, failures=1, skipped=52, expected failures=1)

@boegel
Member

boegel commented Oct 8, 2022

@casparvl Test failures are still very low for PyTorch/1.11.0-foss-2021a-CUDA-11.3.1 for me, despite the change in counting, see the test reports.

That does raise questions about the much higher failing test count in easybuilders/easybuild-easyconfigs#15924 though?

@casparvl
Contributor Author

Test report by @casparvl

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn2 - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.43.04, Python 3.6.8
See https://gist.github.com/cae42dedcc9d6ecb622de29425413ddc for a full test report.

@casparvl
Contributor Author

casparvl commented Oct 10, 2022

@boegel Two reasons for that:

  1. For 1.11.0 we still created various patches to resolve test failures. As you can see here, the original number of failing test suites was about 15, and I expect the number of failing tests to have been a multiple of that. With our recent PyTorch EasyBlock updates we basically took a different approach: we no longer patch tests, but simply accept their failures. That means the number will indeed be higher. Note that with our final easyconfig, with all patches in place, I did not get any errors (see the test report above). That's because I'm the one who made all the patches, based on whatever was failing on my system... :)

  2. There are about 2 or 3 error types that recur a lot. One of the very common ones in the test suite is TypeError: 'float' object cannot be interpreted as an integer (or similar). The reason is that implicit type conversion is no longer allowed by Python 3.10, but Torch's torch.Tensor(data, *, dtype, ...) relies on it when the data is not of type dtype. This simply means that torch.tensor([0.5,1], dtype=torch.int8) is no longer valid code. That breaks a large number of tests; I haven't counted, but I think easily 50-100 tests fail with this error.
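
A minimal reproduction of that failure mode, based on the example above (whether it actually raises depends on the exact Python/PyTorch combination):

import torch

# on an affected build (PyTorch 1.12.0 under Python 3.10), constructing an
# integer tensor from float data reportedly fails with:
#   TypeError: 'float' object cannot be interpreted as an integer
t = torch.tensor([0.5, 1], dtype=torch.int8)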

Note that there are other errors that occur quite a large number of times. You're right that we should probably investigate those a bit further. One of them is the RuntimeError: Unsupported Python version: sys.version_info(major=3, minor=10, micro=4, releaselevel='final', serial=0) error. This is a bit of a weird one, since pytorch/pytorch#66424 suggests that 1.12 does have Python 3.10 support, and a wheel was released for Python 3.10 as well (see https://pypi.org/project/torch/1.12.0/#files). The error is thrown by this file, and it seems that in version 1.12.1 Python 3.10 was also added to the list in this file. I guess this means that torch.package didn't work properly in PyTorch 1.12.0 - neither in our build nor in the officially released wheel for Python 3.10. The question is whether that should be 'our problem' and we should e.g. backport the fix, or whether this is just what it is (i.e. torch.package is broken with Python 3.10) and we should accept that. (see also pytorch/pytorch@1d8ea83)
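
To illustrate, the version dispatch that throws this error looks roughly like the following (a paraphrased sketch, not a verbatim copy of the PyTorch source; the hardcoded module lists are heavily truncated here, and the 1.12.1 fix is sketched via sys.stdlib_module_names, though the actual fix may instead extend a hardcoded list):

import sys

# stand-ins for the hardcoded per-version stdlib module lists that the
# real torch.package file ships with (heavily truncated for illustration)
stdlib3_8 = frozenset({"os", "sys", "re"})
stdlib3_9 = frozenset({"os", "sys", "re"})

def get_stdlib_modules():
    """Sketch of the dispatch that raises the RuntimeError quoted above."""
    if sys.version_info.major == 3:
        if sys.version_info.minor == 8:
            return stdlib3_8
        if sys.version_info.minor == 9:
            return stdlib3_9
        if sys.version_info.minor >= 10:
            # the branch missing in 1.12.0: Python >= 3.10 can report its
            # own stdlib module names
            return frozenset(sys.stdlib_module_names)
    raise RuntimeError(f"Unsupported Python version: {sys.version_info}")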

@boegel
Member

boegel commented Oct 11, 2022

Thanks for clarifying @casparvl!

Since this enhanced PyTorch easyblock has been tested both here (with PyTorch 1.11) and in #15924 with PyTorch 1.12, I don't see a reason to hold back this PR any further...

Member

@boegel left a comment

lgtm
