
Correctly count the number of failing tests (not failing test suites) in PyTorch builds #2794

Merged (10 commits) on Oct 11, 2022

Conversation

casparvl
Contributor

@casparvl commented Sep 29, 2022

(created using eb --new-pr)

Previously, the PyTorch EasyBlock would count the number of tests that were run (around 86k) and the number of test suites that failed (typically 10 or 15 or so). It presented the latter as if it were the number of failing tests, which made failures seem absolutely negligible. That is not a fair comparison, however: each test suite contains many tests, and a failing test suite usually contains multiple failing tests.

This PR introduces more specific regex patterns to search for in the output, so that the actual number of individual failing tests is reported (see the sketch after the example output below). With e.g. this run, that now amounts to around 200 failures/errors. Compared to 86k total tests that's still acceptable, and doesn't point to any fundamental issues with the PyTorch build - it merely reflects broken tests.

The printed warning is now also clearer, to help the end user decide whether this number of errors is acceptable. E.g. 200 test failures distributed evenly over 15 or so test suites might be fine, but 200 test failures in a single test suite might point to one particular piece of PyTorch functionality being completely broken.

To make this concrete, with this PR, the standard output will look as follows:

...
== ... (took 1 hour 3 mins 54 secs)
== testing...

WARNING: 43 test failures, 151 test errors (out of 86678):
distributions/test_constraints (2 failed, 128 passed, 2 skipped, 2 warnings)
distributions/test_distributions (219 total tests, failures=1)
test_fx (924 total tests, errors=10, skipped=190, expected failures=6)
test_jit (2661 total tests, failures=12, errors=7, skipped=127, expected failures=7)
test_jit_cuda_fuser (147 total tests, errors=1, skipped=18)
test_jit_legacy (2661 total tests, failures=12, errors=8, skipped=125, expected failures=7)
test_jit_profiling (2661 total tests, failures=12, errors=7, skipped=127, expected failures=7)
test_package (131 total tests, errors=46, skipped=23)
test_quantization (877 total tests, failures=3, errors=40, skipped=47)
test_reductions (2895 total tests, errors=5, skipped=104, expected failures=49)
test_sort_and_select (91 total tests, errors=1, skipped=13)
test_sparse (1268 total tests, errors=1, skipped=129)
test_tensor_creation_ops (546 total tests, errors=25, skipped=60)
test_torch (853 total tests, failures=1, skipped=65)


The PyTorch test suite is known to include some flaky tests, which may fail depending on the specifics of the system or the context in which they are run. For this PyTorch installation, EasyBuild allows up to 400 tests to fail. We recommend to double check that the failing tests listed above are known to be flaky, or do not affect your intended usage of PyTorch. In case of doubt, reach out to the EasyBuild community (via GitHub, Slack, or mailing list).

== ... (took 4 hours 19 mins 1 secs)
== installing...
...
== COMPLETED: Installation ended successfully (took 5 hours 29 mins 36 secs)
...
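
As a rough sketch of how this counting can work (illustrative patterns only, not the actual easyblock code), covering the two summary formats visible in the output above:

import re

# Illustrative patterns for the two summary formats shown above; the actual
# easyblock uses its own (more elaborate) regexes.
# pytest-style: "distributions/test_constraints (2 failed, 128 passed, ...)"
PYTEST_SUMMARY = re.compile(r'^\S+ \((?P<failed>\d+) failed', re.M)
# unittest-style: "test_jit (2661 total tests, failures=12, errors=7, ...)"
UNITTEST_SUMMARY = re.compile(r'^\S+ \(\d+ total tests.*\)$', re.M)
# 'expected failures=N' must not be counted as failures, hence the lookbehind
FAILURES = re.compile(r'(?<!expected )failures=(\d+)')
ERRORS = re.compile(r'errors=(\d+)')

def count_failed_tests(output):
    """Count individual failing tests and errors, not failing test suites."""
    failures = sum(int(m.group('failed')) for m in PYTEST_SUMMARY.finditer(output))
    errors = 0
    for m in UNITTEST_SUMMARY.finditer(output):
        line = m.group(0)
        failures += sum(int(n) for n in FAILURES.findall(line))
        errors += sum(int(n) for n in ERRORS.findall(line))
    return failures, errors

Summing the per-suite numbers in the example output above indeed reproduces the totals in the WARNING line: 2+1+12+12+12+3+1 = 43 failures, and the errors= values add up to 151.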

… Improve error reporting to provide more insight into which test suites fail
@casparvl changed the title from "Correctly count the number of failing tests, not failing test suites. Improve error reporting to provide more insight into which test suites fail" to "Correctly count the number of failing tests, not failing test suites" Sep 29, 2022
@casparvl changed the title from "Correctly count the number of failing tests, not failing test suites" to "Correctly count the number of failing tests (not failing test suites) in PyTorch builds" Sep 29, 2022
@casparvl changed the title from "Correctly count the number of failing tests (not failing test suites) in PyTorch builds" to "Correctly count the number of failing tests (not failing test suites) in PyTorch builds [WIP]" Sep 29, 2022
casparvl pushed a commit to sara-nl/easybuild-easyconfigs that referenced this pull request Sep 29, 2022
…sybuild-easyblocks#2794 we'll actually start counting failing tests, instead of failing test suites. Thus, much higher numbers can be expected, since many test suites have multiple failing tests
@casparvl
Contributor Author

casparvl commented Oct 5, 2022

Two test reports for this PR can be found in this easyconfig PR, here and here.

@casparvl changed the title from "Correctly count the number of failing tests (not failing test suites) in PyTorch builds [WIP]" to "Correctly count the number of failing tests (not failing test suites) in PyTorch builds" Oct 5, 2022
@boegel
Member

boegel commented Oct 7, 2022

@casparvl We will likely need to increase max_failed_tests in PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb before we merge this?
Can you upload a test report in this PR for PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb from one or more of your GPU systems, so we have an idea of how many tests are failing for you? I'll do the same for our V100 and A100 GPU systems...
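
For reference, a minimal sketch of what that looks like in the easyconfig (easyconfigs are plain Python; the value 400 here matches the warning message quoted above, but the right threshold is exactly what needs to be decided):

# in e.g. PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb: make the installation
# fail if more than this many individual tests fail
max_failed_tests = 400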

@boegel
Member

boegel commented Oct 7, 2022

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3905.accelgor.os - Linux RHEL 8.4, x86_64, AMD EPYC 7413 24-Core Processor (zen3), 1 x NVIDIA NVIDIA A100-SXM4-80GB, 510.85.02, Python 3.6.8
See https://gist.github.com/3a56324b90bcdcd9fe95af8938019afd for a full test report.

edit:

== 2022-10-07 19:24:22,320 pytorch.py:344 WARNING 3 test failures, 0 test error (out of 89226):
distributed/fsdp/test_fsdp_input (2 total tests, failures=2)
test_autograd (464 total tests, failures=1, skipped=52, expected failures=1)

@boegel
Member

boegel commented Oct 7, 2022

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3304.joltik.os - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 510.85.02, Python 3.6.8
See https://gist.github.com/674ba95e438b310ab248a913912f79b6 for a full test report.

edit:

== 2022-10-07 21:13:12,666 pytorch.py:344 WARNING 3 test failures, 0 test error (out of 88959):
distributed/fsdp/test_fsdp_input (2 total tests, failures=2)
test_autograd (464 total tests, failures=1, skipped=52, expected failures=1)

@boegel
Member

boegel commented Oct 8, 2022

@casparvl Test failures are still very low for PyTorch/1.11.0-foss-2021a-CUDA-11.3.1 for me, despite the change in counting, see the test reports.

That does raise questions about the much higher failing test count in easybuilders/easybuild-easyconfigs#15924 though?

@casparvl
Contributor Author

Test report by @casparvl

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.11.0-foss-2021a-CUDA-11.3.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn2 - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.43.04, Python 3.6.8
See https://gist.github.com/cae42dedcc9d6ecb622de29425413ddc for a full test report.

@casparvl
Contributor Author

casparvl commented Oct 10, 2022

@boegel Two reasons for that:

  1. For 1.11.0 we still created various patches to resolve test failures. As you can see here, the original number of failing test suites was about 15, and I expect the number of failing tests to have been a multiple of that. With our recent PyTorch EasyBlock updates we basically took a different approach: we no longer patch tests, but simply accept their failures. That means the number will indeed be higher. Note that with our final easyconfig, with all patches in place, I did not get any errors (see the test report above). That's because I'm the one who made all the patches, based on whatever was failing on my system... :)

  2. There are about 2 or 3 error types that recur a lot. One of the very common ones in the test suite is TypeError: 'float' object cannot be interpreted as an integer (or similar). The reason is that implicit type conversion is no longer allowed by Python 3.10, but Torch's torch.Tensor(data, *, dtype, ...) relies on it when the data is not of type dtype. This simply means that torch.tensor([0.5,1], dtype=torch.int8) is no longer valid code. That breaks a large number of tests; I haven't counted, but I think easily 50-100 tests fail with this error.
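
A minimal reproduction of that failure mode, based on the example above (whether it actually raises depends on the exact Python/PyTorch combination):

import torch

# on an affected build (PyTorch 1.12.0 under Python 3.10), constructing an
# integer tensor from float data reportedly fails with:
#   TypeError: 'float' object cannot be interpreted as an integer
t = torch.tensor([0.5, 1], dtype=torch.int8)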

Note that there are other errors that occur quite a large number of times. You're right that we should probably investigate those a bit further. One of them is the RuntimeError: Unsupported Python version: sys.version_info(major=3, minor=10, micro=4, releaselevel='final', serial=0) error. This is a bit of a weird one, since pytorch/pytorch#66424 suggests that 1.12 does have Python 3.10 support, and a wheel was released for Python 3.10 as well (see https://pypi.org/project/torch/1.12.0/#files). The error is thrown by this file, and it seems that in version 1.12.1 Python 3.10 was also added to the list in this file. I guess this means that torch.package didn't work properly in PyTorch 1.12.0 - neither in our build nor in the officially released wheel for Python 3.10. The question is whether that should be 'our problem' and we should e.g. backport the fix, or whether this is just what it is (i.e. torch.package is broken with Python 3.10) and we should accept that. (see also pytorch/pytorch@1d8ea83)
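
To illustrate, the version dispatch that throws this error looks roughly like the following (a paraphrased sketch, not a verbatim copy of the PyTorch source; the hardcoded module lists are heavily truncated here, and the 1.12.1 fix is sketched via sys.stdlib_module_names, though the actual fix may instead extend a hardcoded list):

import sys

# stand-ins for the hardcoded per-version stdlib module lists that the
# real torch.package file ships with (heavily truncated for illustration)
stdlib3_8 = frozenset({"os", "sys", "re"})
stdlib3_9 = frozenset({"os", "sys", "re"})

def get_stdlib_modules():
    """Sketch of the dispatch that raises the RuntimeError quoted above."""
    if sys.version_info.major == 3:
        if sys.version_info.minor == 8:
            return stdlib3_8
        if sys.version_info.minor == 9:
            return stdlib3_9
        if sys.version_info.minor >= 10:
            # the branch missing in 1.12.0: Python >= 3.10 can report its
            # own stdlib module names
            return frozenset(sys.stdlib_module_names)
    raise RuntimeError(f"Unsupported Python version: {sys.version_info}")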

@boegel
Member

boegel commented Oct 11, 2022

Thanks for clarifying @casparvl!

Since this enhanced PyTorch easyblock has been tested both here (with PyTorch 1.11) and in #15924 with PyTorch 1.12, I don't see a reason to hold back this PR any further...

Member

@boegel left a comment

lgtm
