Surface failing tests on GHA #7364
Conversation
I'm sure I'll have more specific feedback in the future after using this for a bit, but even in its current form the new summary is extremely useful. Thanks a lot @pmeier for working on it.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7364
Note: Links to docs will display an error until the docs builds have been completed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Compared to CircleCI, GitHub Actions has one major DevX downside: it is really hard to find failing tests in the logs. This happens for two reasons:

1. GitHub Actions seems to be really slow to load logs in general. We are often waiting multiple tens of seconds, with the browser already asking if we want to stop loading the page completely because it appears to have hung.
2. GitHub Actions loads the lines from the beginning, while the failing tests are summarized at the end. Meaning, we spend quite some time loading content that we don't want to look at.

CircleCI does this better:

1. In case the log exceeds 400k characters, the online view is truncated to the *last* 400k characters, which comes down to roughly 3k lines. Paired with much faster loading in general, we can see the traceback of failing tests on CircleCI before GitHub Actions has even loaded the logs.
2. CircleCI has the ability to surface failing tests in general, so one doesn't even have to look at the logs in 99% of the cases.

Since we can't do anything about 1. here, this PR goes after 2. to bring some of the DevX from CircleCI to GitHub Actions.

CircleCI's ability to surface failing tests comes from this builtin functionality: https://circleci.com/docs/collect-test-data/

This works by analyzing a JUnit XML file. This file can be generated by `pytest`, which is what the domain libraries are using to run their test suites. Copying from the [CircleCI documentation](https://circleci.com/docs/collect-test-data/#pytest), this is a minimal example:

```yaml
- run:
    name: run tests
    command: |
      . venv/bin/activate
      mkdir test-results
      pytest --junitxml=test-results/junit.xml

- store_test_results:
    path: test-results
```

This PR introduces a small custom GitHub action that does the same as `store_test_results` from CircleCI: [`pytest-results-action`](https://github.com/pmeier/pytest-results-action). All it does is parse the same JUnit XML file and store the results as part of the [job summary](https://github.blog/2022-05-09-supercharging-github-actions-with-job-summaries/). The usage is the same:

```yaml
- name: Run tests
  run: pytest --junit-xml=test-results.xml

- name: Summarize test results
  if: always()
  uses: pmeier/pytest-results-action@main
  with:
    path: test-results.xml
```

The branch of this PR was used in pytorch/vision#7364 and "stamped" by the `torchvision` maintainers. Offline, other domain library maintainers also expressed their support for this.

Another option, currently explored by @DanilBaibak, is to bring the full PyTorch HUD, including the bot, to the domain libraries. That seems like a larger task that might take some time. In the meantime, this PR could drastically improve DevX without the wait.
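For context, a job summary is ordinary Markdown appended to the `$GITHUB_STEP_SUMMARY` file that GitHub Actions exposes to every step. The sketch below shows only that raw mechanism, with made-up numbers; it is not how `pytest-results-action` actually formats its report.

```yaml
# Minimal sketch of the job-summary mechanism itself. Anything appended
# to $GITHUB_STEP_SUMMARY is rendered as Markdown on the run's summary
# page; pytest-results-action writes its parsed JUnit results there.
- name: Write a job summary by hand
  run: |
    {
      echo "### Test results"
      echo ""
      echo "| outcome | count |"
      echo "| ------- | ----- |"
      echo "| failed  | 1     |"
      echo "| passed  | 42    |"
    } >> "$GITHUB_STEP_SUMMARY"
```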
pytorch/test-infra#3841 was merged and thus we can fully depend on this behavior now. 312f4dc will do a final showcase of the functionality. Afterwards, I'll revert caf522a and thus only real failures will remain.
This reverts commit caf522a.
Before stamping, who is in charge of maintaining `pytest-results-action`?
Right now, it is just me. We didn't have any discussions around that. There are similar instances all over the PyTorch ecosystem. @DanilBaibak is working on surfacing failing tests through the HUD as well. Right now it only shows you the failing job with the log, but this will be improved in the near future. Meaning, maintainers have two options to get this information, and thus, if for some reason I can no longer (or no longer want to) maintain pmeier/pytest-results-action, we can just rip it out. Of course, forking is always an option.
> Right now, it is just me. We didn't have any discussions around that.

I'll stamp to unblock. However, I want to be clear upfront that while this is a much-needed feature, it isn't something the torchvision engineers can maintain.
Hey @pmeier! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
Reviewed By: vmoens
Differential Revision: D44416567
fbshipit-source-id: b8101bee132ebbdc2313eae2030e3f82509d9eb9
Running a large test suite like `torchvision`'s results in really sluggish behavior of GitHub Actions. To counteract this somewhat, we recently merged #7267, although that meant we needed to give up the `-v` flag for the `pytest` invocation. As explained in 2. of #7267 (comment), this is not a big deal on CircleCI due to its ability to surface failing tests natively in a separate tab. GHA does not have this ability built in, but I have written a small action that makes it possible: https://github.com/pmeier/pytest-results-action/
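As an aside, the failure details that end up in the JUnit XML do not depend on console verbosity, so dropping `-v` from the log does not remove information from the summary. A sketch, with placeholder paths:

```yaml
# Sketch with placeholder paths: pytest records the full traceback of
# every failing test in the JUnit XML regardless of console flags such
# as -v or -q, so a quiet log and a detailed summary can coexist.
- name: Run tests quietly but keep detailed failure data
  run: pytest test/ -q --junit-xml=test-results.xml
```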
@NicolasHug, @vfdev-5 I've failed a few tests for demonstration purposes; to view them, check the job summary of the CI run. I've reinstated the `-v` flag for a realistic experience. Could you try it and report back if this is good enough? IMO this is the best UX that we can get out of GHA. If we want more, we need other custom solutions.
@osalpekar @DanilBaibak I've used this patch to upload the results: https://github.com/pytorch/test-infra/compare/3660cdcfa532b664dd3afdd6b25809725c065da2..c2ab5de5211b4eea97dd661a26c1f938645ee2c6. If we want to go that route, we should probably add a `test-results` parameter and only act if that is set.

cc @seemethere
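A rough sketch of what such an opt-in parameter could look like in a reusable workflow; the input name, defaults, and surrounding job are assumptions for illustration, not part of the actual patch:

```yaml
# Hypothetical opt-in parameter for a reusable test workflow. Everything
# here except the action itself is an assumption, not code from this PR.
on:
  workflow_call:
    inputs:
      test-results:
        description: Path to a JUnit XML file; leave empty to skip the summary
        required: false
        type: string
        default: ""

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # ... checkout, setup, and the actual test run go here ...
      - name: Summarize test results
        # only act if the caller opted in by setting the parameter
        if: ${{ always() && inputs.test-results != '' }}
        uses: pmeier/pytest-results-action@main
        with:
          path: ${{ inputs.test-results }}
```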