-
-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check precise Windows GIX_TEST_IGNORE_ARCHIVES
expectations on CI
#1663
Check precise Windows GIX_TEST_IGNORE_ARCHIVES
expectations on CI
#1663
Conversation
This modifies the `test-fixtures-windows` job that tests on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` so that, instead of checking that no more than 14 failures occur, it checks that the failing tests are exactly those that are documented in GitoxideLabs#1358 as expected to fail. The initial check that no tests have *error* status is preserved, with only stylistic changes, and kept separate from the subsequent logic so that the output is clearer. The new steps are no longer conditional on `nextest` having exited with a failure status, since (a) that was probably unnecessary before and definitely unnecessary now, (b) at last for now, the comparison is precise, so it would be strange to pass if the diff were to have changes on *all* lines, and (c) this makes it slightly less likely that GitoxideLabs#1358 will accidentally stay open even once fixed. The current approach is to actually retrieve the list of tests expected to fail on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` from the GitoxideLabs#1358 issue body. This has the advantage that it automatically keeps up to date with changes made to that issue description, but this is of course not the only possible approach for populating the expected value. Two changes should be made before this is ready: - As noted in the "FIXME" comment, the job should currently fail becuase the performance test reported to fail in GitoxideLabs#1358 is not being filtered out from the expected failures list. It's left in as of this commit, to verify that the job is capable of failing. (After that, the performance test should either be filtered out or removed from the list in GitoxideLabs#1358, but the former approach is currently preferable because I have not done diverse enough testing to check if the failure on my main Windows system is due to that system being too slow rather than a performance bug.) - The scratchwork file should be removed once no longer needed.
This is to fix the error: gh: To use GitHub CLI in a GitHub Actions workflow, set the GH_TOKEN environment variable. Example: env: GH_TOKEN: ${{ github.token }} InvalidOperation: D:\a\_temp\ba9d92b7-3a94-4f7c-b541-d19003f40e19.ps1:9 Line | 9 | $expected_failures = $match_info.Matches.Groups[1].Value -split "`n" … | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | Cannot index into a null array. Error: Process completed with exit code 1.
Including color in the diff of expected vs. actual failed tests makes the output easier to see (much as color helps in `nextest`). This commit also makes some stylistic changes to the command so it is easier to read.
This omits tests containing `performance` (and not as part of a larger "word", not even with `_`) from being expected to fail on CI with `GIX_TEST_IGNORE_ARCHIVES=1` on Windows. Currently there is one such test listed in GitoxideLabs#1358, `gix-ref-tests::refs packed::iter::performance`.
The relevant code is in the `test-fixtures-windows` CI job and it is working (both to fail the job when there is a mismatch and to have the job succeed when there is agreement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for making this happen!
I am very impressed by what can be done with powershell, it clearly is .NET
in a shell :).
And indeed, like predicted, reading the text block from the issue breaks isolation too much and is the reason I'd love to see these extracted into a file, for example if that's OK with you.
Thanks again.
Instead of retrieving them from GitoxideLabs#1358. (See discussion in GitoxideLabs#1663.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful, thanks so much!
Since GitoxideLabs#1663, the `test-fixtures-windows` CI job checks actual failures against a list of specific tests that are known to fail on Windows when `GIX_TEST_IGNORE_ARCHIVES=1`. It is therefore capable of providing useful information about new failures, or newly passing tests that should be removed from the list, if the job ever does fail. The job also seems not to fail. This is to say that while GitoxideLabs#1358 is not fixed, the `test-fixtures-windows` job has a very low rate of failure and, if it does fail, something new and interesting would be happening such that we would want to know about it and probably not immediately merge a PR that caused it without checking how and why that happened. This adds `test-fixtures-windows` to the list of jobs that are dependencies of a required check for branch protection based PR auto-merge.
Since GitoxideLabs#1663, the `test-fixtures-windows` CI job checks actual failures against a list of specific tests that are known to fail on Windows when `GIX_TEST_IGNORE_ARCHIVES=1`. It is therefore capable of providing useful information about new failures, or newly passing tests that should be removed from the list, if the job ever does fail. The job also seems not to fail. This is to say that while GitoxideLabs#1358 is not fixed, the `test-fixtures-windows` job has a very low rate of failure and, if it does fail, something new and interesting would be happening such that we would want to know about it and probably not immediately merge a PR that caused it without checking how and why that happened. This adds `test-fixtures-windows` to the list of jobs that are dependencies of a required check for branch protection based PR auto-merge.
This modifies the CI
test-fixtures-windows
job introduced in #1657, by making it so that, instead of checking that no more than the expected number of test cases fail, it checks that the exact test cases expected to fail are the ones that do fail. (The other check done before as part of this job, that no tests report errors, is still done.) The approach taken is:To collect actual failures, parse the XML output saved by
cargo nextest
to get the package name and test name of each failing test and save them, one per line, to a file.Getting this information is pretty straightforward in PowerShell. When iterating through
testcase
nodes, which are filtered by the presence of afailure
subnode,$_.name
gives the name of the test case, including::
qualifications within the containing package, and$_.classname
gives the name of the containing package. Thus"$($_.classname) $($_.name)"
identifies a test in the same way it is identified in human-readablecargo nextest
output.The unintuitive attribute name
classname
is becausecargo nextest
XML is in the JUnit format, which originated for Java tests where test cases are methods in classes. An alternative approach is to use$_.ParentNode.name
instead of$_.classname
. I've verified that this works too, but I don't know of any strong reason to prefer one over another, so I went with the more compact$_.classname
.To collect expected failures, use
gh
(the GitHub CLI) to download the text of #1358, and parse the first```text
code block with regex, extracting the list of failing tests and saving them, one per line, to a file.One of the tests currently listed there,
gix-ref-tests::refs packed::iter::performance
, is a performance test that seems not to fail on CI (anymore?), though I still find it to fail locally. If it were to start failing on CI, we would want to know about it. So, before these changes, it was not included in the count. I've carried that over to the precise matching done here: tests that appear to be performance tests due to havingperformance
in their names (not as part of any longer\w+
word) are not regarded as expected to fail.It is not obvious that the general approach of downloading the information from an issue is the best one. Although I like this approach and I think it's not too cumbersome, this is admittedly the less elegant step. Other approaches, including hard-coding the expected failures in the workflow or another file, could be used if this approach is not wanted.
To compare them, use
git diff --no-index
with various other options, printing the entire list of failures as context if they differ. The job fails if there are any differences.Because this neither a required check for PR auto-merge nor a dependency of one, I think failing not only if tests unexpectedly fail but also if any unexpectedly pass is unlikely to cause any significant problems, and that knowing whenever the set of failing tests changes in any way is worthwhile.
I intentionally excluded performance tests late in the process, so that there would be a commit whose CI results could be inspected that would verify that the check is capable of failing when the diff is nonempty (and that the output would display in a useful way, with effective colorization, etc., in this situation). That can be observed in this run output and compared to passing runs, such as the runs in this PR.