-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
build system: unit test enhancements #25029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
is this only global, eg is it planned to in the future to be able to set this in code at the test level? |
Yes, see #19821 (which this PR makes progress towards but does not close).
yep |
Okay, here's all the information I've gathered from the results of this first CI run. Detailed Analysis
TL;DR / SummaryMost of the failures are just caused by slow tests, particularly when running under QEMU, and will be solved by simplifying tests and/or increasing timeouts. However, there are two interesting problems which require further investigation:
On the whole, I'm quite happy with how this first run went -- it's caught both slow tests and bugs, exactly what we want from these build system enhancements! I'll spend some time hacking away at this branch over the next few days to hopefully get it into a better state. I'm actually pleasantly surprised with how few failures there were on this initial run. |
I ended up getting caught up in a side quest for the past few weeks, culminating in #25227. Now that that PR is finally open, I'll be working on getting this merge-ready. |
375dea6
to
e893808
Compare
Noting that this closes #25386. |
e893808
to
0248d41
Compare
0248d41
to
d6a1c27
Compare
Cancelling this workflow because it's just a rebase so this will still be failing; I've only rebased so I can get a PR up on Codeberg to see what the other CI runners are looking like. |
@jedisct1, a question for you (I already messaged you on Zulip but then realised idk if you're particularly active there, hence this comment; sorry!). The test |
@mlugg Yes, this is a massive test suite, which is good, but can indeed take a long time to complete. I'll see what entries can be trimmed. |
8f4b713
to
f846e10
Compare
lib/std/crypto/ecdsa.zig
Outdated
for (vectors) |vector| { | ||
if (tvTry(EcdsaP384Sha384, vector)) { | ||
try std.testing.expect(vector.result == .valid or vector.result == .acceptable); | ||
} else |_| { | ||
try std.testing.expectEqual(vector.result, .invalid); | ||
} | ||
} | ||
|
||
if (false) { // non-critical test vectors, skipped because they slow down CI too much |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add some extensive_tests
flag to the std lib testing facilities, or to std.testing
in general?
It might still make sense to run them manually before releases / release candidates. As-is if (false)
somewhere in std
are much harder to track/remember than a global flag/mode documenting the intent.
f846e10
to
4c18a8e
Compare
For now, there is a flag to `zig build` called `--test-timeout-ms` which accepts a value in milliseconds. If the execution time of any individual unit test exceeds that number of milliseconds, the test is terminated and marked as timed out. In the future, we may want to increase the granularity of this feature by allowing timeouts to be specified per-step or even per-test. However, a global option is actually very useful. In particular, it can be used in CI scripts to ensure that no individual unit test exceeds some reasonable limit (e.g. 60 seconds) without having to assign limits to every individual test step in the build script. Also, individual unit test durations are now shown in the time report web interface -- this was fairly trivial to add since we're timing tests (to check for timeouts) anyway. This commit makes progress on ziglang#19821, but does not close it, because that proposal includes a more sophisticated mechanism for setting timeouts. Co-Authored-By: David Rubin <david@vortan.dev>
This is a major refactor to `Step.Run` which adds new functionality, primarily to the execution of Zig tests. * All tests are run, even if a test crashes. This happens through the same mechanism as timeouts where the test processes is repeatedly respawned as needed. * The build status output is more precise. For each unit test, it differentiates pass, skip, fail, crash, and timeout. Memory leaks are reported separately, as they do not indicate a test's "status", but are rather an additional property (a test with leaks may still pass!). * The number of memory leaks is tracked and reported, both per-test and for a whole `Run` step. * Reporting is made clearer when a step is failed solely due to error logs (`std.log.err`) where every unit test passed.
Recording the command in a separate field will give the build runner more freedom to choose how and when the command should be printed.
…-style` The new `--error-style` option decides how build failures are printed. The default mode "verbose" prints all context including the step graph fragment and the failed command (if any). The alternative mode "minimal" prints only the failed step itself, and does not print the failed command. There are also "verbose_clear" and "minimal_clear" modes, which have the distinction that the output is cleared (through ANSI escape codes) between updates, preventing different updates from being confused in the output. If `--error-style` is not specified, the environment variable `ZIG_BUILD_ERROR_STYLE` is checked before falling back to the default of "verbose"; this means the value can effectively be chosen system-wide since it is generally a personal preference. Also introduced is a `--multiline-errors` option which decides how to print errors which span multiple lines. By default, non-initial lines are indented to align with the first. Alternatively, a leading newline can be printed to align everyting on the first column, or no special treatment can be applied, resulting in misaligned output. Again, there is an environment variable (`ZIG_BUILD_MULTILINE_ERRORS`) to specify a preferred default if the option is not explicitly provided. Resolves: ziglang#23472
…kends For instance, when running a Zig test using the self-hosted aarch64 backend, this logic was previously expecting `std.zig.Server` to be used, but the default test runner intentionally does not do this because the backend is too immature to handle it. On 'master', this is causing sporadic failures; on this branch, they became consistent failures.
This test called `yield` 80,000 times, which is nothing on a system with little load, but murder on a CI system. macOS' scheduler in particular doesn't seem to deal with this very well. The `yield` calls also weren't even necessarily doing what they were meant to: if the optimizer could figure out that it doesn't clobber some memory, then it could happily reorder around the `yield`s anyway! The test has been simplified and made to work better, and the number of yields have been reduced. The number of overall iterations has also been reduced, because with the `yield` calls making races very likely, we don't really need to run too many iterations to be confident that the implementation is race-free.
The Wycheproof test suite is extensive, but takes a long time to complete on CI. Keep only the most relevant ones and take it as an opportunity to describe what they are. The remaining ones are still available for manual testing when required.
The unit can now be specified in the argument.
i am in purgatory as a punishment bestowed upon me for daring to question the sanctity of windows' scheduler
Unfortunately, Windows' scheduler means that test timeouts get hit very easily, because it seems the system can refuse to schedule a waiting process for *upwards of 10 minutes*. We should look for a better solution for this problem going forwards, but for now, just give Windows a very high test timeout. The 30 minute timeout set here is around the duration of a *full CI run* on Windows, so it should be impossible to hit normally, but it means that if a test gets stuck we'll at least get told (eventually).
4c18a8e
to
fc4c6f8
Compare
This branch started out as an attempt to work on #19821, but ended up including a whole host of enhancements to the build system and unit test evaluation specifically. In no particular order:
--test-timout-ms
; if a test exceeds this, it is killed--multiline-errors
)--prominent-compile-errors
replaced with--error-style
(orZIG_BUILD_ERROR_STYLE
environment variable)--summary line
modeSome of these changes have more detailed explanations in the commit messages.
Here's a screenshot showing some of the new stuff:
The last commit here sets a 60 second unit test timeout in all of the CI scripts. I know that this is going to fail; I expect this PR to spend about a week in purgatory as I bump timeouts and split up tests. Consider it a CI stress-test!