build system: unit test enhancements #25029

mlugg · 2025-08-26T22:39:00Z

This branch started out as an attempt to work on #19821, but ended up including a whole host of enhancements to the build system and unit test evaluation specifically. In no particular order:

A per-unit-test timeout can be specified on the command line with --test-timeout-ms; if a test exceeds this, it is killed
If a unit test process crashes or is killed, it is respawned to run the remaining tests
Unit test durations are shown in the time report web interface
Failed commands are made visually distinct
The exact number of memory leaks is reported
Test pass/fail/etc summary output is made clearer
Multi-line build step errors are properly aligned by default (overrideable with --multiline-errors)
--prominent-compile-errors replaced with --error-style (or ZIG_BUILD_ERROR_STYLE environment variable)
New --summary line mode

Some of these changes have more detailed explanations in the commit messages.

Here's a screenshot showing some of the new stuff:

The last commit here sets a 60 second unit test timeout in all of the CI scripts. I know that this is going to fail; I expect this PR to spend about a week in purgatory as I bump timeouts and split up tests. Consider it a CI stress-test!

nektro · 2025-08-26T23:58:38Z

A per-unit-test timeout can be specified on the command line with --test-timeout-ms; if a test exceeds this, it is killed

is this only global, e.g. is it planned in the future to be able to set this in code at the test level?
is the default still infinity?

mlugg · 2025-08-27T00:29:02Z

e.g. is it planned in the future to be able to set this in code at the test level?

Yes, see #19821 (which this PR makes progress towards but does not close).

is the default still infinity?

yep

mlugg · 2025-08-27T09:02:39Z

Okay, here's all the information I've gathered from the results of this first CI run.

Detailed Analysis

Both aarch64-macos runs are failing due to a fuckton of timeouts in Thread.RwLock.test.concurrent access. It seems like this is probably just a test which happens to be quite slow on that target; I'll probably just try to speed it up a little.
x86_64-windows-debug has failed with this error:
```
test
+- test-link
   +- link_test_cases
      +- link_test_cases.common_symbols_alignment
         +- run test failure
error: test runner failed to respond for 30.013s
```
This is interesting -- that error isn't a test timeout, but rather indicates that the test runner itself (i.e. the std.zig.Server logic) is failing to respond. I actually hardcoded this timeout with the expectation that it would only ever be hit if a test triggered some really unusual IB, but it was the first thing we hit! This could just be caused by Windows' scheduler shitting the bed under extreme load and not scheduling that test process for 30s, in which case I'll just bump this fixed timeout a bunch (it's not a common failure mode, so the intention is that it's never something you need to worry about in normal operation). However, this could alternatively be a new manifestation of some common Windows flakiness. More investigation is required.

All of the x86_64-linux runs are seeing various failures that look like this:

```
test
+- test-cases
   +- run test fn_typeinfo_passed_to_comptime_fn failure
error: unable to write stdin (BrokenPipe); test process unexpectedly exited with code 0
failed command: qemu-aarch64 /home/ci/actions-runner9/_work/zig/zig/zig-local-cache/o/ba60b66fd473c023412751b5b86e50ed/fn_typeinfo_passed_to_comptime_fn --cache-dir=/home/ci/actions-runner9/_work/zig/zig/zig-local-cache --seed=0x33143409 --listen=-

test
+- test-modules
   +- test-compiler-rt
      +- run test compiler-rt-aarch64-linux-none-neoverse_n1-ReleaseFast-selfhosted-no-lld failure
error: test process unexpectedly exited with code 0
failed command: qemu-aarch64 /home/ci/actions-runner9/_work/zig/zig/zig-local-cache/o/97a26e69e91b460166a336f8e8962e15/test --cache-dir=/home/ci/actions-runner9/_work/zig/zig/zig-local-cache --seed=0x33143409 --listen=-

test
+- test-modules
   +- test-compiler-rt
      +- run test compiler-rt-aarch64-linux-none-generic-ReleaseFast-selfhosted-no-lld failure
error: test process unexpectedly exited with code 0
failed command: qemu-aarch64 /home/ci/actions-runner9/_work/zig/zig/zig-local-cache/o/710431092f32bd8900bf1b9c57fbc3f6/test --cache-dir=/home/ci/actions-runner9/_work/zig/zig/zig-local-cache --seed=0x33143409 --listen=-
```

These look weird at first glance, but what's actually happening here is that this branch introduces reporting for a failure mode which is ignored on master branch: the test runner calling exit(0) (i.e. exiting with success code) despite not being done (the build runner has not yet asked it to terminate because the tests are not complete).

run test fn_typeinfo_passed_to_comptime_fn failure sounds familiar -- that's because it has been flaky on master, occasionally reporting error.BrokenPipe. That's because on master, this unexpected termination will lead to an error only under a specific race condition (if the child terminates between the parent reading a std.zig.Server.Message from it and attempting to send a std.zig.Client.Message to it). In other cases, master will treat it as a success, claiming all remaining tests passed.
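The check described above can be sketched roughly as follows. This is a minimal Python illustration, not the build runner's actual code; the function name `check_child_exit` and the `asked_to_exit` parameter are hypothetical. Only the error text is taken from the CI logs.

```python
from typing import Optional

def check_child_exit(exit_code: int, asked_to_exit: bool) -> Optional[str]:
    """Return an error message if the test process exited too early.

    Hypothetical helper: `asked_to_exit` stands for "the build runner has
    already told the test runner to terminate because all tests are done".
    """
    if not asked_to_exit:
        # Even exit code 0 is an error here: the tests are not complete,
        # so the remaining ones must not be counted as passed.
        return f"test process unexpectedly exited with code {exit_code}"
    return None
```

With this shape, an early `exit(0)` is always reported, rather than only when it happens to race with a write to the child's stdin.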

Therefore, this is a real existing bug, which we sometimes see as flakiness, being exposed by this branch. I haven't investigated it yet, but I am highly confident that this is an instance of this branch's improved error reporting helping to catch a previously-unnoticed issue.

The x86_64-linux runs are also seeing various miscellaneous timeouts. Here are a few examples (of many):

```
test
+- test-modules
   +- test-std
      +- run test std-native-znver2-Debug 2918 pass, 28 skip, 1 timeout (2947 total)
error: 'posix.test.test.sync' timed out after 1m57.497ms
failed command: /home/ci/actions-runner9/_work/zig/zig/zig-local-cache/o/7fe801488ac3d8951c539f01f7673963/test --cache-dir=/home/ci/actions-runner9/_work/zig/zig/zig-local-cache --seed=0x33143409 --listen=-

test
+- test-modules
   +- test-std
      +- run test std-riscv64-linux-musl-baseline_rv64-Debug-libc 2874 pass, 70 skip, 3 timeout (2947 total)
error: 'hash_map.test.remove one million elements in random order' timed out after 1m55.589ms
error: 'crypto.ecdsa.test.Test vectors from Project Wycheproof - EcdsaP256Sha256' timed out after 1m58.725ms
error: 'crypto.ecdsa.test.Test vectors from Project Wycheproof - EcdsaP384Sha384' timed out after 1m56.963ms
failed command: qemu-riscv64 /home/ci/actions-runner6/_work/zig/zig/zig-local-cache/o/5dca11983b0cddc2a57298def70ab5d2/test --cache-dir=/home/ci/actions-runner6/_work/zig/zig/zig-local-cache --seed=0xed2ad7ba --listen=-

test
+- test-modules
   +- test-std
      +- run test std-s390x-linux-musl-arch8-Debug-libc 2861 pass, 83 skip, 1 timeout (2945 total)
error: 'crypto.ecdsa.test.Test vectors from Project Wycheproof - EcdsaP384Sha384' timed out after 1m55.873ms
failed command: qemu-s390x /home/ci/actions-runner6/_work/zig/zig/zig-local-cache/o/503a5d18a494e6b4031049e0b03f1905/test --cache-dir=/home/ci/actions-runner6/_work/zig/zig/zig-local-cache --seed=0xed2ad7ba --listen=-
```

We were expecting these -- currently there are a handful of very slow tests in our test suite, and those performance issues are of course exacerbated if running tests under a foreign executor like QEMU. Solving these will be a gradual process of a) splitting up and/or simplifying excessively slow tests, and b) tastefully bumping timeouts for slower CI runners. I want to try and avoid bumping the timeouts too far past the current value of 60 seconds.

riscv64-linux-release (logs here; GitHub Actions had one of its many moments and completely failed to link the workflow from the PR?!) just had a few timeouts:

```
test-modules
+- test-compiler-rt
   +- run test compiler-rt-native-spacemit_x60-Debug 269 pass, 7 skip, 2 timeout (278 total)
error: 'compiler_rt.memcpy.test.memcpy' timed out after 1m56.066ms
error: 'compiler_rt.memmove.decltest.memmoveFast' timed out after 1m55.848ms
failed command: /home/ci/runner1/_layout/_work/zig/zig/zig-local-cache/o/2bab7262359c84e24f4dd4eb1dbbeb91/test --cache-dir=/home/ci/runner1/_layout/_work/zig/zig/zig-local-cache --seed=0x3516b1f5 --listen=-

test-modules
+- test-std
   +- run test std-native-spacemit_x60-Debug 2912 pass, 34 skip, 1 timeout (2947 total)
error: 'crypto.ecdsa.test.Test vectors from Project Wycheproof - EcdsaP384Sha384' timed out after 1m57.555ms
failed command: /home/ci/runner1/_layout/_work/zig/zig/zig-local-cache/o/41ccb4d707ab19113984ad94c9d56cb9/test --cache-dir=/home/ci/runner1/_layout/_work/zig/zig/zig-local-cache --seed=0x3516b1f5 --listen=-

test-modules
+- test-std
   +- run test std-native-spacemit_x60-Debug-libc 2911 pass, 35 skip, 1 timeout (2947 total)
error: 'crypto.ecdsa.test.Test vectors from Project Wycheproof - EcdsaP384Sha384' timed out after 1m57.574ms
failed command: /home/ci/runner1/_layout/_work/zig/zig/zig-local-cache/o/8826764be6cccc2ff29183d32066c78b/test --cache-dir=/home/ci/runner1/_layout/_work/zig/zig/zig-local-cache --seed=0x3516b1f5 --listen=-
```

It looks like these are just slow tests (as discussed above) on a slow runner; they'll be fixed by test simplifications and/or timeout bumps.

At the time of writing, riscv64-linux-debug hasn't finished, but I expect it to look much like the above.

TL;DR / Summary

Most of the failures are just caused by slow tests, particularly when running under QEMU, and will be solved by simplifying tests and/or increasing timeouts. However, there are two interesting problems which require further investigation:

x86_64-windows-release saw a concerning timeout in the test runner logic itself which I can only assume indicates either a bug on our side or absolutely insane scheduling behavior from Windows
x86_64-linux-* saw several instances of a failure condition which master branch mostly does not report, and which is the cause of some flaky tests on master; this definitely indicates a bug on our end

On the whole, I'm quite happy with how this first run went -- it's caught both slow tests and bugs, exactly what we want from these build system enhancements!

I'll spend some time hacking away at this branch over the next few days to hopefully get it into a better state. I'm actually pleasantly surprised with how few failures there were on this initial run.

mlugg · 2025-09-13T14:47:47Z

I ended up getting caught up in a side quest for the past few weeks, culminating in #25227. Now that that PR is finally open, I'll be working on getting this merge-ready.

alexrp · 2025-09-28T13:30:34Z

Noting that this closes #25386.

mlugg · 2025-10-14T12:32:44Z

Cancelling this workflow because it's just a rebase so this will still be failing; I've only rebased so I can get a PR up on Codeberg to see what the other CI runners are looking like.

mlugg · 2025-10-14T12:53:13Z

@jedisct1, a question for you (I already messaged you on Zulip but then realised idk if you're particularly active there, hence this comment; sorry!). The test std.crypto.ecdsa.test.Test vectors from Project Wycheproof - EcdsaP384Sha384 is causing this PR some trouble, because it's really slow to execute: it takes several seconds even on a fast native build of the std tests, so under qemu it can take minutes, which is just too slow for a unit test. I think it just contains too many cases, so we need to remove some of them. Is there any good metric for which cases should be kept vs deleted? Ideally I'd like to delete 50% or more of the .result = .valid cases (those seem to be the slower ones)

jedisct1 · 2025-10-14T13:41:54Z

@mlugg Yes, this is a massive test suite, which is good, but can indeed take a long time to complete.

I'll see what entries can be trimmed.

jedisct1 · 2025-10-14T14:25:12Z

@mlugg #25575

rohlem · 2025-10-15T09:37:23Z

lib/std/crypto/ecdsa.zig

    for (vectors) |vector| {
        if (tvTry(EcdsaP384Sha384, vector)) {
            try std.testing.expect(vector.result == .valid or vector.result == .acceptable);
        } else |_| {
            try std.testing.expectEqual(vector.result, .invalid);
        }
    }
+
+    if (false) { // non-critical test vectors, skipped because they slow down CI too much


Would it make sense to add some extensive_tests flag to the std lib testing facilities, or to std.testing in general?
It might still make sense to run them manually before releases / release candidates. As-is, `if (false)` blocks scattered somewhere in std are much harder to track/remember than a global flag/mode documenting the intent.
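To make the suggestion concrete, here is a rough Python sketch of what a global "extensive tests" switch could look like. Everything in it is hypothetical: the `EXTENSIVE_TESTS` environment variable, the `critical` field, and both function names are invented for illustration and do not exist in std or `std.testing`.

```python
import os

def extensive_tests_enabled(environ=os.environ) -> bool:
    """Hypothetical global switch, read from an assumed EXTENSIVE_TESTS variable."""
    return environ.get("EXTENSIVE_TESTS", "0") == "1"

def select_vectors(vectors, extensive: bool):
    """Keep every vector in extensive mode, otherwise only the critical ones."""
    return [v for v in vectors if extensive or v["critical"]]

# Example data: two critical vectors and one that is only run on request.
VECTORS = [
    {"id": 1, "critical": True},
    {"id": 2, "critical": False},
    {"id": 3, "critical": True},
]
```

The point of the design is that the intent ("slow, run before releases") lives in one documented flag instead of in scattered disabled blocks.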

For now, there is a flag to `zig build` called `--test-timeout-ms` which accepts a value in milliseconds. If the execution time of any individual unit test exceeds that number of milliseconds, the test is terminated and marked as timed out.

In the future, we may want to increase the granularity of this feature by allowing timeouts to be specified per-step or even per-test. However, a global option is actually very useful. In particular, it can be used in CI scripts to ensure that no individual unit test exceeds some reasonable limit (e.g. 60 seconds) without having to assign limits to every individual test step in the build script.

Also, individual unit test durations are now shown in the time report web interface -- this was fairly trivial to add since we're timing tests (to check for timeouts) anyway.

This commit makes progress on ziglang#19821, but does not close it, because that proposal includes a more sophisticated mechanism for setting timeouts.

Co-Authored-By: David Rubin <david@vortan.dev>
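As a rough illustration of the idea (not the build runner's implementation), the sketch below enforces a millisecond timeout on a single child process in Python and records its duration. It simplifies by treating one command as one test, whereas the real test runner executes many tests per process; `run_with_timeout` is a hypothetical name.

```python
import subprocess
import sys
import time

def run_with_timeout(argv, timeout_ms):
    """Run one command and kill it if it exceeds timeout_ms.

    Returns (status, elapsed_ms), where status is "pass", "fail" or "timeout".
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, timeout=timeout_ms / 1000)
        status = "pass" if proc.returncode == 0 else "fail"
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        status = "timeout"
    elapsed_ms = (time.monotonic() - start) * 1000
    return status, elapsed_ms

if __name__ == "__main__":
    # A command that sleeps far longer than the 200 ms limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, 200)[0])
```

Because the duration is measured anyway in order to detect the timeout, reporting it afterwards costs nothing extra, which matches the remark about the time report.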

This is a major refactor to `Step.Run` which adds new functionality, primarily to the execution of Zig tests.

* All tests are run, even if a test crashes. This happens through the same mechanism as timeouts, where the test process is repeatedly respawned as needed.
* The build status output is more precise. For each unit test, it differentiates pass, skip, fail, crash, and timeout. Memory leaks are reported separately, as they do not indicate a test's "status", but are rather an additional property (a test with leaks may still pass!).
* The number of memory leaks is tracked and reported, both per-test and for a whole `Run` step.
* Reporting is made clearer when a step is failed solely due to error logs (`std.log.err`) where every unit test passed.
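The respawn behaviour can be sketched in Python along these lines. This is an assumption-laden toy, not the `Step.Run` code: the child script, its one-line-per-test protocol, and the names `run_all` and `summarize` are all hypothetical, and the real runner communicates via `std.zig.Server` messages rather than text lines.

```python
import subprocess
import sys

# Hypothetical child "test runner": runs tests from a start index and prints
# one "<index> <status>" line per finished test. Test 2 crashes the process.
CHILD = r"""
import os, sys
tests = ["pass", "skip", "crash", "fail", "pass"]
for i in range(int(sys.argv[1]), len(tests)):
    if tests[i] == "crash":
        os._exit(1)  # die without reporting, as a crash would
    print(i, tests[i], flush=True)
"""

def run_all(total):
    """Run `total` tests, respawning the child after each crash."""
    results = {}
    next_index = 0
    while next_index < total:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, str(next_index)],
            capture_output=True, text=True,
        )
        for line in proc.stdout.splitlines():
            index, status = line.split()
            results[int(index)] = status
            next_index = int(index) + 1
        if next_index < total:
            # The child exited before reporting this test: blame that test
            # and respawn to run whatever comes after it.
            results[next_index] = "crash"
            next_index += 1
    return [results[i] for i in range(total)]

def summarize(statuses):
    """Format counts in the style of the build summary line."""
    order = ["pass", "skip", "fail", "crash", "timeout"]
    parts = [f"{statuses.count(s)} {s}" for s in order if s in statuses]
    return ", ".join(parts) + f" ({len(statuses)} total)"

if __name__ == "__main__":
    print(summarize(run_all(5)))  # 2 pass, 1 skip, 1 fail, 1 crash (5 total)
```

Each loop iteration advances by at least one test, so a crashing test can never stall the run; the tests after it still execute in a fresh process.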

…-style` The new `--error-style` option decides how build failures are printed. The default mode "verbose" prints all context including the step graph fragment and the failed command (if any). The alternative mode "minimal" prints only the failed step itself, and does not print the failed command. There are also "verbose_clear" and "minimal_clear" modes, which have the distinction that the output is cleared (through ANSI escape codes) between updates, preventing different updates from being confused in the output.

If `--error-style` is not specified, the environment variable `ZIG_BUILD_ERROR_STYLE` is checked before falling back to the default of "verbose"; this means the value can effectively be chosen system-wide since it is generally a personal preference.

Also introduced is a `--multiline-errors` option which decides how to print errors which span multiple lines. By default, non-initial lines are indented to align with the first. Alternatively, a leading newline can be printed to align everything on the first column, or no special treatment can be applied, resulting in misaligned output. Again, there is an environment variable (`ZIG_BUILD_MULTILINE_ERRORS`) to specify a preferred default if the option is not explicitly provided.

Resolves: ziglang#23472
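The lookup order and the default alignment described in this commit message can be sketched in Python as below. This is only an illustration under assumptions: the function names are hypothetical, and rejecting an unknown style with `ValueError` is a choice made for the sketch, not documented build runner behaviour. The style names and the `ZIG_BUILD_ERROR_STYLE` variable come from the commit message.

```python
import os

ERROR_STYLES = ("verbose", "minimal", "verbose_clear", "minimal_clear")

def resolve_error_style(cli_value, environ=os.environ):
    """Pick the error style: CLI option first, then environment, then default."""
    if cli_value is not None:
        value = cli_value
    else:
        value = environ.get("ZIG_BUILD_ERROR_STYLE", "verbose")
    if value not in ERROR_STYLES:
        # Assumption for this sketch: unknown values are rejected outright.
        raise ValueError(f"unknown error style: {value}")
    return value

def align_multiline(prefix, message):
    """Indent non-initial lines so they line up with the first line's text."""
    pad = " " * len(prefix)
    first, *rest = message.split("\n")
    return "\n".join([prefix + first] + [pad + line for line in rest])

if __name__ == "__main__":
    print(resolve_error_style(None, {}))
    print(align_multiline("error: ", "first line\nsecond line"))
```

Letting the explicit option win over the environment variable keeps one-off overrides possible while the variable carries the personal, system-wide preference.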

…kends For instance, when running a Zig test using the self-hosted aarch64 backend, this logic was previously expecting `std.zig.Server` to be used, but the default test runner intentionally does not do this because the backend is too immature to handle it. On 'master', this is causing sporadic failures; on this branch, they became consistent failures.

This test called `yield` 80,000 times, which is nothing on a system with little load, but murder on a CI system. macOS' scheduler in particular doesn't seem to deal with this very well. The `yield` calls also weren't even necessarily doing what they were meant to: if the optimizer could figure out that it doesn't clobber some memory, then it could happily reorder around the `yield`s anyway! The test has been simplified and made to work better, and the number of yields has been reduced. The number of overall iterations has also been reduced, because with the `yield` calls making races very likely, we don't really need to run too many iterations to be confident that the implementation is race-free.

The Wycheproof test suite is extensive, but takes a long time to complete on CI. Keep only the most relevant ones and take it as an opportunity to describe what they are. The remaining ones are still available for manual testing when required.

Unfortunately, Windows' scheduler means that test timeouts get hit very easily, because it seems the system can refuse to schedule a waiting process for *upwards of 10 minutes*. We should look for a better solution for this problem going forwards, but for now, just give Windows a very high test timeout. The 30 minute timeout set here is around the duration of a *full CI run* on Windows, so it should be impossible to hit normally, but it means that if a test gets stuck we'll at least get told (eventually).

mlugg added the ci-riscv64-linux label Aug 26, 2025

mlugg force-pushed the unit-test-timing branch 3 times, most recently from 375dea6 to e893808 Compare September 18, 2025 13:17

mlugg mentioned this pull request Sep 28, 2025

test: delete fn_typeinfo_passed_to_comptime_fn.zig due to flakiness #25387

Closed

mlugg force-pushed the unit-test-timing branch from e893808 to 0248d41 Compare October 1, 2025 13:32

alexrp removed the ci-riscv64-linux label Oct 14, 2025

mlugg force-pushed the unit-test-timing branch from 0248d41 to d6a1c27 Compare October 14, 2025 12:32

mlugg added the "zig build system" and "release notes" labels Oct 14, 2025

mlugg mentioned this pull request Oct 14, 2025

crypto.ecdsa: trim the number of tests we perform #25575

Closed

mlugg added the "breaking" label Oct 14, 2025

mlugg force-pushed the unit-test-timing branch 2 times, most recently from 8f4b713 to f846e10 Compare October 14, 2025 22:30

rohlem reviewed Oct 15, 2025

mlugg force-pushed the unit-test-timing branch from f846e10 to 4c18a8e Compare October 15, 2025 13:41

mlugg and others added 5 commits October 16, 2025 12:41

std.Build: separate errors from failed commands

b8a7e07

Recording the command in a separate field will give the build runner more freedom to choose how and when the command should be printed.

build runner: final tweaks to output

48e29ee

mlugg and others added 11 commits October 16, 2025 12:41

ci: set unit test timeouts

c189609

tweak tests to avoid timeouts

6b13f30

ci: bump unit test timeouts

2976d79

std: split up ecdsa tests

765d7e7

ci: add unit test timeouts to loongarch and x86_64-freebsd

f989191

compiler: rename --test-timeout-ms to --test-timeout

6895a22

The unit can now be specified in the argument.

ci: bump unit test timeouts

fe869e0

i am in purgatory as a punishment bestowed upon me for daring to question the sanctity of windows' scheduler

mlugg force-pushed the unit-test-timing branch from 4c18a8e to fc4c6f8 Compare October 16, 2025 11:45
