nextest seems to handle stdio-intensive tests less efficiently than cargo test #1068
Thanks for the report! Huh, that's interesting -- how big are stdout and stderr for your tests? The default Linux x86_64 pipe size is 65536 bytes. Regardless, 700ms+ spent just in write and read is a lot and is definitely worth fixing.
To be honest, I can't reproduce this. With a Ryzen 7950X, Linux kernel 6.5.6, and Rust 1.73.0, and a test that writes roughly 500 MiB, I get:
I do believe the flamegraph; I'm just trying to figure out what might be going on.
I also tried applying a patch (Linux only) to make pipe buffers bigger, and it didn't appear to make things faster or slower -- I suspect the limit is not the buffer size. This post suggests that tweaking buffer sizes etc. will not make a difference. Unless there are some easy wins elsewhere, I don't think we can make this better, unfortunately.
Does nextest use a single-threaded tokio runtime, and does it spawn the wait tasks so that they can run in parallel?
Nextest uses a multithreaded runtime, and each test is its own task. However, reading from stdout and stderr is not split into separate tasks.
I think splitting the stdout/stderr reads into separate tasks might be beneficial if a test writes a lot to both stdout and stderr.
First things first: for a reproducer you can...

...then run the following commands:

```sh
# Only observable with backtraces, which greatly increase the stdout/stderr output
export RUST_BACKTRACE=1

# Takes 2.3s on an AMD R5 5600G
cargo test ffi::int::tests -- --test-threads=1

# Takes 18.3s on the same CPU as above
cargo nextest run -j1 -E 'test(/ffi::int::tests/)'
```

I'm using only one test thread here because it makes timings more reproducible and reduces the perf workload during profiling (which enables going to higher sampling rates).

To be clear, the slowdown from enabling backtraces is already massive with cargo test (the same test run takes 0.3s without), and although replacing the standard panic hook with a no-op while running those tests has implications that I don't like when running over multiple threads, it is an acceptable workaround.

What I find suspicious here is that the slowdown is almost an order of magnitude higher with nextest than with cargo test, even though in this particular scenario neither of them should be doing anything with the test output: they should just be collecting it in some buffer in order to display it in case of failure. Somehow, cargo test must be using a different test output collection strategy that's a lot more efficient.

Trying to investigate, I took a perf profile of the run. I wonder how they manage to avoid the pipe overhead, assuming that's the only issue here. Perhaps they redirect the test output to some tempfile instead of a pipe, so the kernel can handle all the data transfers without round trips to user space? Or perhaps they manage to have libtest do the buffering in-process somehow, and only print the output when a failure occurs?

To answer your first question, the output here is large: it amounts to filling up a ~180 MB buffer. Knowing that modern RAM can transfer data at tens of GB/s on consumer hardware, the ~10 MiB/s transfer rate implied by taking 18s to move that much data does put things in a less favorable light.
Perhaps the difference is that the std::backtrace printer in the panic hook does finer-grained writes? Because stderr is unbuffered, its performance can vary greatly depending on write granularity.
Your point about stderr being unbuffered is well-taken (and is easy to verify by looking at https://doc.rust-lang.org/src/std/io/stdio.rs.html).
Sounds pretty neat -- is there a way nextest could leverage the same infrastructure? Or if not, is there a way the infrastructure could be modified to make it happen?
This could also be a fallback plan if the above, more general optimization can't be made to work. I'm wondering if this would fit the design constraints of std, though: the other day I remember seeing a rust-lang issue that proposed turning OOM errors into panics, and if that's the plan, you probably don't want to allocate when printing the OOM backtrace... Of course, this assumes all other allocations can be removed from the backtrace computation, which is perhaps a little optimistic. Maybe a statically allocated or stack-allocated buffer could be made to work?
Hmm, so a thing I realized is that nextest currently always passes in --nocapture.
While writing this, I sadly realized that --nocapture is pretty load-bearing for tests that time out -- we want to show whatever output the test produced by the time it was killed, which means instructing libtest not to capture any output. I think the only way to resolve this is to add buffering upstream (though maybe that's out of the question). There's nothing that can be done on nextest's side as far as I can tell.
How would you feel about an upstream feature that dumps the stdio buffer when a certain signal (say, SIGUSR1) is received? I think that is the minimal upstream change that would preserve optimal I/O performance in the common case, while still allowing the buffer to be dumped in case of timeout.
What if a test sets a SIGUSR1 handler for itself? I don't know how common that is, but it is possible. To be honest, I still don't see how the effort is worth it. This seems to be an edge case that is addressed by setting your own panic handler.
We can document this, and maybe even detect it if we're getting a ton of small reads.
I would obviously be fine with a "wontfix, please silence your panics" resolution, since I already have a panic handler around that does this ;)
This is a FAILED experiment: see #1068.
So I was actually wrong here -- nextest used to use the same task for all tests. Curious how it is these days -- nextest now actually uses a separate task per test, and I suspect it improves your case considerably.
In the standard panic configuration that I used at the time, where a backtrace is printed even if the panic is caught (which, if you ask me, is still a bad default that should be changed at the std level), nextest handles the stdio spam much better than before, but still significantly worse than cargo test:
Of course, suppressing the backtrace as I did at the time still greatly improves performance for both test harnesses:
Interestingly,
Thanks. That's probably the best we can do for now -- this is the difference between cross-process and in-process pipe reads with lots of small writes. Glad that nextest is doing better now. |
I have some tests around which assert that panics occur when invalid inputs are sent to certain functions. These tests run several times slower under nextest than under cargo test when RUST_BACKTRACE=1 is set, and a quick perf profile suggests that they spend most of their time waiting for nextest to read out the data that they have sent down the stdout/stderr pipe.

The most likely reason why these tests are the only ones affected is that the default Rust panic handler sends lots of text down the pipe when backtraces are enabled, so these are probably my most chatty tests by a large margin.
In this particular situation, I can sidestep the problem entirely by replacing the Rust panic handler with a no-op during execution of these tests, which eliminates the constant stdio traffic from backtrace printing. But it might be worth investigating whether nextest can handle heavy stdio traffic more efficiently, e.g. by using larger pipe buffer sizes.