Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use anonymous pipes for stdout/stderr redirection. #2472

Open
wants to merge 5 commits into
base: devel
Choose a base branch
from

Conversation

davidmatson
Copy link

@davidmatson davidmatson commented Jun 30, 2022

Closes #2444.

Avoid the overhead of creating and deleting temporary files.
Anonymous pipes have a limited buffer and require an active reader to
ensure the writer does not become blocked. Use a separate thread to
ensure the buffer does not get stuck full.

A couple of questions:

  1. Is adding C++20-conditional usage of jthread the recommended approach here? (The code as written prefers std::jthread when available but falls back to a custom implementation when needed.)
  2. Is adding CATCH_SYSTEM_ERROR reasonable here? (system_error seems to be the best exception type for these APIs)

@codecov
Copy link

codecov bot commented Jun 30, 2022

Codecov Report

Merging #2472 (3b474ce) into devel (5a1ef7e) will decrease coverage by 0.20%.
The diff coverage is 90.00%.

@@            Coverage Diff             @@
##            devel    #2472      +/-   ##
==========================================
- Coverage   91.49%   91.28%   -0.20%     
==========================================
  Files         181      185       +4     
  Lines        7519     7596      +77     
==========================================
+ Hits         6879     6934      +55     
- Misses        640      662      +22     

Closes catchorg#2444.

Avoid the overhead of creating and deleting temporary files.
Anonymous pipes have a limited buffer and require an active reader to
ensure the writer does not become blocked. Use a separate thread to
ensure the buffer does not get stuck full.
@davidmatson
Copy link
Author

@horenmar - Please let me know what you think of how this approached worked out.

@davidmatson
Copy link
Author

@horenmar - it looks like you triggered a new validation run (thanks!). For this one:
Linux builds (complex) / CMake configuration tests, clang++-10, C++14 Debug (pull_request)
I'm guessing that's a new failure in this PR, but I'm not sure what the root cause is. The error message says:

Linking CXX executable SelfTest
/usr/bin/ld: ../src/libCatch2d.a(catch_output_redirect.cpp.o): in function `std::thread::thread<Catch::OutputFileRedirector::OutputFileRedirector(_IO_FILE*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}, , void>(Catch::OutputFileRedirector::OutputFileRedirector(_IO_FILE*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)::{lambda()#1}&&)':
/usr/include/c++/9/thread:126: undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
make[2]: *** [tests/CMakeFiles/SelfTest.dir/build.make:867: tests/SelfTest] Error 1
make[1]: *** [CMakeFiles/Makefile2:927: tests/CMakeFiles/SelfTest.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

For pthread_create - that's not something I wrote directly (actually I can't find "pthread_create" at all in this repo). I'm guessing it's some kind of build setting needed on Linux since it's using threads. The closest reference I could find in the repo was:

#if defined(_REENTRANT) || defined(_MSC_VER)
// Enable async processing, as -pthread is specified or no additional linking is required
# define CATCH_INTERNAL_CONFIG_USE_ASYNC

Is there some place -pthread needs to be enabled in a build script somewhere for this change to work? (Enabling threading on Linux is something I'm not familiar with.) Thanks!

@horenmar
Copy link
Member

horenmar commented Aug 8, 2022

it looks like you triggered a new validation run (thanks!)

Technically, first one. 😃 GHA don't run by default for PRs by new contributors, because people are asshats and try to use free resources for mining. But that's not the important part.


The underlying issue is that libstdc++ (and IIRC libc++ as well) rely on pthread library for their implementation of std threading support. This means that building code that uses std::thread & friends requires passing -pthread/-lpthread* to the build.

The code you've found detects whether Catch2 is being compiled in a way that allows us to optionally use threads (MSVC always has thread support, _REENTRANT is defined by GCC/Clang when compiling a TU with -pthread).

Enabling thread support for a target in CMake is simple enough, but the issue is whether doing so is something I want to do.

  1. Catch2 tries to support platforms with limited stdlib support (e.g. iostreams are optional, RTTI is optional, exceptions are optional), and I don't think that this is enough motivation to make threads an unconditional requirement.
  2. The amalgamated distribution can't actually affect its compilation options in any way.

This would mean that the threading requirement would have to be optional and the implementation of the experimental capture would have to properly handle the option that there are no threads enabled.

One more thing I don't like about using threads in the experimental capture is that given the issue in 1), I don't think it could ever be promoted into non-experimental implementation.


* Actually it is slightly more complex, as the compiler flag does something different than the linker flag, and their behaviour still differs between GCC and Clang. Let nobody say that the C++ toolchains are sane.

@davidmatson
Copy link
Author

davidmatson commented Aug 8, 2022

@horenmar - thanks for the reply!

With pipes, something needs to keep reading the pipe, or the OS buffer can get full and the writer will block.

The only solutions I'm aware of for this problem are:
a. Use a dedicated thread for each pipe, to ensure it keeps getting drained.
b. Use an asynchronous read/poll on each pipe, and switch to reading from it when it is signaled.
c. Ensure that the total amount of data ever written to a pipe is smaller than the OS buffer.

We could investigate option b, though I suspect platform support for non-blocking reads is less common than threads.

I can understand the desire to support Catch2 on more-limited platforms and thus avoid the use of threads as a hard requirement. At the same time, I think for capturing stdout/stderr, pipes are the best solution, as they avoid the problems with temp files we've seen earlier. Other currently problems with files seem likely either just to go away or be more readily soleable as well (perhaps including #1902, #1991).

A few overall options I can think of:

  1. Only support redirecting stdout/stderr on platforms that have support for threads.
  2. Add a polling (async read when signaled) implementation, and only support redirecting stdout/stderr on platforms that have support for threads or polling.
  3. On platforms where threads are not available, rely on the OS buffer and document that stdout/stderr redirection requires threads for larger output volume.
  4. Detect which implementation to use at compile-time, and fallback to a file-based solution when threads are unavailable.

There may be other options I'm missing as well.

I tend to think option 4 is likely the best option - thoughts?

EDIT:
Perhaps depending on threads isn't really all that different in terms of whether the experimental implementation can become non-experimental. From this earlier thread:
#2444 (comment)

It [the current experimental implementation] has some compilation issues on less common platforms (e.g. missing posix parts), and IIRC runs into issues when too many tests are run in single binary (IIRC that's due to trying to open too many temp files).

Maybe any implementation that goes all the way down to the handle level is going not to be supported on the most limited platforms, so maybe saying such features only work on more full-featured platforms is just fine, and in keeping with what's possible without this PR anyway. (i.e., so maybe option 1 is more attractive) Thoughts?

@davidmatson
Copy link
Author

@horenmar - one option is just to go with the original plan and make the pipes implementation be just for Windows, and keep the existing temp file-based code for other environments. (Though since basically the same code works on Linux as well, when threads are enabled, it would probably make the most sense to have the #if be something like "MSVC/Windows or Pthreads enabled", since that works too.)

I'm not sure what types of more constrained systems you're most interested in supporting, so I'm curious to hear your thoughts on how to consider more limited systems. On that topic, it's worth mentioning that there may be some systems where pipes + threads would be a better option than temp files. For example, Stroustrup mentions in chapter 15 of The C++ Programming Language how some systems compile without using files. At runtime, some systems have no filesystem (or may have a read-only filesystem). Interestingly, I found one person looking for a unit testing framework for a system with no filesystem, and the one he found didn't work because it used a temp file to redirect stdout/stderr:
nemequ/munit#69

So I think which technique works best depends on the system, and I'm not sure which of the following systems are in-scope for what you'd want to support:

System Best Option
Windows Pipes + Threads
Other full-featured system, with PThreads Pipes + Threads
Other full-featured system, without PThreads Temp files
Limited system with pipes and threads Pipes + Threads
Limited system without threads or pipes, but with a filesystem (that's writable) Temp files
Limited system without threads, pipes, or a writeable filesystem Redirect std::cout/cerr/clog only (can't redirect stdout/stderr)

Copy link
Member

@horenmar horenmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, but it will have to be a parallel implementation to the current temp-file based approach, so that there will be 3 different capture impls in the end.

(Also please set up your IDE to use .clang-format from the repo)

src/catch2/internal/catch_output_redirect.cpp Outdated Show resolved Hide resolved
src/catch2/internal/catch_output_redirect.cpp Outdated Show resolved Hide resolved
src/catch2/internal/catch_output_redirect.cpp Outdated Show resolved Hide resolved
src/catch2/internal/catch_output_redirect.cpp Outdated Show resolved Hide resolved
src/catch2/internal/catch_output_redirect.cpp Show resolved Hide resolved
@horenmar
Copy link
Member

horenmar commented Sep 2, 2022

Is adding C++20-conditional usage of jthread the recommended approach here? (The code as written prefers std::jthread when available but falls back to a custom implementation when needed.)

I am not convinced that a quarter-baked jthread polyfill is a good idea. Given that the only feature used from it is defaulting to join, I would rather see the OutputFileRedirector join in destructor by itself.

Is adding CATCH_SYSTEM_ERROR reasonable here? (system_error seems to be the best exception type for these APIs)

🤷 I don't think it will provide useful error messages like this, but that can be fixed later on.

Keep previous implementation when threads are not available.
@davidmatson
Copy link
Author

I am not convinced that a quarter-baked jthread polyfill is a good idea. Given that the only feature used from it is defaulting to join, I would rather see the OutputFileRedirector join in destructor by itself.

Sure, I've moved it just to join in the destructor.

@davidmatson
Copy link
Author

I'm not sure how much coverage there is for both variations of CATCH_CONFIG_NEW_CAPTURE (temp file vs pipe/thread). Ideally, we'd have:

  1. Temp file on Linux (CATCH_CONFIG_NEW_CAPTURE but not threads)
  2. Pipe/thread on Windows (CATCH_CONFIG_NEW_CAPTURE and CATCH_INTERNAL_CONFIG_USE_ASYNC).
  3. Pipe/thread on Linux (CATCH_CONFIG_NEW_CAPTURE and CATCH_INTERNAL_CONFIG_USE_ASYNC).
    and CI would run the testConfigureExperimentalRedirect for each of these cases. I'm not sure what happens currently and which portions of the pipe vs temp file are validated in terms of either compiling correctly or working functionally by CI. (I don't have a way set up to do that locally.)

@davidmatson
Copy link
Author

@horenmar - any further thoughts on this PR? thanks!

@davidmatson
Copy link
Author

@horenmar - does this PR look good to you? Thanks!

@horenmar horenmar added the Finish before break Things to finish before I am taking a long break label Jun 25, 2023
@horenmar
Copy link
Member

So I finally got around to this after redoing the redirect to avoid issues
with combining non-capturing reporters writing to stdout and capturing
reporters in f24569a.

I refactored the code here on top of the current interface in branch
devel-piped-redirect, so that there is just 1 thread and 1 pipe per
standard output stream and test run. Reading the original issue, this is
supposed to be a performance optimization compared to the tempfile based
redirect, so I ran

Measure-Command {start-process .\SelfTest.exe -argumentlist "-r junit -o NUL" -wait}

against binary compiled on the above mentioned branch, and against binary
configured to use the file redirect.

time to run in ms

Release Debug
pipe + Junit 1014 4066
pipe + cons 1015 3038
file + Junit 1023 4068
file + cons 1026 3037

From this, the difference does not seem to rise beyond noise.

I am also seeing an issue, where with the optimized implementation,
the output does not always show up for the reporter -> I assume this is
because there is now a race between the reading thread getting the output
and storing it in the buffer, and the main thread that wants to get the
buffer.

You can play with this with the "has printf" test in Tricky.tests.cpp,
where I added an explicit flush into the test and then a 1 second sleep,
and the output still does not sometimes show. If I remove either of these,
the output does not show most of the time, if both are removed, the output
shows only very rarely.

I am reasonably sure I can come up with a more complex synchronization
scheme between the reading and main threads to ensure that the print is
always present, but I see two issues in that:

  • The additional synchronization overhead is likely to bring down the
    performance further.
  • This is increasing the complexity of the implementation.

Given this, I am currently not convinced that this is worth merging - the
one utility I can see is for platforms where it is impossible to create
tempfiles, but it is possible to create pipes. I am currently unaware of
any such platform, but that does not mean it does not exist.

@davidmatson
Copy link
Author

@horenmar - thanks for taking a look!

I think the benefits of a pipes approach would be functional as much as, or more than, performance. For example, printf working, as your test shows, or launching a child process (at least on Windows) that writes to the same stdout/stderr stream.

I've had situations where we needed to do this kind of redirection before. It can be a little tricky to get it working correctly, but once it's working reliably, we haven't had further issues. My best guess is there's a missing flush (or assumption of futher buffering) at a higher layer that's causing the missing writes. For example, I think it's more complex to get this code right with a combined thread and pipe for stdout vs stderr - I tend to think it's simpler and more reliable to have distinct channels for each. Did you see cases where the original PR here was sometimes not getting all the text through?

@horenmar
Copy link
Member

Re functionality:
Catch2 can already handle capturing printf using the file based redirect, but it is not enabled by default, because some platforms don't support temp files/dup/etc. Pipe based redirect also would not be enabled by default due to the dependency on threading support, so merging this would not change the actual capabilities outside of platforms where threads and pipes are available, but temp files are not.

When the original issue was open, the temp files implementation was already there, but it could not handle running many tests in single process (the limit was platform dependent, but even the SelfTest binary would reach it very quickly). The underlying issue was that Catch2 would create at least 2 temp files per partial test case run, and quickly run into temp file limit.

However this limitation was lifted with the linked commit, so the temp file approach works perfectly on Windows, and reasonably well on Linux*.

For example, I think it's more complex to get this code right with a combined thread and pipe for stdout vs stderr - I tend to think it's simpler and more reliable to have distinct channels for each.

That's what is in the updated branch as well, I am not going to write code polling multiple fds unless I really have to 😃

Did you see cases where the original PR here was sometimes not getting all the text through?

No, but the performance was completely awful. In the devel-piped-redirect branch, there are three commits -> the first one updates this PR naively to the current interface, the second one refactors it for reasonable performance and the third one contains some changes to the test with printf. The first commit does not lose any output, but because it closes the pipe and joins the thread every time the capture is deactivated, it takes 45s to run SelfTest in Debug build with Junit reporter configured to write to NUL. For Release build it was 38s.

The refactoring in second commit makes it so that when the capture is deactivated, the main thread first flushes the FD, then replaces it with the original FD of that stream, and then gets the buffer from the reading thread. However, it is simple to lose output here, because the reading thread might not have been scheduled to finish reading the pipe yet, before the main thread takes the mutex protecting the combined output.


* Older libcs also exhibit a lost-print issue, but only when running many tests in single process. I never reproduced it when running just a single test in the process.

@davidmatson
Copy link
Author

Thanks for the additional context.

With the original approach, I'm guessing it was the thread that was the higher cost than the pipe creation - do we know which was expensive, or was it both?

Would setting up the pipes/threads at process startup, and keeping them installed until process cleanup, be an option? Then it could just capture whatever had been passed in since the last test output was connected; if something in the stack failed to flush, it would still be captured but would just show up in the next test instead of getting lost. Would that be any better?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Finish before break Things to finish before I am taking a long break
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stdout/stderr is not captured (only cout/cerr are)
2 participants