From c4df8ad79cc62509af85311a6e03806d94f6c75b Mon Sep 17 00:00:00 2001
From: Tom de Vries
Date: Fri, 22 Nov 2024 12:54:57 +0100
Subject: [PATCH] [gdb/build] Workaround tsan select bug

When building gdb with -O0 and -fsanitize=thread, I run into a large number
of timeouts caused by gdb hanging, for instance:
...
(gdb) continue^M
Continuing.^M
[Inferior 1 (process 378) exited normally]^M
FAIL: gdb.multi/stop-all-on-exit.exp: continue until exit (timeout)
...

What happens is the following:
- two inferiors are added, stopped at main
- inferior 1 is set up to exit after 1 second
- inferior 2 is set up to exit after 10 seconds
- the continue command is issued
- because of "set schedule-multiple on", both inferiors continue
- the first inferior exits
- gdb sends a SIGSTOP to the second inferior
- the second inferior stops on the SIGSTOP, which causes a SIGCHLD to be
  sent to gdb
- gdb calls select, and blocks
- the signal arrives, and interrupts select
- ThreadSanitizer's signal handler is called, which marks the signal as
  pending internally
- select returns -1 with errno == EINTR
- gdb calls select again, and blocks
- gdb hangs, waiting for gdb's sigchild_handler to be called

This is a bug [1] in ThreadSanitizer.  When select is called with
timeout == nullptr, it is blocking, but ThreadSanitizer doesn't consider it
so, and consequently doesn't see the need to call sigchild_handler.

Work around this by:
- instead of using the blocking select variant, forcing a small timeout, and
- upon timeout, calling a function that ThreadSanitizer does consider
  blocking (usleep), which forces sigchild_handler to be called.

Tested on x86_64-linux.
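The workaround described above can be sketched as a standalone select
wrapper.  This is a hypothetical helper, not gdb code: the name
select_with_forced_timeout and its force_timeout parameter are illustrative,
it only watches read descriptors, and the flag stands in for the
#if defined (__SANITIZE_THREAD__) guard so the logic can be exercised in an
ordinary build.  The 1 ms timeout and the usleep (0) call are taken from the
patch.

```c
#define _DEFAULT_SOURCE 1 /* Expose usleep in glibc.  */

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not gdb's interruptible_select): wait until a
   descriptor in READFDS is readable.  When FORCE_TIMEOUT is true and
   TIMEOUT is NULL (a blocking wait), poll with a 1 ms timeout instead,
   and call usleep (0) after each expiry, mirroring the patch.  */
static int
select_with_forced_timeout (int n, fd_set *readfds,
                            const struct timeval *timeout,
                            bool force_timeout)
{
  bool forced = force_timeout && timeout == NULL;

  for (;;)
    {
      fd_set ret_readfds = *readfds;
      struct timeval ret_timeout;
      struct timeval *tvp = NULL;

      if (forced)
        {
          /* Re-arm the short timeout on every iteration, because Linux
             select overwrites the timeval with the time remaining.  */
          ret_timeout.tv_sec = 0;
          ret_timeout.tv_usec = 1000;
          tvp = &ret_timeout;
        }
      else if (timeout != NULL)
        {
          ret_timeout = *timeout;
          tvp = &ret_timeout;
        }

      int res = select (n, &ret_readfds, NULL, NULL, tvp);

      /* Interrupted by a signal: retry.  */
      if (res == -1 && errno == EINTR)
        continue;

      /* The forced timeout expired although the caller asked to block.
         Call usleep, which (per the commit message) ThreadSanitizer
         treats as blocking, so pending signal handlers get run; then
         keep waiting.  */
      if (forced && res == 0)
        {
          usleep (0);
          continue;
        }

      if (res >= 0)
        *readfds = ret_readfds;
      return res;
    }
}
```

The cost of this design is that a caller who asked to block indefinitely
wakes up roughly once per millisecond; the patch accepts that polling only
in ThreadSanitizer builds, and a non-NULL caller timeout is passed through
unchanged.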
PR build/32295
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32295

[1] https://github.com/google/sanitizers/issues/1813
---
 gdb/event-top.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/gdb/event-top.c b/gdb/event-top.c
index 9c0087adb10..cab6c848c7c 100644
--- a/gdb/event-top.c
+++ b/gdb/event-top.c
@@ -1344,6 +1344,31 @@ interruptible_select (int n,
   if (n <= fd)
     n = fd + 1;
 
+  bool tsan_forced_timeout = false;
+#if defined (__SANITIZE_THREAD__)
+  struct timeval tv;
+  if (timeout == nullptr)
+    {
+      /* A nullptr timeout means select is blocking, and ThreadSanitizer has
+         a bug that it considers select non-blocking, and consequently when
+         intercepting select it will not call signal handlers for pending
+         signals, and gdb will hang in select waiting for those signal
+         handlers to be called.
+
+         Filed here ( https://github.com/google/sanitizers/issues/1813 ).
+
+         Work around this by:
+         - forcing a small timeout, and
+         - upon timeout calling a function that ThreadSanitizer does consider
+           blocking: usleep, forcing signal handlers to be called for pending
+           signals.  */
+      tv.tv_sec = 0;
+      tv.tv_usec = 1000;
+      timeout = &tv;
+      tsan_forced_timeout = true;
+    }
+#endif
+
   {
     fd_set ret_readfds, ret_writefds, ret_exceptfds;
     struct timeval ret_timeout;
@@ -1359,6 +1384,12 @@ interruptible_select (int n,
         if (res == -1 && errno == EINTR)
           continue;
 
+        if (tsan_forced_timeout && res == 0)
+          {
+            usleep (0);
+            continue;
+          }
+
         break;
       }