From c4df8ad79cc62509af85311a6e03806d94f6c75b Mon Sep 17 00:00:00 2001
From: Tom de Vries <tdevries@suse.de>
Date: Fri, 22 Nov 2024 12:54:57 +0100
Subject: [PATCH] [gdb/build] Workaround tsan select bug

When building gdb with -O0 and -fsanitize-thread, I run into a large number of
timeouts caused by gdb hanging, for instance:
...
(gdb) continue^M
Continuing.^M
[Inferior 1 (process 378) exited normally]^M
FAIL: gdb.multi/stop-all-on-exit.exp: continue until exit (timeout)
...

What happens is the following:
- two inferiors are added, stopped at main
- inferior 1 is setup to exit after 1 second
- inferior 2 is setup to exit after 10 seconds
- the continue command is issued
- because of set schedule-multiple on, both inferiors continue
- the first inferior exits
- gdb sends a SIGSTOP to the second inferior
- the second inferior receives the SIGSTOP, and raises a SIGCHILD
- gdb calls select, and blocks
- the signal arrives, and interrupts select
- ThreadSanitizers signal handler is called, which marks the signal pending
  internally
- select returns -1 with errno == EINTR
- gdb calls select again, and blocks
- gdb hangs, waiting for gdb's sigchild_handler to be called

This is a bug [1] in ThreadSanitizer.  When select is called with
timeout == nullptr, it is blocking but ThreadSanitizer doesn't consider it so,
and consequently doesn't see the need to call sigchild_handler.

Work around this by:
- instead of using the blocking select variant, forcing a small timeout and
- upon timeout calling a function that ThreadSanitizer does consider
  blocking: usleep, forcing sigchild_handler to be called.

Tested on x86_64-linux.

PR build/32295
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32295

[1] https://github.com/google/sanitizers/issues/1813
---
 gdb/event-top.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/gdb/event-top.c b/gdb/event-top.c
index 9c0087adb10..cab6c848c7c 100644
--- a/gdb/event-top.c
+++ b/gdb/event-top.c
@@ -1344,6 +1344,31 @@ interruptible_select (int n,
   if (n <= fd)
     n = fd + 1;
 
+  bool tsan_forced_timeout = false;
+#if defined (__SANITIZE_THREAD__)
+  struct timeval tv;
+  if (timeout == nullptr)
+    {
+      /* A nullptr timeout means select is blocking, and ThreadSanitizer has
+	 a bug that it considers select non-blocking, and consequently when
+	 intercepting select it will not call signal handlers for pending
+	 signals, and gdb will hang in select waiting for those signal
+	 handlers to be called.
+
+	 Filed here ( https://github.com/google/sanitizers/issues/1813 ).
+
+	 Work around this by:
+	 - forcing a small timeout, and
+	 - upon timeout calling a function that ThreadSanitizer does consider
+	   blocking: usleep, forcing signal handlers to be called for pending
+	   signals.  */
+      tv.tv_sec = 0;
+      tv.tv_usec = 1000;
+      timeout = &tv;
+      tsan_forced_timeout = true;
+    }
+#endif
+
   {
     fd_set ret_readfds, ret_writefds, ret_exceptfds;
     struct timeval ret_timeout;
@@ -1359,6 +1384,12 @@ interruptible_select (int n,
 	if (res == -1 && errno == EINTR)
 	  continue;
 
+	if (tsan_forced_timeout && res == 0)
+	  {
+	    usleep (0);
+	    continue;
+	  }
+
 	break;
       }