Skip to content

Commit

Permalink
stop_all_threads: (re-)enable async before waiting for stops
Browse files Browse the repository at this point in the history
Running the
gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase
added later in the series against gdbserver, after the
TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run
into an infinite loop in stop_all_threads, leading to a timeout:

  FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout)

The is really a latent bug, and it is about the fact that
stop_all_threads stops listening to events from a target as soon as it
sees a TARGET_WAITKIND_NO_RESUMED, ignoring that
TARGET_WAITKIND_NO_RESUMED may be delayed.  handle_no_resumed knows
how to handle delayed no-resumed events, but stop_all_threads was
never taught to.

In more detail, here's what happens with that testcase:

#1 - Multiple threads report breakpoint hits to gdb.

openhwgroup#2 - gdb picks one events, and it's for thread 1.  All other stops are
     left pending.  thread 1 needs to move past a breakpoint, so gdb
     stops all threads to start an inline step over for thread 1.
     While stopping threads, some of the threads that were still
     running report events that are also left pending.

openhwgroup#2 - gdb steps thread 1

openhwgroup#3 - Thread 1 exits while stepping (it steps over an exit syscall),
     gdbserver reports thread exit for thread 1

openhwgroup#4 - Thread 1 was the last resumed thread, so gdbserver also reports
     no-resumed:

    [remote]   Notification received: Stop:w0;p3445d0.3445d3
    [remote] Sending packet: $vStopped#55
    [remote] Packet received: N
    [remote] Sending packet: $vStopped#55
    [remote] Packet received: OK

openhwgroup#5 - gdb processes the thread exit for thread 1, finishes the step
     over and restarts threads.

openhwgroup#6 - gdb picks the next event to process out of one of the resumed
     threads with pending events:

    [infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting openhwgroup#11

openhwgroup#7 - This is again a breakpoint hit and the breakpoint needs to be
     stepped over too, so gdb starts a step-over dance again.

openhwgroup#8 - We reach stop_all_threads, which finds that some threads need to
     be stopped.

openhwgroup#9 - wait_one finally consumes the no-resumed event queue by openhwgroup#4.
     Seeing this, wait_one disable target async, to stop listening for
     events out of the remote target.

openhwgroup#10 - We still haven't seen all the stops expected, so
      stop_all_threads tries another iteration.

openhwgroup#11 - Because the remote target is no longer async, and there are no
      other targets, wait_one return no-resumed immediately without
      polling the remote target.

openhwgroup#12 - We still haven't seen all the stops expected, so
      stop_all_threads tries another iteration.  goto openhwgroup#11, looping
      forever.

Fix this by explicitly enabling/re-enabling target async on targets
that can async, before waiting for stops.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f
  • Loading branch information
palves committed Nov 13, 2023
1 parent 21d4830 commit d828dbe
Showing 1 changed file with 81 additions and 0 deletions.
81 changes: 81 additions & 0 deletions gdb/infrun.c
Original file line number Diff line number Diff line change
Expand Up @@ -5202,6 +5202,8 @@ wait_one ()
if (nfds == 0)
{
/* No waitable targets left. All must be stopped. */
infrun_debug_printf ("no waitable targets left");

target_waitstatus ws;
ws.set_no_resumed ();
return {nullptr, minus_one_ptid, std::move (ws)};
Expand Down Expand Up @@ -5461,6 +5463,83 @@ handle_one (const wait_one_event &event)
return false;
}

/* Helper for stop_all_threads. wait_one waits for events until it
sees a TARGET_WAITKIND_NO_RESUMED event. When it sees one, it
disables target_async for the target to stop waiting for events
from it. TARGET_WAITKIND_NO_RESUMED can be delayed though,
consider, debugging against gdbserver:
#1 - Threads 1-5 are running, and thread 1 hits a breakpoint.
#2 - gdb processes the breakpoint hit for thread 1, stops all
threads, and steps thread 1 over the breakpoint. while
stopping threads, some other threads reported interesting
events, which were left pending in the thread's objects
(infrun's queue).
#2 - Thread 1 exits (it stepped an exit syscall), and gdbserver
reports the thread exit for thread 1. The event ends up in
remote's stop reply queue.
#3 - That was the last resumed thread, so gdbserver reports
no-resumed, and that event also ends up in remote's stop
reply queue, queued after the thread exit from #2.
#4 - gdb processes the thread exit event, which finishes the
step-over, and so gdb restarts all threads (threads with
pending events are left marked resumed, but aren't set
executing). The no-resumed event is still left pending in
the remote stop reply queue.
#5 - Since there are now resumed threads with pending breakpoint
hits, gdb picks one at random to process next.
#5 - gdb picks the breakpoint hit for thread 2 this time, and that
breakpoint also needs to be stepped over, so gdb stops all
threads again.
#6 - stop_all_threads counts number of expected stops and calls
wait_one once for each.
#7 - The first wait_one call collects the no-resumed event from #3
above.
#9 - Seeing the no-resumed event, wait_one disables target async
for the remote target, to stop waiting for events from it.
wait_one from here on always return no-resumed directly
without reaching the target.
#10 - stop_all_threads still hasn't seen all the stops it expects,
so it does another pass.
#11 - Since the remote target is not async (disabled in #9),
wait_one doesn't wait on it, so it won't see the expected
stops, and instead returns no-resumed directly.
#12 - stop_all_threads still haven't seen all the stops, so it
does another pass. goto #11, looping forever.
To handle this, we explicitly (re-)enable target async on all
targets that can async every time stop_all_threads goes wait for
the expected stops. */

static void
reenable_target_async ()
{
for (inferior *inf : all_inferiors ())
{
process_stratum_target *target = inf->process_target ();
if (target != nullptr
&& target->threads_executing
&& target->can_async_p ()
&& !target->is_async_p ())
{
switch_to_inferior_no_thread (inf);
target_async (1);
}
}
}

/* See infrun.h. */

void
Expand Down Expand Up @@ -5587,6 +5666,8 @@ stop_all_threads (const char *reason, inferior *inf)
if (pass > 0)
pass = -1;

reenable_target_async ();

for (int i = 0; i < waits_needed; i++)
{
wait_one_event event = wait_one ();
Expand Down

0 comments on commit d828dbe

Please sign in to comment.