Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement handling of blocking syscalls #7

Closed
joneschrisg opened this issue May 7, 2013 · 26 comments · Fixed by #179
Closed

Implement handling of blocking syscalls #7

joneschrisg opened this issue May 7, 2013 · 26 comments · Fixed by #179
Assignees
Milestone

Comments

@joneschrisg
Copy link
Contributor

From @rocallahan

The current implementation has a few problems but the main one is that it doesn't handle potentially-blocking syscalls at all. Blocking syscalls are tricky to handle; ideally when a wrapped syscall suspends its thread, the rr supervisor process would be invoked and handle the wrapped syscall much like a regular syscall --- we'd suspend the thread and to try to wake up another thread. However it's not easy to detect when a syscall will block. For example our read syscalls usually do not block, but might. My current best idea for handling this is to use perf to listen for context-switch events and when a context-switch event is triggered during a wrapped syscall, treat that as a signal that the syscall has blocked. I did some experiements monitoring context-switch events on test programs, and it looks promising, but I never got around to hacking that into rr.

This is a good idea because it doesn't introduce the concept of a "blocking syscall", which is a bit hard to define. Instead it handles all syscalls the same way. More details forthcoming.

@ghost ghost assigned joneschrisg Jun 26, 2013
@joneschrisg
Copy link
Contributor Author

I put together a PoC that signals the ptrace-er when a ptrace-ee has been context-switched a configurable number of times (we'd want that to be "1" for our purposes here). Initial testing suggests that the signals are being delivered as one would expect. The signaling works basically the same way as for the hpc interrupts, except that in our case here the trace-ee has to program the interrupt, from within the syscall buffer lib. There are some corner cases to figure out but this seems like it might work.

@joneschrisg
Copy link
Contributor Author

After a few more experiments, I'm now pretty confident we can make this work. My little test program is here. To make this "actually" work, we need to process the de-sched interrupts in the parent process and disarm the counter there, which is pretty straightforward but needs a bunch of nasty POSIX gloop to pull off. Will prove that out next, since that part must be bulletproof.

@rocallahan I should have asked this before I got started, but (i) was your approach about the same as this? (iI) if not, do you happen to have your old test program around?

@rocallahan
Copy link
Collaborator

You've already gone way beyond me --- I just had a little test program and manually ran "perf" to observe the progress of the context-switch counter.

@joneschrisg
Copy link
Contributor Author

The ptrace and UNIXbeard hackery is mostly working here, but I'm getting a seemingly-spurious desched notification that I don't understand yet. Or, there's an actual desched happening and I don't know where it's coming from. I have a hackaround that's not so bad and works, but will see if I can tell what's causing it.

@joneschrisg
Copy link
Contributor Author

The extra SIGIO doesn't seem to be an extra desched, because the switch counter is the same in the first and second. I also tried monitoring desched on syscalls that block for increasingly longer durations, and the extra SIGIO is always sent immediately after desched, not at arbitrary times. So I'm pretty close to ascribing this to magic for the time being and moving on.

@joneschrisg
Copy link
Contributor Author

Here's about the only hypothesis I have

  • child gets descheduled, bumps counter to i and schedules SIGIO
  • SIGIO notification "schedules" child, but it doesn't run
  • child is being ptraced, so we "deschedule" child to notify parent and bump counter to i+1
  • counter signal generated, but SIGIO is already pending so this one is queued
  • parent is notified and sees counter value i+1
  • parent stops delivery of first signal and shuts off counter
  • second SIGIO dequeued and delivered, notififying parent (counter is off, so no pseudo-desched possible here)
  • parent notified and sees counter value i+1 again
  • parent stops delivery of second SIGIO and we continue on

If this is what's happening, then the current hackaround should be a reliable way of dealing with this.

@joneschrisg
Copy link
Contributor Author

(I also confirmed that the $ip doesn't advance during the pseudo-step, when the program being traced isn't blocked on a syscall.)

@joneschrisg
Copy link
Contributor Author

I have a PoC for "rpc syscalls", where a tracee requests that the tracer make syscall(s) on its behalf, here. I think this is the last piece of technology we need.

@rocallahan
Copy link
Collaborator

Why is that needed?

@joneschrisg
Copy link
Contributor Author

We have to do a pretty tricky dance to set up the cs counter and syscall buffer region across record/replay. During recording, the counter fd has to be shared with rr, but in replay it should be emulated. This requires letting rr know the fd number that's to be shared. In both record and replay, the buffer region has to be actually opened and mmap'd by both the tracee and rr. rr also has to know it's the syscall buffer initialization code it's talking to. The cleanest way I could think of to implement this is with a fake syscall; use an unallocated syscall number (like -42) to trap into rr, then let rr decide what to do. That's what the PoC does.

I think we could make this work by keying off syscall param pattern matching and so forth, but I'd rather get away from that.

@rocallahan
Copy link
Collaborator

Sounds reasonable.

@joneschrisg
Copy link
Contributor Author

In other words, instead of assuming various things about the sequence of mmap/socket/recv/etc calls and patterns of parameters thereto, we could instead do something like

int counterfd = perf_event_open(...);
char* shmfile = "/dev/shm/rr-tracee-[tid]";
struct msghdr msg;
// set up msg
syscall(MAGIC_SYSCALL_BUF_INIT, counterfd, shmfile, &msg);

then rr decides what do with the params based on record vs. replay.

@joneschrisg
Copy link
Contributor Author

(Note to self: need to sigprocmask during this setup or things will get very hard.)

joneschrisg added a commit to joneschrisg/rr that referenced this issue Jun 29, 2013
…rr-debugger#7.

This also comments out the (already dead) futex() syscallbuf code,
since it turns the ABI for syscall6 wasn't quite right.
joneschrisg added a commit that referenced this issue Jun 29, 2013
Convert syscallbuf initialization into a "magic" rrcall, preparing for #7.
joneschrisg added a commit that referenced this issue Jul 1, 2013
"Anonymously" share shmem segment with tracee.  More prep work for #7.
@joneschrisg
Copy link
Contributor Author

(Note to self: investigate what happens when may-block syscalls are interrupted with signals.)

joneschrisg added a commit to joneschrisg/rr that referenced this issue Jul 2, 2013
…gger#7.

This gets the desched machinery working for recording, but we fall
over in replay because get a trace event at the first syscall stored
in the syscall buffer, instead of the one that the tracee was in when
it was desched'd.  This is fairly tricky, needs more thought.

A few more features were added in service of desched

* sharing desched counter fds
* new fcntl() sub-ops
* clean up buffer-flush replay code a bit
* buffering sched_yield() (for testing)
joneschrisg added a commit that referenced this issue Jul 2, 2013
Set up desched notifications and clear them. Another piece of #7.
@joneschrisg
Copy link
Contributor Author

All the necessary new technology is in place. What remains is

  • figure out the right buffer-flush dance to do during recording so that we know to replay the first buffered syscall, instead of blowing up trying to advance to where rr saw the desched event
  • see how this code works when signals are being delivered, revise as necessary

@joneschrisg
Copy link
Contributor Author

I think I got this licked

  • when rr receives the desched notification for task T at syscall s, record a buffer flush event ... but don't zero the byte counter. So then during replay, we stop at the first buffered syscall, and refill the buffer appropriately to let them finish.
  • record a normal syscall-entry into s. Since we haven't reset the record counter yet, just refilled the buffer, T will still compute the same record pointers during replay of the preamble of s as it did during recording.
  • continue normally, descheduling T if necessary, and advance to the syscall-exit of s. Record a normal syscall-exit event for T at s, including outparam data. This will in effect have T use the syscall buffer as undistinguished scratch space for the syscall s.
  • record a zero-syscallbuf-record-counter event. Now T can proceed using the syscall buffer "normally".

@joneschrisg
Copy link
Contributor Author

Hitting a scary problem now where finishing desched-interrupted syscalls with ptrace(SYSCALL) doesn't seem to be stopping at the syscall exit properly.

@joneschrisg
Copy link
Contributor Author

So the situation is pretty strange, and different for different syscalls. My experiments are here. There's a tracee that essentially looks like (in a loop)

arm_desched_event();  // ioctl(...);
[syscall(...)];
disarm_desched_event();  // ioctl(...);

And the the tracer is basically (in a loop)

waitpid(tracee);
// status 1
ptrace(SYSCALL, tracee);
waitpid(tracee);
// status 2
ptrace(SYSCALL, tracee);
waitpid(tracee);
// status 3
ptrace(SYSCALL, tracee);
waitpid(tracee);
// status 4

The behaviors I see for the different syscalls are

  • sched_yield()
    1. SIGIO (orig_eax = SYS_sched_yield; other registers look like syscall is finished)
    2. SIGIO (orig_eax = SYS_sched_yield; same regs)
    3. syscall trap (orig_eax = SYS_ioctl; regs look like syscall entry)
    4. syscall trap (orig_eax = SYS_ioctl; regs look like syscall exit)
  • write(stdout, '.', 1)
    1. SIGIO (orig_eax = SYS_write; other registers look like syscall is finished)
    2. SIGIO (orig_eax = SYS_write; same regs)
    3. syscall trap (orig_eax = SYS_ioctl; regs look like syscall entry)
    4. syscall trap (orig_eax = SYS_ioctl; regs look like syscall exit)
  • system(sleep 1) (this blocks on waitpid)
    1. SIGIO (orig_eax = SYS_waitpid; other registers don't look like syscall entry or exit)
    2. SIGIO (orig_eax = SYS_waitpid; other registers don't look like syscall entry or exit)
    3. syscall trap (orig_eax = SYS_waitpid; regs look like syscall entry)
    4. syscall trap (orig_eax = SYS_waitpid; regs look like syscall exit)
  • nanosleep(1sec)
    1. SIGIO (orig_eax = SYS_nanosleep; other registers don't look like syscall entry or exit)
    2. SIGIO (orig_eax = SYS_nanosleep; other registers don't look like syscall entry or exit)
    3. syscall trap (orig_eax = 0; regs look like syscall entry)
    4. syscall trap (orig_eax = 0; regs look like syscall exit)

In the last, an orig_eax of 0 probably means SYS_restart_syscall.

The first two behaviors look like a syscall that finishes, but the task is descheduled on syscall exit. So the tracer just sees the exit. Then we enter and exit the disarm_desched_event() ioctl.

The second two look like actual blocking; the SIGIO seems to arrive before the syscall "really starts" (no idea what that might mean in the kernel). We "restart" the syscall and enter it, and then we see it finish.

I don't understand the SYS_restart_syscall complication.

The annoying things are

  • in the finish-and-preempt cases, we can't access the tracee regs on syscall entry. This makes the plan above trickier. But not a major obstacle.
  • I don't see a way yet to reliably distinguish the finished-but-preempted reg state from the not-yet-entered reg state. We can safely assume that the next syscall following the one seen on SIGIO is always ioctl, which lets us machete-hack our way out of the problem. Ugly though. Will poke more.
  • the SYS_restart_syscall wrinkle is puzzling, but seems relatively easy to paper over

@joneschrisg
Copy link
Contributor Author

I'm getting headaches thinking through the changes I need to make to the recorder code, so I'm afraid I'm going to need to take a detour for some rewritin' :/

@joneschrisg
Copy link
Contributor Author

take a detour for some rewritin'

(That proved to be more than I wanted to chew on right now too. But will need to happen soon.)

The first two behaviors look like a syscall that finishes, but the task is descheduled on syscall exit. So the tracer just sees the exit. Then we enter and exit the disarm_desched_event() ioctl.

This proved to be quite easy to handle: we step over the extraneous SIGIO (status 2 above) and then ptrace(SYSCALL) to see what status 3 we got. If it's for sure the ioctl for disarm_desched_event(), which is fairly simple to check, then we just run the ioctl to completion. (It won't block.)

The cute part is that we don't have to record any trace data or change any execution state. The syscall buffer must not have been full for us to arm the desched event, therefore it doesn't need flushing. And the replayer doesn't need any extra data to know to step over the desched ioctls during replay, so we can just carry on! The next "normal" buffer flush will record what we want.

The other case won't be so easy though ...

@rocallahan
Copy link
Collaborator

Hmm, so will we have to do these two extra ioctls around every potentially-blocking syscall? That's very unfortunate.

@joneschrisg
Copy link
Contributor Author

I have an idea for how we can avoid that, but it's going to be a bit until I can prove it out. Note though that (i) rr has to do something similar already with the hpc counters; (ii) ptrace traps are not cheap compared to this ioctl; (iii) rr invokes a bunch of not-so-cheap syscalls (many unnecessarily) to process traps now anyways. My guess is that we'll see considerably better perf by wrapping as many common syscalls as we can non-optimally, compared to now. But we'll soon see :).

@rocallahan
Copy link
Collaborator

True.

bernhardu added a commit to bernhardu/rr that referenced this issue Jun 23, 2021
These two tests are stuck:
  nested_detach-no-syscallbuf
  nested_detach-32-no-syscallbuf

For some reason just the no-syscallbuf variants get stuck,
the regular ones succeed.

$ bin/rr record -n bin/rr record --nested=detach bin/simple

PID 1305400 uses 100% CPU, until timeout kills all.

1305387    └─ bin/rr record -n bin/rr record --nested=detach bin/simple  12
1305399       ├─ bin/rr record --nested=detach bin/simple   1
1305400       └─ bin/rr record --nested=detach bin/simple   1

$ gdb -q --pid 1305400
(gdb) add-symbol-file /usr/lib/x86_64-linux-gnu/libasan.so.6.0.0 0x7f818ad37a60
(gdb) add-symbol-file .../x86_64_asan/obj/bin/rr 0x564caa04e830
(gdb) add-symbol-file .../x86_64_asan/obj/lib/rr/librrpreload.so 0x7f818ad030d0
(gdb) stepi
0x00007f818ad50339      4150      int res = REAL(pthread_mutex_lock)(m);
1: x/i $pc
=> 0x7f818ad50339 <__interceptor_pthread_mutex_lock(void*)+25>: jmp    *0xf4311(%rip)        # 0x7f818ae44650
(gdb) bt
#0  0x00007f818ad50339 in __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4150
rr-debugger#1  __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4146
rr-debugger#2  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#3  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
rr-debugger#4  0x0000564caa2b9ed2 in rr::RecordSession::RecordSession (this=0x619000003280, exe_path=..., argv=..., envp=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30, bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false) at .../rr/src/RecordSession.cc:2296
rr-debugger#5  0x0000564caa2b92ea in rr::RecordSession::create (argv=..., extra_env=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30 '\036', bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false, force_asan_active=false) at .../rr/src/RecordSession.cc:2261
rr-debugger#6  0x0000564caa29cd01 in rr::record (args=..., flags=...) at .../rr/src/RecordCommand.cc:632
rr-debugger#7  0x0000564caa29e90f in rr::RecordCommand::run (this=0x564caa956fe0, args=...) at .../rr/src/RecordCommand.cc:791
rr-debugger#8  0x0000564caa614909 in main (argc=4, argv=0x7ffd2a91dbe8) at .../rr/src/main.cc:271
...
(gdb) stepi
pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
24        mutex->__data.__kind &= ~PTHREAD_MUTEX_PRIO_INHERIT_NP;
1: x/i $pc
=> 0x7f818ad08660 <pthread_mutex_lock>: mov    0x4ba9(%rip),%rax        # 0x7f818ad0d210
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) stepi
74        return __pthread_mutex_lock(mutex);
1: x/i $pc
=> 0x7f818ad08678 <pthread_mutex_lock+24>:      jmp    0x7f818ad03070
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:74
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) print environ
$1 = (char **) 0x0

$ cat /proc/1305400/environ | tr '\0' '\n' | grep LD_PRELOAD
LD_PRELOAD=libasan.so.6:.../x86_64_asan/obj/bin/../lib/rr/librrpreload.so:::
bernhardu added a commit to bernhardu/rr that referenced this issue Jun 24, 2021
These two tests are stuck:
  nested_detach-no-syscallbuf
  nested_detach-32-no-syscallbuf

For some reason just the no-syscallbuf variants get stuck,
the regular ones succeed.

$ bin/rr record -n bin/rr record --nested=detach bin/simple

PID 1305400 uses 100% CPU, until timeout kills all.

1305387    └─ bin/rr record -n bin/rr record --nested=detach bin/simple  12
1305399       ├─ bin/rr record --nested=detach bin/simple   1
1305400       └─ bin/rr record --nested=detach bin/simple   1

$ gdb -q --pid 1305400
(gdb) add-symbol-file /usr/lib/x86_64-linux-gnu/libasan.so.6.0.0 0x7f818ad37a60
(gdb) add-symbol-file .../x86_64_asan/obj/bin/rr 0x564caa04e830
(gdb) add-symbol-file .../x86_64_asan/obj/lib/rr/librrpreload.so 0x7f818ad030d0
(gdb) stepi
0x00007f818ad50339      4150      int res = REAL(pthread_mutex_lock)(m);
1: x/i $pc
=> 0x7f818ad50339 <__interceptor_pthread_mutex_lock(void*)+25>: jmp    *0xf4311(%rip)        # 0x7f818ae44650
(gdb) bt
#0  0x00007f818ad50339 in __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4150
rr-debugger#1  __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4146
rr-debugger#2  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#3  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
rr-debugger#4  0x0000564caa2b9ed2 in rr::RecordSession::RecordSession (this=0x619000003280, exe_path=..., argv=..., envp=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30, bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false) at .../rr/src/RecordSession.cc:2296
rr-debugger#5  0x0000564caa2b92ea in rr::RecordSession::create (argv=..., extra_env=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30 '\036', bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false, force_asan_active=false) at .../rr/src/RecordSession.cc:2261
rr-debugger#6  0x0000564caa29cd01 in rr::record (args=..., flags=...) at .../rr/src/RecordCommand.cc:632
rr-debugger#7  0x0000564caa29e90f in rr::RecordCommand::run (this=0x564caa956fe0, args=...) at .../rr/src/RecordCommand.cc:791
rr-debugger#8  0x0000564caa614909 in main (argc=4, argv=0x7ffd2a91dbe8) at .../rr/src/main.cc:271
...
(gdb) stepi
pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
24        mutex->__data.__kind &= ~PTHREAD_MUTEX_PRIO_INHERIT_NP;
1: x/i $pc
=> 0x7f818ad08660 <pthread_mutex_lock>: mov    0x4ba9(%rip),%rax        # 0x7f818ad0d210
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) stepi
74        return __pthread_mutex_lock(mutex);
1: x/i $pc
=> 0x7f818ad08678 <pthread_mutex_lock+24>:      jmp    0x7f818ad03070
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:74
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) print environ
$1 = (char **) 0x0

$ cat /proc/1305400/environ | tr '\0' '\n' | grep LD_PRELOAD
LD_PRELOAD=libasan.so.6:.../x86_64_asan/obj/bin/../lib/rr/librrpreload.so:::
bernhardu added a commit to bernhardu/rr that referenced this issue Jun 25, 2021
These two tests are stuck:
  nested_detach-no-syscallbuf
  nested_detach-32-no-syscallbuf

For some reason just the no-syscallbuf variants get stuck,
the regular ones succeed.

$ bin/rr record -n bin/rr record --nested=detach bin/simple

PID 1305400 uses 100% CPU, until timeout kills all.

1305387    └─ bin/rr record -n bin/rr record --nested=detach bin/simple  12
1305399       ├─ bin/rr record --nested=detach bin/simple   1
1305400       └─ bin/rr record --nested=detach bin/simple   1

$ gdb -q --pid 1305400
(gdb) add-symbol-file /usr/lib/x86_64-linux-gnu/libasan.so.6.0.0 0x7f818ad37a60
(gdb) add-symbol-file .../x86_64_asan/obj/bin/rr 0x564caa04e830
(gdb) add-symbol-file .../x86_64_asan/obj/lib/rr/librrpreload.so 0x7f818ad030d0
(gdb) stepi
0x00007f818ad50339      4150      int res = REAL(pthread_mutex_lock)(m);
1: x/i $pc
=> 0x7f818ad50339 <__interceptor_pthread_mutex_lock(void*)+25>: jmp    *0xf4311(%rip)        # 0x7f818ae44650
(gdb) bt
#0  0x00007f818ad50339 in __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4150
rr-debugger#1  __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4146
rr-debugger#2  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#3  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
rr-debugger#4  0x0000564caa2b9ed2 in rr::RecordSession::RecordSession (this=0x619000003280, exe_path=..., argv=..., envp=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30, bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false) at .../rr/src/RecordSession.cc:2296
rr-debugger#5  0x0000564caa2b92ea in rr::RecordSession::create (argv=..., extra_env=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30 '\036', bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false, force_asan_active=false) at .../rr/src/RecordSession.cc:2261
rr-debugger#6  0x0000564caa29cd01 in rr::record (args=..., flags=...) at .../rr/src/RecordCommand.cc:632
rr-debugger#7  0x0000564caa29e90f in rr::RecordCommand::run (this=0x564caa956fe0, args=...) at .../rr/src/RecordCommand.cc:791
rr-debugger#8  0x0000564caa614909 in main (argc=4, argv=0x7ffd2a91dbe8) at .../rr/src/main.cc:271
...
(gdb) stepi
pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
24        mutex->__data.__kind &= ~PTHREAD_MUTEX_PRIO_INHERIT_NP;
1: x/i $pc
=> 0x7f818ad08660 <pthread_mutex_lock>: mov    0x4ba9(%rip),%rax        # 0x7f818ad0d210
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) stepi
74        return __pthread_mutex_lock(mutex);
1: x/i $pc
=> 0x7f818ad08678 <pthread_mutex_lock+24>:      jmp    0x7f818ad03070
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:74
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) print environ
$1 = (char **) 0x0

$ cat /proc/1305400/environ | tr '\0' '\n' | grep LD_PRELOAD
LD_PRELOAD=libasan.so.6:.../x86_64_asan/obj/bin/../lib/rr/librrpreload.so:::
bernhardu added a commit to bernhardu/rr that referenced this issue Jun 25, 2021
These two tests are stuck:
  nested_detach-no-syscallbuf
  nested_detach-32-no-syscallbuf

For some reason just the no-syscallbuf variants get stuck,
the regular ones succeed.

$ bin/rr record -n bin/rr record --nested=detach bin/simple

PID 1305400 uses 100% CPU, until timeout kills all.

1305387    └─ bin/rr record -n bin/rr record --nested=detach bin/simple  12
1305399       ├─ bin/rr record --nested=detach bin/simple   1
1305400       └─ bin/rr record --nested=detach bin/simple   1

$ gdb -q --pid 1305400
(gdb) add-symbol-file /usr/lib/x86_64-linux-gnu/libasan.so.6.0.0 0x7f818ad37a60
(gdb) add-symbol-file .../x86_64_asan/obj/bin/rr 0x564caa04e830
(gdb) add-symbol-file .../x86_64_asan/obj/lib/rr/librrpreload.so 0x7f818ad030d0
(gdb) stepi
0x00007f818ad50339      4150      int res = REAL(pthread_mutex_lock)(m);
1: x/i $pc
=> 0x7f818ad50339 <__interceptor_pthread_mutex_lock(void*)+25>: jmp    *0xf4311(%rip)        # 0x7f818ae44650
(gdb) bt
#0  0x00007f818ad50339 in __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4150
rr-debugger#1  __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4146
rr-debugger#2  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#3  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
rr-debugger#4  0x0000564caa2b9ed2 in rr::RecordSession::RecordSession (this=0x619000003280, exe_path=..., argv=..., envp=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30, bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false) at .../rr/src/RecordSession.cc:2296
rr-debugger#5  0x0000564caa2b92ea in rr::RecordSession::create (argv=..., extra_env=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30 '\036', bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false, force_asan_active=false) at .../rr/src/RecordSession.cc:2261
rr-debugger#6  0x0000564caa29cd01 in rr::record (args=..., flags=...) at .../rr/src/RecordCommand.cc:632
rr-debugger#7  0x0000564caa29e90f in rr::RecordCommand::run (this=0x564caa956fe0, args=...) at .../rr/src/RecordCommand.cc:791
rr-debugger#8  0x0000564caa614909 in main (argc=4, argv=0x7ffd2a91dbe8) at .../rr/src/main.cc:271
...
(gdb) stepi
pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
24        mutex->__data.__kind &= ~PTHREAD_MUTEX_PRIO_INHERIT_NP;
1: x/i $pc
=> 0x7f818ad08660 <pthread_mutex_lock>: mov    0x4ba9(%rip),%rax        # 0x7f818ad0d210
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) stepi
74        return __pthread_mutex_lock(mutex);
1: x/i $pc
=> 0x7f818ad08678 <pthread_mutex_lock+24>:      jmp    0x7f818ad03070
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:74
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) print environ
$1 = (char **) 0x0

$ cat /proc/1305400/environ | tr '\0' '\n' | grep LD_PRELOAD
LD_PRELOAD=libasan.so.6:.../x86_64_asan/obj/bin/../lib/rr/librrpreload.so:::
bernhardu added a commit to bernhardu/rr that referenced this issue Jul 5, 2021
These two tests are stuck:
  nested_detach-no-syscallbuf
  nested_detach-32-no-syscallbuf

For some reason just the no-syscallbuf variants get stuck,
the regular ones succeed.

$ bin/rr record -n bin/rr record --nested=detach bin/simple

PID 1305400 uses 100% CPU, until timeout kills all.

1305387    └─ bin/rr record -n bin/rr record --nested=detach bin/simple  12
1305399       ├─ bin/rr record --nested=detach bin/simple   1
1305400       └─ bin/rr record --nested=detach bin/simple   1

$ gdb -q --pid 1305400
(gdb) add-symbol-file /usr/lib/x86_64-linux-gnu/libasan.so.6.0.0 0x7f818ad37a60
(gdb) add-symbol-file .../x86_64_asan/obj/bin/rr 0x564caa04e830
(gdb) add-symbol-file .../x86_64_asan/obj/lib/rr/librrpreload.so 0x7f818ad030d0
(gdb) stepi
0x00007f818ad50339      4150      int res = REAL(pthread_mutex_lock)(m);
1: x/i $pc
=> 0x7f818ad50339 <__interceptor_pthread_mutex_lock(void*)+25>: jmp    *0xf4311(%rip)        # 0x7f818ae44650
(gdb) bt
#0  0x00007f818ad50339 in __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4150
rr-debugger#1  __interceptor_pthread_mutex_lock (m=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4146
rr-debugger#2  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#3  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
rr-debugger#4  0x0000564caa2b9ed2 in rr::RecordSession::RecordSession (this=0x619000003280, exe_path=..., argv=..., envp=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30, bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false) at .../rr/src/RecordSession.cc:2296
rr-debugger#5  0x0000564caa2b92ea in rr::RecordSession::create (argv=..., extra_env=..., disable_cpuid_features=..., syscallbuf=rr::RecordSession::ENABLE_SYSCALL_BUF, syscallbuf_desched_sig=30 '\036', bind_cpu=rr::BIND_CPU, output_trace_dir=..., trace_id=0x0, use_audit=false, unmap_vdso=false, force_asan_active=false) at .../rr/src/RecordSession.cc:2261
rr-debugger#6  0x0000564caa29cd01 in rr::record (args=..., flags=...) at .../rr/src/RecordCommand.cc:632
rr-debugger#7  0x0000564caa29e90f in rr::RecordCommand::run (this=0x564caa956fe0, args=...) at .../rr/src/RecordCommand.cc:791
rr-debugger#8  0x0000564caa614909 in main (argc=4, argv=0x7ffd2a91dbe8) at .../rr/src/main.cc:271
...
(gdb) stepi
pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
24        mutex->__data.__kind &= ~PTHREAD_MUTEX_PRIO_INHERIT_NP;
1: x/i $pc
=> 0x7f818ad08660 <pthread_mutex_lock>: mov    0x4ba9(%rip),%rax        # 0x7f818ad0d210
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:24
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) stepi
74        return __pthread_mutex_lock(mutex);
1: x/i $pc
=> 0x7f818ad08678 <pthread_mutex_lock+24>:      jmp    0x7f818ad03070
(gdb) bt
#0  pthread_mutex_lock (mutex=0x6110000002c8) at .../rr/src/preload/overrides.c:74
rr-debugger#1  0x0000564caa0e2ddc in rr::CompressedWriter::CompressedWriter (this=0x6110000002c0, filename=..., block_size=1048576, num_threads=1) at .../rr/src/CompressedWriter.cc:62
rr-debugger#2  0x0000564caa5a9453 in rr::TraceWriter::TraceWriter (this=0x619000003380, file_name=..., output_trace_dir=..., ticks_semantics_=rr::TICKS_RETIRED_CONDITIONAL_BRANCHES) at .../rr/src/TraceStream.cc:1307
...
(gdb) print environ
$1 = (char **) 0x0

$ cat /proc/1305400/environ | tr '\0' '\n' | grep LD_PRELOAD
LD_PRELOAD=libasan.so.6:.../x86_64_asan/obj/bin/../lib/rr/librrpreload.so:::
bernhardu added a commit to bernhardu/rr that referenced this issue Apr 7, 2023
…aks=0.

==2156196==Running thread 2156188 was not suspended. False leaks are possible.

=================================================================
==2156193==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 10525163 byte(s) in 11 object(s) allocated from:
    #0 0x7f41d28b94c8 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:95
    rr-debugger#1 0x55c53c6587a4 in std::__new_allocator<unsigned char>::allocate(unsigned long, void const*) /usr/include/c++/12/bits/new_allocator.h:137
    rr-debugger#2 0x55c53c6587a4 in std::allocator_traits<std::allocator<unsigned char> >::allocate(std::allocator<unsigned char>&, unsigned long) /usr/include/c++/12/bits/alloc_traits.h:464
    rr-debugger#3 0x55c53c6587a4 in std::_Vector_base<unsigned char, std::allocator<unsigned char> >::_M_allocate(unsigned long) /usr/include/c++/12/bits/stl_vector.h:378
    rr-debugger#4 0x55c53c6587a4 in std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned long) /usr/include/c++/12/bits/vector.tcc:650
    rr-debugger#5 0x55c53c69370b in std::vector<unsigned char, std::allocator<unsigned char> >::resize(unsigned long) /usr/include/c++/12/bits/stl_vector.h:1011
    rr-debugger#6 0x55c53c69370b in rr::CompressedWriter::compression_thread() /home/bernhard/data/entwicklung/2023/rr/2022-09-11/rr/src/CompressedWriter.cc:162
    rr-debugger#7 0x55c53c693908 in rr::CompressedWriter::compression_thread_callback(void*) /home/bernhard/data/entwicklung/2023/rr/2022-09-11/rr/src/CompressedWriter.cc:29
    rr-debugger#8 0x7f41d22a7fd3 in start_thread nptl/pthread_create.c:442

SUMMARY: AddressSanitizer: 10525163 byte(s) leaked in 11 allocation(s).
bernhardu added a commit to bernhardu/rr that referenced this issue Apr 20, 2023
Related: 7ffb7d6
See issue 3504.

$ bin/rr record -n bin/pid_ns_kill_child

Error message:
  [FATAL src/Task.cc:2167:did_waitpid() errno: EDOM]
   (task 285229 (rec:285229) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7fffb7babb70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056012f0b167d in rr::Task::did_waitpid (this=this@entry=0x56012f9d4640, status=...) at rr/src/Task.cc:2167
rr-debugger#8  0x000056012f086ac7 in rr::Scheduler::reschedule (this=this@entry=0x56012f9c4188, switchable=<optimized out>) at rr/src/Scheduler.cc:882
rr-debugger#9  0x000056012eff4607 in rr::RecordSession::record_step (this=0x56012f9c3f40) at rr/src/RecordSession.cc:2497
rr-debugger#10 0x000056012efe8664 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56012f207a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056012ef4800f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278

With change in Task.cc:
Error message:
  [FATAL src/Task.cc:3894:did_handle_ptrace_exit_event()]
   (task 308641 (rec:308641) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7ffe5e4cdf70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056265d0981b7 in rr::Task::did_handle_ptrace_exit_event (this=this@entry=0x56265dde3640) at rr/src/Task.cc:3894
rr-debugger#8  0x000056265cfdb110 in rr::handle_ptrace_exit_event (t=t@entry=0x56265dde3640) at rr/src/RecordSession.cc:279
rr-debugger#9  0x000056265cfe272b in rr::RecordSession::record_step (this=0x56265ddd2f40) at rr/src/RecordSession.cc:2535
rr-debugger#10 0x000056265cfd6674 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56265d1f5a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056265cf3601f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278
bernhardu added a commit to bernhardu/rr that referenced this issue Apr 21, 2023
Related: 7ffb7d6
See issue 3504.

$ bin/rr record -n bin/pid_ns_kill_child

Error message:
  [FATAL src/Task.cc:2167:did_waitpid() errno: EDOM]
   (task 285229 (rec:285229) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7fffb7babb70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056012f0b167d in rr::Task::did_waitpid (this=this@entry=0x56012f9d4640, status=...) at rr/src/Task.cc:2167
rr-debugger#8  0x000056012f086ac7 in rr::Scheduler::reschedule (this=this@entry=0x56012f9c4188, switchable=<optimized out>) at rr/src/Scheduler.cc:882
rr-debugger#9  0x000056012eff4607 in rr::RecordSession::record_step (this=0x56012f9c3f40) at rr/src/RecordSession.cc:2497
rr-debugger#10 0x000056012efe8664 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56012f207a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056012ef4800f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278

With change in Task.cc:
Error message:
  [FATAL src/Task.cc:3894:did_handle_ptrace_exit_event()]
   (task 308641 (rec:308641) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7ffe5e4cdf70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056265d0981b7 in rr::Task::did_handle_ptrace_exit_event (this=this@entry=0x56265dde3640) at rr/src/Task.cc:3894
rr-debugger#8  0x000056265cfdb110 in rr::handle_ptrace_exit_event (t=t@entry=0x56265dde3640) at rr/src/RecordSession.cc:279
rr-debugger#9  0x000056265cfe272b in rr::RecordSession::record_step (this=0x56265ddd2f40) at rr/src/RecordSession.cc:2535
rr-debugger#10 0x000056265cfd6674 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56265d1f5a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056265cf3601f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278
bernhardu added a commit to bernhardu/rr that referenced this issue Apr 21, 2023
Related: 7ffb7d6
See issue 3504.

$ bin/rr record -n bin/pid_ns_kill_child

Error message:
  [FATAL src/Task.cc:2167:did_waitpid() errno: EDOM]
   (task 285229 (rec:285229) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7fffb7babb70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056012f0b167d in rr::Task::did_waitpid (this=this@entry=0x56012f9d4640, status=...) at rr/src/Task.cc:2167
rr-debugger#8  0x000056012f086ac7 in rr::Scheduler::reschedule (this=this@entry=0x56012f9c4188, switchable=<optimized out>) at rr/src/Scheduler.cc:882
rr-debugger#9  0x000056012eff4607 in rr::RecordSession::record_step (this=0x56012f9c3f40) at rr/src/RecordSession.cc:2497
rr-debugger#10 0x000056012efe8664 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56012f207a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056012ef4800f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278

With change in Task.cc:
Error message:
  [FATAL src/Task.cc:3894:did_handle_ptrace_exit_event()]
   (task 308641 (rec:308641) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7ffe5e4cdf70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056265d0981b7 in rr::Task::did_handle_ptrace_exit_event (this=this@entry=0x56265dde3640) at rr/src/Task.cc:3894
rr-debugger#8  0x000056265cfdb110 in rr::handle_ptrace_exit_event (t=t@entry=0x56265dde3640) at rr/src/RecordSession.cc:279
rr-debugger#9  0x000056265cfe272b in rr::RecordSession::record_step (this=0x56265ddd2f40) at rr/src/RecordSession.cc:2535
rr-debugger#10 0x000056265cfd6674 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56265d1f5a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056265cf3601f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278
bernhardu added a commit to bernhardu/rr that referenced this issue Apr 22, 2023
Related: 7ffb7d6
See issue 3504.

$ bin/rr record -n bin/pid_ns_kill_child

Error message:
  [FATAL src/Task.cc:2167:did_waitpid() errno: EDOM]
   (task 285229 (rec:285229) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7fffb7babb70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056012f0b167d in rr::Task::did_waitpid (this=this@entry=0x56012f9d4640, status=...) at rr/src/Task.cc:2167
rr-debugger#8  0x000056012f086ac7 in rr::Scheduler::reschedule (this=this@entry=0x56012f9c4188, switchable=<optimized out>) at rr/src/Scheduler.cc:882
rr-debugger#9  0x000056012eff4607 in rr::RecordSession::record_step (this=0x56012f9c3f40) at rr/src/RecordSession.cc:2497
rr-debugger#10 0x000056012efe8664 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56012f207a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056012ef4800f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278

With change in Task.cc:
Error message:
  [FATAL src/Task.cc:3894:did_handle_ptrace_exit_event()]
   (task 308641 (rec:308641) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7ffe5e4cdf70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056265d0981b7 in rr::Task::did_handle_ptrace_exit_event (this=this@entry=0x56265dde3640) at rr/src/Task.cc:3894
rr-debugger#8  0x000056265cfdb110 in rr::handle_ptrace_exit_event (t=t@entry=0x56265dde3640) at rr/src/RecordSession.cc:279
rr-debugger#9  0x000056265cfe272b in rr::RecordSession::record_step (this=0x56265ddd2f40) at rr/src/RecordSession.cc:2535
rr-debugger#10 0x000056265cfd6674 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56265d1f5a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056265cf3601f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278
bernhardu added a commit to bernhardu/rr that referenced this issue Apr 25, 2023
Related: 7ffb7d6
See issue 3504.

$ bin/rr record -n bin/pid_ns_kill_child

Error message:
  [FATAL src/Task.cc:2167:did_waitpid() errno: EDOM]
   (task 285229 (rec:285229) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7fffb7babb70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056012f0b167d in rr::Task::did_waitpid (this=this@entry=0x56012f9d4640, status=...) at rr/src/Task.cc:2167
rr-debugger#8  0x000056012f086ac7 in rr::Scheduler::reschedule (this=this@entry=0x56012f9c4188, switchable=<optimized out>) at rr/src/Scheduler.cc:882
rr-debugger#9  0x000056012eff4607 in rr::RecordSession::record_step (this=0x56012f9c3f40) at rr/src/RecordSession.cc:2497
rr-debugger#10 0x000056012efe8664 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56012f207a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056012ef4800f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278

With change in Task.cc:
Error message:
  [FATAL src/Task.cc:3894:did_handle_ptrace_exit_event()]
   (task 308641 (rec:308641) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7ffe5e4cdf70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056265d0981b7 in rr::Task::did_handle_ptrace_exit_event (this=this@entry=0x56265dde3640) at rr/src/Task.cc:3894
rr-debugger#8  0x000056265cfdb110 in rr::handle_ptrace_exit_event (t=t@entry=0x56265dde3640) at rr/src/RecordSession.cc:279
rr-debugger#9  0x000056265cfe272b in rr::RecordSession::record_step (this=0x56265ddd2f40) at rr/src/RecordSession.cc:2535
rr-debugger#10 0x000056265cfd6674 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56265d1f5a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056265cf3601f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278
bernhardu added a commit to bernhardu/rr that referenced this issue Jul 10, 2023
Related: 7ffb7d6
See issue 3504.

$ bin/rr record -n bin/pid_ns_kill_child

Error message:
  [FATAL src/Task.cc:2167:did_waitpid() errno: EDOM]
   (task 285229 (rec:285229) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7fffb7babb70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056012f0b167d in rr::Task::did_waitpid (this=this@entry=0x56012f9d4640, status=...) at rr/src/Task.cc:2167
rr-debugger#8  0x000056012f086ac7 in rr::Scheduler::reschedule (this=this@entry=0x56012f9c4188, switchable=<optimized out>) at rr/src/Scheduler.cc:882
rr-debugger#9  0x000056012eff4607 in rr::RecordSession::record_step (this=0x56012f9c3f40) at rr/src/RecordSession.cc:2497
rr-debugger#10 0x000056012efe8664 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56012f207a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056012ef4800f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278

With change in Task.cc:
Error message:
  [FATAL src/Task.cc:3894:did_handle_ptrace_exit_event()]
   (task 308641 (rec:308641) at time 269)
   -> Assertion `!handled_ptrace_exit_event' failed to hold.

rr-debugger#6  rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=this@entry=0x7ffe5e4cdf70, __in_chrg=<optimized out>) at rr/src/log.cc:484
rr-debugger#7  0x000056265d0981b7 in rr::Task::did_handle_ptrace_exit_event (this=this@entry=0x56265dde3640) at rr/src/Task.cc:3894
rr-debugger#8  0x000056265cfdb110 in rr::handle_ptrace_exit_event (t=t@entry=0x56265dde3640) at rr/src/RecordSession.cc:279
rr-debugger#9  0x000056265cfe272b in rr::RecordSession::record_step (this=0x56265ddd2f40) at rr/src/RecordSession.cc:2535
rr-debugger#10 0x000056265cfd6674 in rr::record (flags=..., args=...) at rr/src/RecordCommand.cc:697
rr-debugger#11 rr::RecordCommand::run (this=0x56265d1f5a40 <rr::RecordCommand::singleton>, args=...) at rr/src/RecordCommand.cc:860
rr-debugger#12 0x000056265cf3601f in main (argc=<optimized out>, argv=<optimized out>) at rr/src/main.cc:278
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants