Recording Firefox with Gecko Profiler causes crash #1930
Comments
Try running with the syscallbuf disabled (rr record -n).
For sanity we'd have to completely unwind all register values. We'd also want to handle any changes the signal handler makes to those register values. I think it's doable. @Keno, what worries you about it?
I guess there would be some weirdness: according to the signal handler's stack we'd be doing a plain syscall, while in the syscallbuf code we might be doing a different syscall or not be in the kernel at all. But that wouldn't break anything that isn't already broken.
I suppose that if a signal handler tries to manipulate registers to alter the syscall restart, it's going to be disappointed and angry, whereas today it just might work.
It might be possible to exit from the syscallbuf code to a trampoline which calls the signal handler with the right register state, and after
It would be a pain to implement, though, since I think it would complicate replay.
Yes, I'm mostly worried about signal handlers that try to modify register values. There is also a concern about what happens if the signal handler longjmps out of there, though I guess that doesn't work today either.
If we did what I suggested above, longjmp-out would actually work.
It could be kinda nice, since it would mean we never have to worry about reentering the syscallbuf: no nested descheds, and no worries about running user code on the syscallbuf alt-stack. However, it would be a large change to a fragile part of the system.
Now that I know what's going on, I can't say this is a huge deal for me. It is rare that I'd want rr and the profiler running at the same time, and disabling the syscallbuf seems like a fine workaround. The main problem is that rr is just too good (too seamless) these days, so a discrepancy like this is unexpected and therefore harder to diagnose and understand. I just wish it were more obvious that something is up. At least for this case, relying on replay to detect it would be fine. If replay knew that it was feeding bogus values to a replayed signal handler, it could... uh, do something. Transmit a warning through the gdb connection, or replace them with 0xdeadbeef and then produce a different error message if divergence was detected, or... ok, maybe I just jumped the shark.
Heh. I happen to be working on something that requires running with the profiler on, and of course I had forgotten about this bug. Running rr record -n is fine, but it would be nice if rr could somehow tell me that they're incompatible.
I've run into this problem again, but this time
For https://bugzilla.mozilla.org/show_bug.cgi?id=1322559 I was trying to record a --disable-profiling build of Firefox with the (new) Gecko Profiler enabled ( https://raw.githubusercontent.com/mstange/Gecko-Profiler-Addon/master/gecko_profiler.xpi ). I was seeing a crash in GeckoSampler::doNativeBacktrace, which is actually what I wanted to see and debug, but it appears to behave differently when rr is recording, so it isn't the crash I was looking for. (To be clear, this is not a problem of divergence between record and replay; this is the recording affecting the initial run.)
What appears to be happening is that when a SIGPROF signal handler gets invoked, the stack pointer stored in its context argument is the stack pointer for rr's syscall hooking code, which runs on a completely different stack from the one the program itself is executing on:
doNativeBacktrace grabs a chunk of the stack (based on that stack pointer) to memcpy, and ends up biting off more than it can chew (I mean, access). I guess what I'd like rr to do is give the signal handler the register state as of the "call" to _syscall_hook_trampoline?
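To make the failure mode concrete, here is a minimal sketch of a SIGPROF stack sampler. It is not Gecko's doNativeBacktrace, and every name in it (on_sigprof, init_stack_bounds, start_sampling, g_copy, and so on) is hypothetical. It assumes Linux with glibc on x86-64 or AArch64 (older glibc may need -pthread for pthread_getattr_np). The handler reads the interrupted stack pointer from its ucontext_t argument and copies up to 4 KiB of stack above it, but first checks that the pointer lies inside the calling thread's own stack. A sampler that trusts the context's SP without such a check would, if that SP pointed into some other stack (as reported above under rr's syscall hooking), copy from the wrong region and could read past its end.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <ucontext.h>

/* Bounds [lo, hi) of the sampled thread's stack, captured before sampling. */
static uintptr_t g_stack_lo, g_stack_hi;
static volatile sig_atomic_t g_samples_ok, g_samples_rejected;
static unsigned char g_copy[4096]; /* hypothetical fixed-size sample buffer */

static int sp_in_range(uintptr_t sp, uintptr_t lo, uintptr_t hi) {
    return sp >= lo && sp < hi;
}

/* Record the calling thread's stack bounds (pthread_getattr_np is a glibc extension). */
static int init_stack_bounds(void) {
    pthread_attr_t attr;
    void *addr;
    size_t size;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return -1;
    int rc = pthread_attr_getstack(&attr, &addr, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0) return -1;
    g_stack_lo = (uintptr_t)addr;
    g_stack_hi = (uintptr_t)addr + size;
    return 0;
}

/* Stack pointer of the interrupted code, as reported in the signal context. */
static uintptr_t interrupted_sp(const ucontext_t *uc) {
#if defined(__x86_64__)
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RSP];
#elif defined(__aarch64__)
    return (uintptr_t)uc->uc_mcontext.sp;
#else
#error "this sketch only handles x86-64 and AArch64 Linux"
#endif
}

static void on_sigprof(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)info;
    uintptr_t sp = interrupted_sp((const ucontext_t *)ctx);
    if (!sp_in_range(sp, g_stack_lo, g_stack_hi)) {
        /* SP is not on this thread's stack: copying from it could over-read. */
        g_samples_rejected++;
        return;
    }
    /* Copy from the interrupted SP toward the stack top, capped at the buffer size. */
    size_t len = g_stack_hi - sp;
    if (len > sizeof g_copy) len = sizeof g_copy;
    memcpy(g_copy, (const void *)sp, len);
    g_samples_ok++;
}

/* Install the handler and a 1 ms CPU-time profiling timer. */
static int start_sampling(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigprof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0) return -1;
    struct itimerval it = { { 0, 1000 }, { 0, 1000 } };
    return setitimer(ITIMER_PROF, &it, NULL);
}

static void stop_sampling(void) {
    struct itimerval off = { { 0, 0 }, { 0, 0 } };
    setitimer(ITIMER_PROF, &off, NULL);
}

/* Burn CPU until `want` accepted samples arrive (or a loop cap is hit). */
static void spin_until_samples(int want) {
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 4000000000UL && g_samples_ok < want; i++) x += i;
}
```

Calling init_stack_bounds(), start_sampling(), spin_until_samples(5) and stop_sampling() from the main thread of an ordinary single-threaded run should leave g_samples_rejected at zero. Note that the range check only discards a bad sample; it cannot recover the application's real stack pointer, which is what the request above asks rr to provide.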