-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: misc/cgo/test.TestSetgid panic with unexpected return pc for runtime.sigpanic
on linux-arm64-packet since 2022-06-04
#53374
Comments
(attn @golang/runtime) Marking as release-blocker for Go 1.19 because this appears to be a regression on |
It looks like what happened here is that the stack frame was somehow messed up and we ended up returning to a stack address instead of a valid PC. This caused a SIGSEGV, and while trying to print a traceback we encounted the same messed up frame and reported the "unexpected return pc". I don't think it was actually sigpanic that made the bad return. I think that is just the frame we injected after the fault occurred. |
I agree with @prattmic 's assessment. Picking apart 2022-06-04T16:11:54-fc66cae/linux-arm64-packet in detail: 0x0000ffff537fd830: 0x0000aaaabb49ac24 <runtime.sigpanic+0x00000000000001e4> Saved LR from sigpanic => throw("unexpected signal during runtime execution") minit frame: mstart1 frame: mstart0 frame: So the mstart0 => mstart1 => minit chain makes sense, but I lose the thread. It looks like we may have called minitSignals, but things aren't quite where I would expect for that. |
Based on the fact that this is happening in TestSetgid, I wonder if the a SIGSETXID came in as a new thread was starting up and we received it the moment |
I was thinking on that, too, but I also couldn't think of how it could happen. We set up the signal stack before setting the signal mask. So a signal shouldn't clobber the g0 stack. |
It's extra weird because this is a cgo binary, so we should just be using libc's setxid signal handler for all of this. I wonder if the integration with libc on this is messed up on early thread init? |
Good point. We start a thread with signals blocked https://cs.opensource.google/go/go/+/master:src/runtime/cgo/gcc_linux_arm64.c;l=28 . So we can get a signal early on before the signal stack is set?... |
@cherrymui You're right. That seems entirely possible. Argh. |
Oh no! That does seem plausible. I'm surprised the two failure stacks are so identical, but maybe that just reflects when we made some syscall and the kernel checked the pending signals. We're running on the g0 stack at this point, so it should be okay to take the signal, but maybe something bad happens in our signal path before or after we hand off to libc? |
Change https://go.dev/cl/412474 mentions this issue: |
Cherry, could we test your hypothesis about the signal frame clobbering the stack by doing a self- |
The CL above makes it not rely on the LR saved below the SP (even temporarily). Maybe that will fix it. Or, maybe we can call the actual syscall to block all the signals for thread creation? We'll restore the old signal mask anyway? That may race with a SIGSETXID signal delivering to the parent thread, though, so it may miss the signal and deadlock? |
@aclements Maybe. If it is really what the CL above is trying to fix, the signal has to land between the instruction storing the LR and the instruction decrementing the SP, so perhaps it has to be written ... in assembly (or have the assembler insert it)? |
Oh good point. Maybe a stress test where we constantly create new threads and bombard the process with SIGSETXID? |
Not sure this is the best way to stress this, but running this on thelinux-arm64-packet builder it has a not-too-small rate to fail with seg fault. Plain seg fault with no extra output, not the unwinder error like above, though.
With the CL above, no failure in 1000 run. I'll see if I can make it more likely to fail. |
If I add this to the runtime
The program above fails 100%. I think this is consistent with the theory above. |
I found that this doesn't fail with Go 1.18, even with the runtime hack above. I think this is due to https://go-review.googlesource.com/c/go/+/379075 changes the prologue. Previously, for small frames it is a single So another possibility is to undo CL 379075 for small frames. It is two instructions either way. (Previously |
We chatted about this and the plan is to return to the old prologue for small frames as a stop-gap for 1.19, confirm by hand that this generates signal-safe prologues for the handful of functions that run before we have a signal stack, and look into enhancing the traceback metadata for 1.20 to capture ranges where the frame is non-empty but the LR should be retrieved from the LR register instead of the stack. |
Change https://go.dev/cl/413428 mentions this issue: |
Following CL 412474, for the rest of the LR architectures. On MIPS(32/64), S390X, and RISCV, there is no single instruction that saves the LR and decrements the SP, so we need to insert an instruction to save the LR after decrementing the SP. On ARM(32) and PPC64 we already use a single instruction to save the LR and decrement the SP. Updates #53374. Change-Id: I5a2e211026d95edb0e0f7d084ddb784f8077b86d Reviewed-on: https://go-review.googlesource.com/c/go/+/413428 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>
Change https://go.dev/cl/416154 mentions this issue: |
Following CL 412474, for the rest of the LR architectures. On MIPS(32/64), S390X, and RISCV, there is no single instruction that saves the LR and decrements the SP, so we need to insert an instruction to save the LR after decrementing the SP. On ARM(32) and PPC64 we already use a single instruction to save the LR and decrement the SP. Updates golang#53374. Change-Id: I5a2e211026d95edb0e0f7d084ddb784f8077b86d Reviewed-on: https://go-review.googlesource.com/c/go/+/413428 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>
…rames When we create a thread with signals blocked. But glibc's pthread_sigmask doesn't really allow us to block SIGSETXID. So we may get a signal early on before the signal stack is set. If we get a signal on the current stack, it will clobber anything below the SP. This CL makes it to save LR and decrement SP in a single MOVD.W instruction for small frames, so we don't write below the SP. We used to use a single MOVD.W instruction before CL 379075. CL 379075 changed to use an STP instruction to save the LR and FP, then decrementing the SP. This CL changes it back, just this part (epilogues and large frame prologues are unchanged). For small frames, it is the same number of instructions either way. This decreases the size of a "small" frame from 0x1f0 to 0xf0. For frame sizes in between, it could benefit from using an STP instruction instead of using the prologue for the "large" frame case. We don't bother it for now as this is a stop-gap solution anyway. This only addresses the issue with small frames. Luckily, all functions from thread entry to setting up the signal stack have samll frames. Other possible ideas: - Expand the unwind info metadata, separate SP delta and the location of the return address, so we can express "SP is decremented but the return address is in the LR register". Then we can always create the frame first then write the LR, without writing anything below the SP (except the frame pointer at SP-8, which is minor because it doesn't really affect program execution). - Set up the signal stack immediately in mstart in assembly. For Go 1.19 we do this simple fix. We plan to do the metadata fix in Go 1.20 ( golang#53609 ). Other LR architectures are addressed in CL 413428. Fix golang#53374. Change-Id: I9d6582ab14ccb06ac61ad43852943d9555e22ae5 Reviewed-on: https://go-review.googlesource.com/c/go/+/412474 Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Eric Fang <eric.fang@arm.com>
Refer to CL 413428 and 412474, for loong64, like mips, s390x and riscv, there is no single instruction that saves the LR and decrements the SP, so we also need to insert an instruction to save the LR after decrementing the SP. Fixes golang#56623. Updates golang#53374. Change-Id: I3de040792f0a041d3d2a98ea89c23a2dd2f4ad10
Refer to CL 413428 and 412474, for loong64, like mips, s390x and riscv, there is no single instruction that saves the LR and decrements the SP, so we also need to insert an instruction to save the LR after decrementing the SP. Fixes #56623. Updates #53374. Change-Id: I3de040792f0a041d3d2a98ea89c23a2dd2f4ad10 Reviewed-on: https://go-review.googlesource.com/c/go/+/416154 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: xiaodong liu <teaofmoli@gmail.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn>
Refer to CL 413428 and 412474, for loong64, like mips, s390x and riscv, there is no single instruction that saves the LR and decrements the SP, so we also need to insert an instruction to save the LR after decrementing the SP. Fixes golang#56623. Updates golang#53374. Change-Id: I3de040792f0a041d3d2a98ea89c23a2dd2f4ad10 Reviewed-on: https://go-review.googlesource.com/c/go/+/416154 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: xiaodong liu <teaofmoli@gmail.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn>
Refer to CL 413428 and 412474, for loong64, like mips, s390x and riscv, there is no single instruction that saves the LR and decrements the SP, so we also need to insert an instruction to save the LR after decrementing the SP. Fixes golang#56623. Updates golang#53374. Change-Id: I3de040792f0a041d3d2a98ea89c23a2dd2f4ad10 Reviewed-on: https://go-review.googlesource.com/c/go/+/416154 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: WANG Xuerui <git@xen0n.name> Reviewed-by: xiaodong liu <teaofmoli@gmail.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> (cherry picked from commit 3ed8a1e)
greplogs -l -e '(?ms)unexpected return pc for runtime\.sigpanic.*exit status 2\nFAIL\s+misc/cgo/test'
2022-06-08T15:47:58-4afb0b9/linux-arm64-packet
2022-06-04T16:11:54-fc66cae/linux-arm64-packet
The text was updated successfully, but these errors were encountered: