-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: TestStackBarrierProfiling missed stack barrier #13362
Comments
I saw this on a builder too I think, which was it... ah, android/arm: http://build.golang.org/log/60c9e0384b64e90ad26b2235f044d172f84b2811 |
I've reproduced this several times now and it's clear that it's always a SIGPROF that came in during the systemstack(gcMark) in gcMarkTermination called from a mutator assist. Given where it found a barrier versus where it expects them, I'd say the stack was shrunk. However, we don't allow sigprof to walk the user stack when it's in _Gcopystack (and the traceback doesn't say that the user goroutine is in _Gcopystack), so we must have finished shrinking the stack, but not yet returned to the new stack. But in this case, we should have adjusted all of the barriers as well as sched.sp (which gentraceback uses to find the user stack), so I'm not sure how this inconsistency is arising. |
For the record, 10 failures of this form out of 173932 runs on my linux/amd64 workstation. |
Reproduced with somewhat more detailed stack barrier debugging. The @@@ in the stack barrier list indicates stkbarPos and the ==> indicates the mismatched stack barrier. The fact that we failed this far in to the stack barrier list suggests to me that we did a stack copy while in the middle of Stack barrier list:
Traceback:
|
I believe this is exactly what's going on. We prevent sigprof from tracing the user stack if it's currently being copied, but we don't interlock the other way around: if sigprof starts tracing the user stack first, it's possible for another thread to come along and shrink it while sigprof is still tracing. So, in fact bbd1a1c is only a partial fix to #12932. (/cc @DanielMorsing) It's possible this could be causing #12528 as well, but I haven't worked out an exact sequence of events for that yet. |
CL https://golang.org/cl/17192 mentions this issue. |
CL https://golang.org/cl/17191 mentions this issue. |
CL https://golang.org/cl/17194 mentions this issue. |
Commit bbd1a1c prevented SIGPROF from scanning stacks that were being copied, but it didn't prevent a stack copy (specifically a stack shrink) from happening while SIGPROF is scanning the stack. As a result, a stack copy may adjust stack barriers while SIGPROF is in the middle of scanning a stack, causing SIGPROF to panic when it detects an inconsistent stack barrier. Fix this by taking the stack barrier lock while adjusting the stack. In addition to preventing SIGPROF from scanning this stack, this will block until any in-progress SIGPROF is done scanning the stack. For 1.5.2. Fixes #13362. Updates #12932. Change-Id: I422219c363054410dfa56381f7b917e04690e5dd Reviewed-on: https://go-review.googlesource.com/17191 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-on: https://go-review.googlesource.com/17194
The addition of stack barrier locking to copystack subsumes the partial fix from commit bbd1a1c for SIGPROF during copystack. With the stack barrier locking, this commit simplifies the rule in sigprof to: the user stack can be traced only if sigprof can acquire the stack barrier lock. Updates #12932, #13362. Change-Id: I1c1f80015053d0ac7761e9e0c7437c2aba26663f Reviewed-on: https://go-review.googlesource.com/17192 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>
Found while stress testing TestStackBarrierProfiling at 54bd5a7 on master linux/amd64:
The text was updated successfully, but these errors were encountered: