runtime: executable compiled under Go 1.17.7 will occasionally wedge #52226
Comments
Does this reproduce with Go 1.18? It's hard to tell what's going on here. The second panic is from a signal handler presumably invoked to crash the program, and is caused by the signal handler apparently not being set up right (or something is corrupted). It would be nice to know what's going on in At first glance, I'm really not sure. More example crashes and more detail via |
The only signal handler we have in our code is on this line in the traceback, where it's blocking on waiting for a signal. It's possible there is another signal handler being installed by a third party package, but I can't see it in any of the tracebacks provided. I will enable |
But that said, doesn't the fact that we're getting to this state indicate a corruption (somehow) of the Go runtime? |
More information from one of our users:
|
I mean signal handler in the Unix sense of a bunch of code that gets injected onto an OS thread when a signal gets delivered, not something that handles incoming signals in a Go application. I do mean to say that data in the runtime is corrupted. The signal handler runs on the signal stack which I believe "looks" like a g0 stack to most checks. The |
The note about |
Probing our GitHub issues further, does hermit use the |
Will |
No, not Hermit itself (though I can't exclude some imported package I guess). |
FWIW, I think I have also seen this behaviour on darwin/arm64. |
It's used at runtime initialization, so it needs to be set from the outside. We have a few outstanding issues about making
Got it. Apparently #38824 could happen even if |
Good to know. That correlates more strongly with this being either an OS bug, or a bad interaction with new OS versions, or an old bug on our end that really wasn't truly fixed. It could be something new, but I'm not sure we've made any major changes to how macOS syscalls work, or how we call exec on macOS (other than to fix bugs). |
Also FWIW (and my recollection is a bit hazy at this point), I think I only started seeing this after upgrading from Go 1.16 to Go 1.17. |
I also tried |
Further anecdata for the Darwin-specific hypothesis: we use |
Edit: Sorry, the wrong process's stack was dumped. |
Hmmm, are you sure it's the same issue? We are not waiting for child processes (in our Go code at least), we're calling |
That's on the main path only though, we do have some other calls to |
Actually I don't see any |
Confirmed that this is a stack trace from a wrapping Go executable (mockery), apologies. We'll try to get one from Hermit itself. |
Sorry for the earlier mistake in posting the wrong output. Full stack trace info from the correct process is here: deadlock-hermit-0.18.3-go-db7183ccf9.txt. This time, I was able to reproduce using a locally-built Go toolchain from commit db7183c (tip as of 2022-04-08) to build Hermit at v0.18.3 (through an internal wrapper). Interestingly, I was only able to reproduce when compiling Hermit with CGO_ENABLED=0. |
I was able to use |
Thanks for tracking that down. Why is your code calling In delve, when looking at a wedged program, what is the value of |
This deadlock was discovered in Hermit, which is a virtual environment for tools. It is invoked as a symlinked shim and execs the real command's binary.

Though it was discovered in Hermit, I got the behavior to reproduce in a minimal program. Its source code is here: exec-deadlock.go. The one signal handler seems to be needed. To reproduce, compile it with

I also have a core dump (produced by dlv) from an execution of this program that has locked up. I can't post it as a gist (it's 2.3 GiB), but I can answer questions about it. A stack trace from this core is here: exec-deadlock.stack.
From the core image:
|
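As a rough illustration of the shape described above (one registered signal handler, followed by an exec), here is a sketch. It is not the exec-deadlock.go linked in the comment; the self re-exec countdown, the helper `nextArgv`, the default of 3 iterations, and the choice of `SIGTERM` are all assumptions made so the example is self-contained and terminates.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"strconv"
	"syscall"
)

// nextArgv (hypothetical helper) builds the argument vector for the next
// re-exec: the program path followed by the decremented countdown.
func nextArgv(self string, remaining int) []string {
	return []string{self, strconv.Itoa(remaining - 1)}
}

func main() {
	remaining := 3 // assumed default; hunting a rare hang needs far more iterations
	if len(os.Args) > 1 {
		n, err := strconv.Atoi(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, "bad countdown:", err)
			os.Exit(2)
		}
		remaining = n
	}
	if remaining <= 0 {
		fmt.Println("done")
		return
	}

	// Register one signal handler before exec, as the report says is needed.
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGTERM)
	go func() {
		<-ch
		os.Exit(1)
	}()

	self, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "executable:", err)
		os.Exit(1)
	}
	// On success Exec replaces this process and never returns.
	err = syscall.Exec(self, nextArgv(self, remaining), os.Environ())
	fmt.Fprintln(os.Stderr, "exec:", err)
	os.Exit(1)
}
```

Running the binary with a large countdown argument would exercise the exec path repeatedly in a single process slot, which is one plausible way to hunt for a hang that is reported to occur only rarely.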
Also, setting |
Thanks for the simpler recreator. Some notes on a deadlocked instance:
The program is creating a new M. The new M has been added to the The OK, I see one possible problem. The goroutine started by |
OK, sending a fix. Thanks. |
Change https://go.dev/cl/400315 mentions this issue: |
@gopherbot Please open backport issues.

This bug can cause rare cases of deadlock on macOS in programs that use |
Backport issue(s) opened: #52374 (for 1.17), #52375 (for 1.18). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
@ianlancetaylor Nice catch! I left a comment on your CL. Paraphrasing that comment here for visibility: Basically, I'm wondering if this is a more general problem. For instance, should Go programs still be able to work if every thread had |
The |
Change https://go.dev/cl/400317 mentions this issue: |
Change https://go.dev/cl/400318 mentions this issue: |
Thanks @ianlancetaylor, much appreciated. |
And thanks to @bwester for digging in and coming up with a reproducible test case. |
Revert "fix: create process group"
This reverts commit 0f3b2f3.
Revert "chore: fix linting"
This reverts commit 1555512.
Revert "fix: workaround for golang/go#52226"
This reverts commit 0cd74ef.
… M's or ensureSigM

No test because we already have a test in the syscall package. The issue reports 1 failure per 100,000 iterations, which is rare enough that our builders won't catch the problem.

For #52226
Fixes #52374

Change-Id: I17633ff6cf676b6d575356186dce42cdacad0746
Reviewed-on: https://go-review.googlesource.com/c/go/+/400315
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
(cherry picked from commit e398266)
Reviewed-on: https://go-review.googlesource.com/c/go/+/400317
Reviewed-by: Austin Clements <austin@google.com>
… M's or ensureSigM

No test because we already have a test in the syscall package. The issue reports 1 failure per 100,000 iterations, which is rare enough that our builders won't catch the problem.

For #52226
Fixes #52375

Change-Id: I17633ff6cf676b6d575356186dce42cdacad0746
Reviewed-on: https://go-review.googlesource.com/c/go/+/400315
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
(cherry picked from commit e398266)
Reviewed-on: https://go-review.googlesource.com/c/go/+/400318
Reviewed-by: Austin Clements <austin@google.com>
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
I have not yet tried Go 1.17.8. Will do that.
What operating system and processor architecture are you using (go env)?
The executable in question is running on darwin/amd64, and is cross-compiled from linux/amd64 running on GitHub Actions.
What did you do?
I can't reproduce it on demand, unfortunately, but one of our Go tools will occasionally wedge. The executable is cross-compiled to darwin/amd64 from linux/amd64 in GitHub Actions.
What did you expect to see?
My application not wedging and/or an uncorrupted stack trace that I could use to debug why it is wedging.
What did you see instead?
The following stack trace, produced with CTRL-\: