-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: "fatal error: unknown caller pc" crash on darwin/arm #16760
Comments
/cc @aclements @minux @crawshaw (as requested) |
/cc @ianlancetaylor This may look like a GC problem, but the GC's just tripping over a bad call stack surrounding a cgo call. @dcu, it looks like the traceback got cut off. I would expect at least one more line. Is that the whole traceback? What version of github.com/boltdb/bolt do you have vendored? At master, db.go:968 is a blank line between functions, and bolt.(*meta).sum64 doesn't go anywhere near cgo code. |
@aclements thanks, I updated the description including the whole stack trace. I'm using the master (94c8db596809690a3f7046fa83c7b0dda13a3222) |
Thanks! I bet this is happening at the bottom of goroutine 17's stack where gomobile bind's wrapper did a cgo callback into |
I'm trying to find the correct boltdb version we were using at that point since looks like we updated it recently. I'll report back soon. |
This is the boltdb version we were using when we captured that trace: https://github.com/boltdb/bolt/tree/b514920f8f2e0a68f857e5a12c774f385a59aef9 |
#16772 is related with boltdb but I'm not sure if it's related to this one |
We updated BoltDB to the latest one and we are still seeing the crash randomly. Unfortunately we weren't using the SDK in debug mode so we couldn't get the stack trace. I'll update again if we get the trace. |
@dcu is this a regression from Go 1.6? If so, we should consider applying its eventual fix to a 1.7 point release. |
@dcu, can you say more about the call from Java code to coresdk.(*Device).Sync? In particular, do you know if the thread that made the call was started in Java/native code, or did a goroutine call into Java/native code and then that called back into Go code? runtime.cgocallback_gofunc does some very subtle things involving stack switches and overlapping stack frames. It's possible we're getting that wrong in some cases on darwin/arm(64). |
@eliasnaur I think it's a regression but I'm not sure because our project doesn't work on 1.6 because of #12896 @aclements yeah, Device.Sync is called from a GCD thread. goroutines don't interact with objective-c in this code. |
I'm going to mark this Go1.8 but if we find a fix that is deemed safe we can backport it to Go1.7.x |
@dcu, if I were to send you a runtime patch to report more debug info, would it be possible for you to run with it?
(Sorry, not sure why I said Java in my earlier reply. Must have been thinking Android.) I'm only familiar with GCD at a very high level. Could you give me a sense of how you're wiring up GCD and Go? A snippet of code would be great. I'm wondering if there's some particular bad interaction between GCD and Go. |
@aclements yeah sure, I'd more than happy to test. let queue = dispatch_queue_create("sync", nil)
dispatch_async(queue) {
self.gosdk!.sync()
} we also tested #16644 and it still crashes in our app but it doesn't in a simpler app. not sure why. |
@aclements I'm getting very frequent crash when the app is in background, sadly I can't see the backtrace because it doesn't really crash on XCode. I changed the page size in the compiler and it seems the crash is less frequent. Any idea how can I help debug this issue? |
continuing with the investigation I found out that if I remove a goroutine that's always running the app doesn't crash. The crash doesn't happen on Android even with the goroutine. |
@aclements does this tell you anything:
it happened after printing this when running with gctrace enabled:
|
Looks like I can reliably reproduce this with a test on this project. The affected test purposefully has a NPE in it to make sure panics return HTTP 500s correctly. The complete trace is:
Running Mac OS X - 10.10.5 (Darwin 14.5.0) go version go1.7.3 darwin/amd64 |
The compiled code for this function is bad. It overwrites its return address with 0. |
Standalone test case:
|
Kind of a strange sequence of errors. It does
First it stores the nil at 0(SP) for the call to f. |
(f = stringtoslicebyte in the example above) It seems bad to start writing args to the stack before we know if the call will happen. Maybe we need to evaluate all args first, at least to the point where we know whether they will panic or not. |
CL https://golang.org/cl/32551 mentions this issue. |
when compiling f(a, b, c), we do something like: *(SP+0) = eval(a) *(SP+8) = eval(b) *(SP+16) = eval(c) call f If one of those evaluations is later determined to unconditionally panic (say eval(b) in this example), then the call is deadcode eliminated. But any previous argument write (*(SP+0)=... here) is still around. Becuase we only compute the size of the outarg area for calls which are still around at the end of optimization, the space needed for *(SP+0)=v is not accounted for and thus the outarg area may be too small. The fix is to make sure that we evaluate any potentially panicing operation before we write any of the args to the stack. It turns out that fix is pretty easy, as we already have such a mechanism available for function args. We just need to extend it to possibly panicing args as well. The resulting code (if b and c can panic, but a can't) is: tmpb = eval(b) *(SP+16) = eval(c) *(SP+0) = eval(a) *(SP+8) = tmpb call f This change tickled a bug in how we find the arguments for intrinsic calls, so that latent bug is fixed up as well. Update #16760. Change-Id: I0bf5edf370220f82bc036cf2085ecc24f356d166 Reviewed-on: https://go-review.googlesource.com/32551 Reviewed-by: Cherry Zhang <cherryyz@google.com>
I've fixed the problem with the recent reproductions on this issue. The original problem remains undiagnosed. It has the same symptom as the fixed examples, but we are <50% confident that the underlying cause is related. |
@randall77 what's an easy way to apply your patch to my local go repo ? |
It was just submitted so the easiest would be to sync to tip. |
I'm still seeing a crash on both tip and 1.7 (stable) https://github.com/AdrianaPineda/FrameworkGOMobileSample and the code that causes the crash is this one: https://github.com/dcu/gomobile-sample/blob/master/sample.go#L34 basically, an http request. To reproduce it we let ran the iOS app in background for a few seconds (30 or more) and then we try to open the app and it crashes. |
I managed to reproduce the crash with Go tip and with Xcode 8 and an iPhone 5s running ios 10.0.2 with some fiddling. I managed to cut out the umbrella framework and use the Go-based Sample.framework directly from the sample app. However, I failed to get a crash log from the crash: I can't reproduce the crash while running in the Xcode debugger, and there are no log entries in Xcode > Window > Devices > View Device Logs. @dcu, how did you acquire a crash log? |
I failed to reproduce the runtime crash in the OP, but I did find a problem with non-Go main programs and SIGPIPE handling (sigh). Consider this program that mimics the behaviour of iOS closing sockets when suspending an app after 30 seconds in the background:
Running it with
as expected. If, however, I run the same Go program with a C main program as driver:
compile it with
the program exits because the SIGPIPE is neither handled by the C program nor the Go program. This behaviour is documented in
where "any other" refers to synchronous signals. Since SIGPIPE is not a synchronous signal, no handler is installed and the program crashes when receiving SIGPIPE from a broken connection. Even if documented, I believe this behavior is wrong. Go main programs stop SIGPIPEs from killing the program, and should do so even if linked in as c-archive or c-shared. At the very least when the SIGPIPE originates from a Go-created file descriptor. Paging Mr Signals, @ianlancetaylor (sorry) for ideas. |
A low impact, but non-portable, fix would be to use the SO_NOSIGPIPE socket option for darwin, as described in Although then the problem would persist on other platforms with similar SIGPIPE behaviour. |
@eliasnaur Could you open a separate issue for the sigpipe problem? Thanks. |
@eliasnaur @randall77 #17393 is probably the same issue |
CL https://golang.org/cl/32796 mentions this issue. |
Before this CL, Go programs in c-archive or c-shared buildmodes would not handle SIGPIPE. That leads to surprising behaviour where writes on a closed pipe or socket would raise SIGPIPE and terminate the program. This CL changes the Go runtime to handle SIGPIPE regardless of buildmode. In addition, SIGPIPE from non-Go code is forwarded. Fixes #17393 Updates #16760 Change-Id: I155e82020a03a5cdc627a147c27da395662c3fe8 Reviewed-on: https://go-review.googlesource.com/32796 Run-TryBot: Elias Naur <elias.naur@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
CL 32796 was reverted (see #18100 for details) so the SIGPIPE fix will not be in Go 1.8. An easy workaround is to call |
I'll close this one since I can't reproduce it anymore. |
Please answer these questions before submitting your issue. Thanks!
go version
)?and
go env
)?darwin/arm(64)
I'm not sure how to reproduce this, it's more likely to happen after sending the app to background and waiting few minutes and then resume it.
I don't have code to provide at the moment but I got a backtrace:
iOS app not crashing when resuming it
The application crashes randomly and it is hard to reproduce. It seems to be more frequent on armv7 than arm64
The text was updated successfully, but these errors were encountered: