runtime: slowdown on arm64 in GC while scanning stacks #37779
Comments
The enclosed case derives from 'BenchmarkStartStop', which randomly experiences lengthy running times on some arm64 machines (much better after a few changes for the new timers were checked into 1.14, but we still run into the problem sometimes). Several other benchmarks show a similar issue and are still under investigation. Initial analysis: when the main goroutine is async-preempted by a GC worker so that its stack can be scanned, it may be in the middle of 'addInitializedTimer', which holds the M's lock while cleaning deleted timers. The machine then becomes heavily occupied handling preemption requests that cannot succeed until the cleanup finishes, effectively burning CPU in both goroutines for nothing.
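For reference, here is a paraphrased, self-contained sketch of the condition under which an async preemption signal can be acted on; the shape mirrors the runtime's canPreemptM, but the types below are stand-ins, not the actual source:

package main

import "fmt"

// m is a stand-in for the runtime's m struct; only the fields relevant to
// the preemption check are modeled here.
type m struct {
	locks      int    // incremented while holding runtime-internal locks (e.g. the timers lock)
	mallocing  int    // non-zero while allocating
	preemptoff string // non-empty while preemption is explicitly disabled
}

// canPreemptM mirrors the shape of the runtime's check: if the M holds any
// lock, the async preemption signal cannot be honored and suspendG retries.
func canPreemptM(mp *m) bool {
	return mp.locks == 0 && mp.mallocing == 0 && mp.preemptoff == ""
}

func main() {
	busy := &m{locks: 1}           // e.g. in the middle of cleaning deleted timers
	fmt.Println(canPreemptM(busy)) // false: the preemption request will not succeed
}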
As an experiment, increasing 'yieldDelay' in suspendG appears to address the issue: a 'yieldDelay' of 20 * 1000 reduced the worst-case time of the enclosed test case on an arm64 server from ~40s to about 1 second. Further evaluation will be made later.
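To illustrate why a larger yield delay helps, here is a rough userspace analogue of suspendG's retry loop (not the runtime source; names and values are illustrative): the suspender keeps retrying, and the longer it is willing to sleep between attempts, the less CPU both sides burn while the target finishes its non-preemptible work.

package main

import (
	"fmt"
	"runtime"
	"time"
)

// yieldDelay is illustrative; the experiment above raised the runtime's
// value to 20 * 1000 ns.
const yieldDelay = 20 * 1000 * time.Nanosecond

// waitUntilSuspendable retries until atSafePoint reports true, yielding
// between attempts so the target goroutine can make progress instead of
// being drowned in retries.
func waitUntilSuspendable(atSafePoint func() bool) int {
	attempts := 0
	for !atSafePoint() {
		attempts++
		if attempts%16 == 0 {
			time.Sleep(yieldDelay) // back off instead of spinning
		} else {
			runtime.Gosched()
		}
	}
	return attempts
}

func main() {
	deadline := time.Now().Add(time.Millisecond)
	n := waitUntilSuspendable(func() bool { return time.Now().After(deadline) })
	fmt.Println("attempts before the target became suspendable:", n)
}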
When you say "holding M's lock", which lock do you mean?
It's the 'locks' field of 'm'. Here are the stacks from doSigPreempt when the signal lands on the target G or its g0:
(gdb) bt
#0  runtime.doSigPreempt (gp=0x4000000180, ctxt=0x400050ac38) at /home/xiaji01/src/goan/src/runtime/signal_unix.go:329
Thanks. Perhaps the problem you are seeing is that this program spends much of its time with the M holding a lock. That seems unlikely to be true of real programs. Which isn't to say that we shouldn't try to fix this if possible. CC @cherrymui
I cannot reproduce it on my AMD64 machine, or the linux-arm64 builder. From what you and Ian described, this seems a problem that could happen on any platform, not just ARM64. Is there anything special on your machine (like, a large number of CPU cores)?
Yes, it happened on several arm64 servers with many cores (30+) and was never hit on amd64. Various random benchmarking and testing timeout issues were observed; among them, BenchmarkStartStop has been analyzed, and the others are still under investigation. I am inclined to think they share the same cause: too many preemption attempts from the GC, in runtime.suspendG, keep a thread that is in a locked state busy with signal handling. On amd64 machines the target goroutines usually reach a safe point after dozens of preemption attempts, but on some arm64 machines they may not succeed until hundreds of millions of attempts. Those timeout issues are still present, though less frequent, with 1.14; the newest one is that testing the 'reflect' package with "internal linking of -buildmode=pie" on an A72 machine can take more than 60 seconds, triggering a timeout failure. Initial analysis shows it is also caused by the preemption performed while suspending a goroutine for stack scanning. I wonder whether the preemption pace should adapt to the target platform.
It is entirely possible that #37741 is related.
Yes, they share the same cause, thanks for linking the two issues.
I don't think it's quite the same cause. I think that #37741 is about code that the compiler has marked unpreemptible. I think this issue is about code that is unpreemptible because it holds an M lock.
Change https://golang.org/cl/223122 mentions this issue: |
@shawn-xdji Can you see if https://golang.org/cl/223122 fixes the problem that you are seeing? Thanks.
Thanks @ianlancetaylor, the fix works. I'm not able to check all the types of machines we have right now, but I'm pretty sure the fix will work on them.
Thanks for testing it. I will commit this for the future 1.15 release. My next question is whether this arises in real code. That is, is there a good reason to backport this for a 1.14 patch release? That is less interesting if it only appears in artificial benchmarks. Thanks.
Hi @ianlancetaylor, so far it has not been observed in real code, only in a few micro-benchmarks.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, reproducible with go1.14.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
// main.go
package main

import (
	"time"
)

// sst creates n timers that are set an hour out, so none of them fire before
// the program exits. Creating this many timers keeps the goroutine inside the
// runtime's timer code (holding the timers lock while cleaning deleted timers)
// just as the GC tries to suspend it to scan its stack.
func sst(n int) {
	timers := make([]*time.Timer, n)
	for i := 0; i < n; i++ {
		timers[i] = time.AfterFunc(time.Hour, nil)
	}
}

func SST() {
	sst(2 << 15)
	sst(2 << 18)
}

func main() {
	SST()
}
$ go build ./main.go
$ time -p ./main   # run it a few times
The execution time should be a few hundred milliseconds, but it sometimes takes tens of seconds, or even more than 100 seconds, to finish.
good:
real 0.42
user 0.64
sys 0.73
bad:
real 38.11
user 28.41
sys 48.71
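As a side note (not part of the original report): to confirm how much of the wall time is attributed to GC, one can replace main in main.go above with a version that times SST and prints a few fields of runtime.MemStats; add "fmt" and "runtime" to the import block.

// Replaces the main function in main.go above.
func main() {
	start := time.Now()
	SST()
	elapsed := time.Since(start)

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("elapsed: %v\n", elapsed)
	fmt.Printf("GC cycles: %d, STW pause total: %v, GC CPU fraction: %.2f\n",
		ms.NumGC, time.Duration(ms.PauseTotalNs), ms.GCCPUFraction)
}

Running the binary with GODEBUG=asyncpreemptoff=1 disables signal-based async preemption and is another quick way to check whether it is involved in the slowdown.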
What did you expect to see?
The program should finish quickly.
What did you see instead?
Considerable latency, spent in GC.