Skip to content

runtime: slowdown on arm64 in GC while scanning stacks #37779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
shawndx opened this issue Mar 10, 2020 · 14 comments
Closed

runtime: slowdown on arm64 in GC while scanning stacks #37779

shawndx opened this issue Mar 10, 2020 · 14 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@shawndx
Copy link
Contributor

shawndx commented Mar 10, 2020

What version of Go are you using (go version)?

$ go version
go version devel +92d1fb7cb4 Tue Mar 10 00:47:23 2020 +0000 linux/arm64

Does this issue reproduce with the latest release?

Yes, reproducible with go1.14

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/xiaji01/.cache/go-build"
GOENV="/home/xiaji01/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xiaji01/.go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/xiaji01/util/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/xiaji01/util/go/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build683441384=/tmp/go-build -gno-record-gcc-switches"

What did you do?

// main.go
package main

import (
"time"
)

func sst(n int) {
timers := make([]*time.Timer, n)
for i := 0; i < n; i++ {
timers[i] = time.AfterFunc(time.Hour, nil)
}

    for i := 0; i < n; i++ {
            timers[i].Stop()
    }

}

func SST() {
sst(2<<15)
sst(2<<18)
}

func main() {
SST()
}

$ go build ./main.go
$ time -p ./main // run a few times

The execution time should be a few hundreds of milliseconds, while sometimes it may take dozens of, or 100 plus seconds to finish.

good:
real 0.42
user 0.64
sys 0.73

bad:
real 38.11
user 28.41
sys 48.71

What did you expect to see?

The program shall finish quickly.

What did you see instead?

Considerably long latency (in GC).

@shawndx
Copy link
Contributor Author

shawndx commented Mar 10, 2020

The enclosed case derives from 'BenchmarkStartStop', which experiences lengthy running time on some arm64 machines randomly (much better after a few changes for the new timers were checked into 1.14, but we still run into the problem sometimes). There are several other benchmarks having a similar issue, still under investigation.

Initial analysis:

When the main goroutine is async-preempted by a GC worker to suspend to have its stack scanned, it might be in the middle of 'addInitializedTimer' which is holding M's lock and cleaning deleted timers, seems that the machine will be heavily occupied by handling the preemption requests that will not succeed until the cleaning up task finishes, literally put both goroutines to unnecessary CPU burns.

	#gctrace output with additional logging
	
	suspendG:  0x4000302180  suspended  0x4000001200  with  1  attemps
	gc 3 @0.023s 2%: 0.058+2.3+0.17 ms clock, 13+1.1/11/2.0+38 ms cpu, 7->8->6 MB, 8 MB goal, 224 P
	markroot: _g_= 0x196580 , curg= 0x4000001080  to scan  0x4000000180 , gp->status= 2
	suspendG:  0x196580  preempted  0x4000000180  with  124353567  attemps
	markroot: suspending  0x4000000180  costs  22534438852  ns

As an experimental, increasing the 'yieldDelay' in suspendG appears to address the issue, 'yieldDelay' with 20 * 1000 reduced the worst number of the enclosed testcase on an arm64 server from ~40s to 1 second or so, further evaluation will be made later.

@shawndx shawndx changed the title runtime: slowdown on arm64 in GC while scanning stacks (and analysis) runtime: slowdown on arm64 in GC while scanning stacks Mar 10, 2020
@ianlancetaylor
Copy link
Member

When you say "holding M's lock", which lock do you mean?

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 10, 2020
@ianlancetaylor ianlancetaylor added this to the Go1.15 milestone Mar 10, 2020
@shawndx
Copy link
Contributor Author

shawndx commented Mar 11, 2020

When you say "holding M's lock", which lock do you mean?

It's the 'locks' of 'm',

Here are the stacks of doSigPreempt when the signal landing on target G or its g0.

#0 runtime.doSigPreempt (gp=0x4000000180, ctxt=0x400050ac38) at /home/xiaji01/src/goan/src/runtime/signal_unix.go:329
#1 0x000000000004d2a0 in runtime.sighandler (sig=0x17, info=0x400050ada0, ctxt=0x400050ae20, gp=0x4000000180) at /home/xiaji01/src/goan/src/runtime/signal_unix.go:536
#2 0x000000000004c840 in runtime.sigtrampgo (sig=0x17, info=0x400050ada0, ctx=0x400050ae20) at /home/xiaji01/src/goan/src/runtime/signal_unix.go:444
#3 0x0000000000064934 in runtime.sigtramp () at /home/xiaji01/src/goan/src/runtime/sys_linux_arm64.s:428
#4
#5 0x0000000000056128 in runtime.siftdownTimer (t= []*runtime.timer, i=0x0) at /home/xiaji01/src/goan/src/runtime/time.go:1062
#6 0x0000000000054760 in runtime.dodeltimer0 (pp=0x4000026000) at /home/xiaji01/src/goan/src/runtime/time.go:392
#7 0x0000000000054b5c in runtime.cleantimers (pp=0x4000026000) at /home/xiaji01/src/goan/src/runtime/time.go:515
#8 0x0000000000053f88 in runtime.addInitializedTimer (t=0x4000f4f098) at /home/xiaji01/src/goan/src/runtime/time.go:263
#9 0x0000000000053ef8 in runtime.addtimer (t=0x4000f4f098) at /home/xiaji01/src/goan/src/runtime/time.go:254
#10 0x0000000000053e48 in time.startTimer (t=0x4000f4f098) at /home/xiaji01/src/goan/src/runtime/time.go:207
#11 0x000000000006ecec in time.AfterFunc (d=0x34630b8a000, f={void (void)} 0x4000286f00, ~r2=) at /home/xiaji01/src/goan/src/time/sleep.go:164
#12 0x00000000000729f4 in main.sst (n=0x80000) at /home/xiaji01/data/sample/runtime/gc_mark_slowdown/main.go:10
#13 0x0000000000072ae4 in main.SST () at /home/xiaji01/data/sample/runtime/gc_mark_slowdown/main.go:20
#14 0x0000000000072b20 in main.main () at /home/xiaji01/data/sample/runtime/gc_mark_slowdown/main.go:24
(gdb) p gp.preempt
$37 = 0x1
(gdb) p gp.m.locks
$38 = 0x1
(gdb) p/d gp.m.preemptGen
$39 = 1493963
(gdb) f 7
#7 0x0000000000054b5c in runtime.cleantimers (pp=0x4000026000) at /home/xiaji01/src/goan/src/runtime/time.go:515
515 dodeltimer0(pp)
(gdb) p/d $len(pp.timers)
$44 = 32118

(gdb) bt
#0 runtime.doSigPreempt (gp=0x4000502180, ctxt=0x400050ac38) at /home/xiaji01/src/goan/src/runtime/signal_unix.go:329
#1 0x000000000004d2a0 in runtime.sighandler (sig=0x17, info=0x400050ada0, ctxt=0x400050ae20, gp=0x4000502180) at /home/xiaji01/src/goan/src/runtime/signal_unix.go:536
#2 0x000000000004c840 in runtime.sigtrampgo (sig=0x17, info=0x400050ada0, ctx=0x400050ae20) at /home/xiaji01/src/goan/src/runtime/signal_unix.go:444
#3 0x0000000000064934 in runtime.sigtramp () at /home/xiaji01/src/goan/src/runtime/sys_linux_arm64.s:428
#4
#5 runtime.findObject (p=0x40009e7ef8, refBase=0x0, refOff=0x0, base=, s=, objIndex=) at /home/xiaji01/src/goan/src/runtime/mbitmap.go:435
#6 0x0000000000035960 in runtime.wbBufFlush1 (p=0x4000026000) at /home/xiaji01/src/goan/src/runtime/mwbbuf.go:288
#7 0x000000000005f5cc in runtime.wbBufFlush.func1 () at /home/xiaji01/src/goan/src/runtime/mwbbuf.go:218
#8 0x0000000000061290 in runtime.systemstack () at /home/xiaji01/src/goan/src/runtime/asm_arm64.s:235
#9 0x000000000003e5c0 in runtime.startTheWorldWithSema (emitTraceEvent=0x0, ~r1=) at :1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p gp.preempt
$46 = 0x0
(gdb) p gp.m.locks
$47 = 0x1
(gdb) p/d gp.m.preemptGen
$48 = 1495188
(gdb) p gp.m.g0
$49 = (runtime.g *) 0x4000502180

@ianlancetaylor
Copy link
Member

Thanks.

Perhaps the problem you are seeing is that this program spends much of its time with the M holding a lock. That seems unlikely to be true of real programs.

Which isn't to say that we shouldn't try to fix this if possible.

CC @cherrymui

@cherrymui
Copy link
Member

I cannot reproduce it on my AMD64 machine, or the linux-arm64 builder. From what you and Ian described, this seems a problem that could happen on any platform, not just ARM64. Is there anything special on your machine (like, a large number of CPU cores)?

@shawndx
Copy link
Contributor Author

shawndx commented Mar 12, 2020

I cannot reproduce it on my AMD64 machine, or the linux-arm64 builder. From what you and Ian described, this seems a problem that could happen on any platform, not just ARM64. Is there anything special on your machine (like, a large number of CPU cores)?

Yes, it happened on several arm64 servers with many cores (30+) and was never hit on amd64.

Various random benchmarking and testing timeout issues were observed, among of them is analyzed BenchmarkStartStop, others are still under investigation. I am inclined to view they share the same cause, that is too many preemptions from GC, in runtime.suspendG, keep a thread in locked state busy with signal handling. On amd64 machines the target goroutines usually reache a safe point after dozens of preemption attempts, but on some ARM64 machines they might not succeed until hundred millions of attempts.

Those time-out issues are still there, less frequently, with 1.14, the newest issue is testing 'reflect' package with "internal linking of -buildmode=pie" on an A72 machine may cost more than 60 seconds, triggering a time-out failure, initial analysis shows it's caused by the preemption in suspending a goroutine for stack scanning as well.

I wonder if the preemption pace shall adapt to the target platform.

@randall77
Copy link
Contributor

It is entirely possible that #37741 is related.

@shawndx
Copy link
Contributor Author

shawndx commented Mar 12, 2020

It is entirely possible that #37741 is related.

Yes, they share the same cause, thanks for linking the two issues.

@ianlancetaylor
Copy link
Member

I don't think it's quite the same cause. I think that #37741 is about code that the compiler has marked unpreemptible. I think this issue is about code that is unpreemptible because it holds an M lock.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/223122 mentions this issue: runtime: leave cleantimers early if G is being preempted

@ianlancetaylor
Copy link
Member

@shawn-xdji Can you see if https://golang.org/cl/223122 fixes the problem that you are seeing? Thanks.

@shawndx
Copy link
Contributor Author

shawndx commented Mar 12, 2020

Thanks @ianlancetaylor , the fix works, I'm not able to check all types of machines we have for now, but pretty sure the fix should work on them.
The long running time of testing 'reflect' is still there, assume the fix being discussed in #37741 may help.

@ianlancetaylor
Copy link
Member

Thanks for testing it. I will commit this for the future 1.15 release.

My next question is whether this arises in real code. That is, is there a good reason to backport this for a 1.14 patch release? That is less interesting if it only appears in artificial benchmarks. Thanks.

@shawndx
Copy link
Contributor Author

shawndx commented Mar 13, 2020

Hi @ianlancetaylor So far it's not been observed in real code, only in a few micro benchmarks.
Without the fix, a long latency of GC may block the testing framework during its auto-adjusting phase, due to the explicit GC calling in testing.runN, triggering time-out of CI system.
There are other benchmarks on arm64 platform that experienced similar problems, but much harder to be reproduced compared with the timer benchmarks, and probably @randall77 's new fix will address them, I'll update later if finding anything new.
Thanks.

@golang golang locked and limited conversation to collaborators Mar 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants