Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: performance degradation in high-load long running applications #32398

Open
akaspin opened this issue Jun 3, 2019 · 10 comments
Open
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@akaspin
Copy link

akaspin commented Jun 3, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.5 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/spin/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/.../go"
GOPROXY=""
GORACE=""
GOROOT="/.../1.12.5/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.12.5/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/q5/b4nzm8zj3g5gcjs4rhkpvczh0000gn/T/go-build533400238=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

My company spotted serious performance degradation in one of high load services. Service handles about 100K TCP permanent connections and using complex processing for about 75K partners requests per second. When we tried to upgrade Go from 1.11.6 to 1.12.5 we noted that performance fell by a third. Service begins seriously misbehave after about 15-30 mins after start.

After investigation I found that performance degradation caused by commits 1678b2c5..873bd47d. Commits 1678b2c5..457c8f4f causes slight performance degradation. Performance of service builded with 873bd47d and later is absolutely horrible as well as memory consumption. Also I noted that CPU load from 873bd47d reduced from 100% to 67%. Last commit with acceptable performance is 6e9fb11b3.

Very strange benchmark:

func BenchmarkWrite(b *testing.B) {
	f, err := os.Create("/dev/null")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 1024*100)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			for i:=0;i<100000;i++ {
				f.SetDeadline(time.Now().Add(5*time.Minute))
				f.Write(buf)
				f.SetDeadline(time.Time{})
				runtime.GC()
			}
		}
	})
	b.StopTimer()
}

Maybe not of all ops makes sense but the results speak for themselves. Benchmarks are executed with Go built from source 6e9fb11b..873bd47d. Here is brief results. You can find details in attached archives.

6e9fb11b (OK):

$ CGO_ENABLED=0 $HOME/go.dev/bin/go test -run=XXX -bench=BenchmarkWrite ./...
goos: linux
goarch: amd64
pkg: ...
BenchmarkWrite-2   	       1	17839662490 ns/op	   93816 B/op	      61 allocs/op
PASS

873bd47d (degraded):

...
goos: linux
goarch: amd64
pkg: ...
BenchmarkWrite-2   	       1	41872237656 ns/op	 6708104 B/op	   68744 allocs/op
PASS

Detailed results:
0-6e9fb11b.zip
1-1678b2c5.zip
2-143b13ae.zip
3-457c8f4f.zip
4-873bd47.zip

Referred commits:

What did you expect to see?

Normal performance like in Go 1.11.6.

What did you see instead?

Big performance degradation with service built with Go 1.12+

@ianlancetaylor
Copy link
Member

CC @aclements @mknyszek

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 3, 2019
@ianlancetaylor ianlancetaylor added this to the Go1.13 milestone Jun 3, 2019
@mknyszek
Copy link
Contributor

mknyszek commented Jun 3, 2019

What did you do?

My company spotted serious performance degradation in one of high load services. Service handles about 100K TCP permanent connections and using complex processing for about 75K partners requests per second. When we tried to upgrade Go from 1.11.6 to 1.12.5 we noted that performance fell by a third. Service begins seriously misbehave after about 15-30 mins after start.

After investigation I found that performance degradation caused by commits 1678b2c5..873bd47d. Commits 1678b2c5..457c8f4f causes slight performance degradation. Performance of service builded with 873bd47d and later is absolutely horrible as well as memory consumption. Also I noted that CPU load from 873bd47d reduced from 100% to 67%. Last commit with acceptable performance is 6e9fb11b3.

Just to be totally clear, this investigation was done using your service and not the benchmark below, correct?

Can you comment more on memory consumption? Do you mean RSS?

Also, since it's related to the changes that appear to cause the regression, do you run your service with GODEBUG=gcstoptheworld=1 or GODEBUG=gcstoptheworld=2?

Very strange benchmark:

func BenchmarkWrite(b *testing.B) {
	f, err := os.Create("/dev/null")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 1024*100)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			for i:=0;i<100000;i++ {
				f.SetDeadline(time.Now().Add(5*time.Minute))
				f.Write(buf)
				f.SetDeadline(time.Time{})
				runtime.GC()
			}
		}
	})
	b.StopTimer()
}

Maybe not of all ops makes sense but the results speak for themselves. Benchmarks are executed with Go built from source 6e9fb11b..873bd47d. Here is brief results. You can find details in attached archives.

That is a strange benchmark indeed. The many calls to runtime.GC() are going to cause this benchmark to run really slowly in the first place. Does this benchmark relate to your service in any way?

@akaspin
Copy link
Author

akaspin commented Jun 4, 2019

@mknyszek This strange benchmark is just a quick and dirty try to simulate service load. Of course there are no runtime.GC() calls in service. This benchmark used with Git bisect to find suspicious commit. After this service was built with corresponding Go commits and deployed. Results of deployment with 873bd47d confirmed that "new" GC is absolutely ineffective under high load.

gcstoptheworld is not used. There are no notes about "new" GC in documentation. Also docs about gcstoptheworld is not changed. In our company we never used gcstoptheworld with any Go version.

I want to clarify again: right now service built with Go 1.11 works absolutely well without any GODEBUG options.

@mknyszek
Copy link
Contributor

Thanks for the information, and belated thanks for all the detail and the reproducer. My apologies for how long it's taken to get back to you.

I just want to be absolutely clear that 873bd47d and the commits before it are in fact the source of the problem for your service. Did you try the commit before 1678b2c5 (6e9fb11b) against your service? If you're confident your service is also running just fine with 6e9fb11b, then I'll dig deeper into what could be going wrong with the benchmark.

@akaspin
Copy link
Author

akaspin commented Jun 12, 2019

@mknyszek Thanks for response. Yes. As I wrote earlier I made test deployments with commits from 6e9fb11b to 873bd47d. I can confirm that 6e9fb11b works well.

  • Service built with 6e9fb11b works absolutely same as built with Go 1.11.6 (our current production version).
  • Three commits before 873bd47d gives slight performance degradation.
  • Service built with 873bd47d gives 30-40% performance degradation.

@akaspin
Copy link
Author

akaspin commented Jun 22, 2019

@mknyszek Just notes which may be useful during investigation (Go 1.11.6):

In service we massively using sync.Pool, sync.Map, large maps e. t. c.

Average stats:

  1. Memory consumption: alloc: 55GB, sys: 90GB.
  2. Heap objects: 450M
  3. Heap memory: idle: 32GB, inuse: 52GB, released: 16GB, sys: 85GB
  4. Memory: mspan: 690MB, stack: 890MB
  5. GC pauses: 10ms
  6. GD per minute: 2
  7. GC CPU fraction: 2.5%
  8. Next GC: 65GB
  9. Heap alloc: 60GB
  10. Goroutines: 100K

Hope this helps.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@jayeshwebcluesglobal
Copy link

@akaspin I am doing almost same what you have described above and am facing the same issue. we are doing around the same we are having almost 600 TCP/IP connection connected with another 600 TCP/IP so roughly around we are using 600x600 TCP/IP connection but right now we are having a temporary connection so every-time it will be creating and closing the connection. But it is consuming a huge amount of memory in short period but the Garbage collector is not freeing up in the same time so what do you suggest to have a permanent connection as you have done in you project and di you get the solution for your issue if you have solved it can you please guide us how can we solve this issue. we are using GRPC with protobuf.

@akaspin
Copy link
Author

akaspin commented Aug 2, 2019

@jayeshwebcluesglobal We just fixed Go version on 1.11.10.

@jayeshwebcluesglobal
Copy link

@akaspin Thanks! it seems we are running on older version(1.9.3), will check issue by upgrading to 1.11.10

@jayeshwebcluesglobal
Copy link

jayeshwebcluesglobal commented Aug 5, 2019

@akaspin Want to take your view on one thing,
We are making 600*600 TCP connections, should we make it permanent or should close and reopen when we want to make a request to each other ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: Triage Backlog
Development

No branches or pull requests

7 participants