-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: stack growth during critical section harms performance #18020
Comments
Did you try tip? |
I did; tip has slightly better performance than Go 1.7.3, but still much slower than Go 1.6.3. |
Looked at the pb.gz profiles in etcd-io/etcd#6876. Slightly confusing because the Go 1.7 profile ran 2X as long as the Go 1.6 profile so the total times are not directly comparable. However, in Go 1.6 the scheduler ran for about 2s, while in Go 1.7 the scheduler ran for about 20s, so about 5X more. That probably has something to do with it. http2 probably does not have anything to do with it. From the profile it seems clear that you are running a vendored copy of http2, not the one from the Go distribution itself. Go 1.6: Go 1.7: We might be able to fix this in Go 1.8 if the fix is simple. /cc @aclements @RLH |
I gave different duration to test the same workloads, but they are confusing for profiling. So I collected
And here are Go 1.6.3
Go 1.7.3
Thanks a lot! |
I haven't seen any update on this bug about anything Go 1.8-related, so I'm going to kick this down the road further. Please try Go 1.8 and file bugs about any performance problems, thanks! In the future, please start your trials of new Go versions earlier. Testing the move from Go 1.6 to Go 1.7 doesn't help us much when we're just about done with Go 1.8. |
@bradfitz Thanks! I ran the same tests with
Here are benchmark binaries and profiled data: go1.6.4 go1.8beta1 wget https://storage.googleapis.com/etcd/tip/etcd-v3.0.2-go1.6.4
wget https://storage.googleapis.com/etcd/tip/etcd-v3.0.2-go1.8beta1
wget https://storage.googleapis.com/etcd/tip/etcdctl-master-go1.8beta1
wget https://storage.googleapis.com/etcd/tip/benchmark-master-go1.8beta1
sudo chmod +x ./etcd-v3.0.2-go1.6.4 && ./etcd-v3.0.2-go1.6.4 --version
sudo chmod +x ./etcd-v3.0.2-go1.8beta1 && ./etcd-v3.0.2-go1.8beta1 --version
sudo chmod +x ./etcdctl-master-go1.8beta1 && ./etcdctl-master-go1.8beta1 --version
sudo chmod +x ./benchmark-master-go1.8beta1 && ./benchmark-master-go1.8beta1 help |
Leaving for @aclements to decide what to do here, and when. |
I've figured out, whats going on. I've verified that following approaches make this regression disappear:
However both of them look like hacks to me. |
I can confirm this. We also suspect that there is a more serious lock contention triggered by some runtime changes. So we change the locking mechanism, the problem disappeared. But as you mentioned, this issue is probably still worth fixing. |
CL https://golang.org/cl/45142 mentions this issue. |
@TocarIP thanks for the excellent analysis. If you can still reproduce this, I'd be curious what the effect of CL 45142 is. I'm really not sure what the right answer here is, but that hack might mitigate things from both this issue and #18138. I've been thinking for a while that the latency of stack growth can be problematic, but this is an interesting case where it's not just a problem with large stacks. The old segmented stacks approach had less of this problem, but it had other anomalies. I wonder if it's possible to efficiently but incrementally move a stack. |
I've tested CL 45142, and it doesn't help |
At my request, @TocarIP also checked CL 43150 (thanks!). It's not a deep fix, but it does at least help a little--from 13.4s to 12.4s. |
I’ve done a bit of work on stack copying this cycle. Much of it is in. https://golang.org/cl/109001 is under review. I have one more I hope to mail soon, but no promises. If anyone can still reproduce readily, I’d be curious to hear whether / how much tip has improved, and whether the linked and future CLs help. |
From briefly reading this issue, it sounds like this might be resolved with 016d755 which should be included in Go 1.19. |
Hello,
etcd observed significant performance regression with Go 1.7.3.
Details follow or please visit etcd-io/etcd#6876.
What version of Go are you using (
go version
)?Go 1.6.3 and Go 1.7.3
What operating system and processor architecture are you using (
go env
)?Google Cloud VM with 8 vCPUs, 16 GB memory, ubuntu-1610
What did you do?
Build etcd binaries, others
Build etcd binaries in different Go versions with etcd commit etcd-io/etcd@faeeb2f.
Go version is the only difference!
Or
Run single-node cluster, benchmark
Go 1.6.3 + master branch Go 1.7.3 client
rm -rf test.etcd ./etcd-v3.0.2-go1.6.3 --name test \ --listen-client-urls http://localhost:2379 \ --advertise-client-urls http://localhost:2379 \ --listen-peer-urls http://localhost:2380 \ --initial-advertise-peer-urls http://localhost:2380 \ --initial-cluster test=http://localhost:2380 \ --initial-cluster-token test-token \ --initial-cluster-state new \ --enable-pprof ETCDCTL_API=3 ./etcdctl-master-go1.7.3 --endpoints=localhost:2379 put foo bar ETCDCTL_API=3 ./etcdctl-master-go1.7.3 --endpoints=localhost:2379 get foo --consistency=s ./benchmark-master-go1.7.3 --endpoints=localhost:2379 --conns=100 --clients=1000 range foo --consistency=s --total=200000
Go 1.7.3 + master branch Go 1.7.3 client
rm -rf test.etcd ./etcd-v3.0.2-go1.7.3 --name test \ --listen-client-urls http://localhost:2379 \ --advertise-client-urls http://localhost:2379 \ --listen-peer-urls http://localhost:2380 \ --initial-advertise-peer-urls http://localhost:2380 \ --initial-cluster test=http://localhost:2380 \ --initial-cluster-token test-token \ --initial-cluster-state new \ --enable-pprof ETCDCTL_API=3 ./etcdctl-master-go1.7.3 --endpoints=localhost:2379 put foo bar ETCDCTL_API=3 ./etcdctl-master-go1.7.3 --endpoints=localhost:2379 get foo --consistency=s ./benchmark-master-go1.7.3 --endpoints=localhost:2379 --conns=100 --clients=1000 range foo --consistency=s --total=200000
What did you expect to see?
etcd with Go 1.6.3 shows:
What did you see instead?
etcd with Go 1.7.3 is 2x slower:
We expected Go 1.7.3 to be comparable or better.
Here's our profiling data
Note: etcd uses old
golang.org/x/net/http2
but same behavior happens with latesthttp2
; see etcd-io/etcd#6876 (comment).Could anybody help us debug this around Go runtime?
Please let me know if more information is needed.
Thanks a lot in advance!
The text was updated successfully, but these errors were encountered: