perf: optimize the insert performance #13974
Per discussion on other channels, the testing is being done on a table pre-split with 2000 ranges using 600 threads to INSERT data. Given these numbers, I don't think improving the concurrency of writes to a particular range will provide a significant benefit as there will be very little per-range concurrency. Two thoughts:
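For readers skimming the thread, here is a minimal Go sketch of the workload shape described above: many concurrent clients issuing single-row INSERTs into one pre-split table. The table name, schema, key distribution, and connection string are assumptions for illustration, not the actual benchmark harness.

```go
// Minimal sketch of the workload under discussion: many concurrent clients
// issuing single-row INSERTs. With the table pre-split into ~2000 ranges and
// random keys, any individual range sees very little concurrent traffic.
// The table name, schema, and connection string are illustrative assumptions.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"math/rand"
	"sync"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol
)

func main() {
	const concurrency = 600 // matches the thread count mentioned above

	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	db.SetMaxOpenConns(concurrency)

	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// Random keys spread the writes across the pre-split ranges.
				k := rand.Int63()
				if _, err := db.Exec(
					"INSERT INTO kv (k, v) VALUES ($1, $2)", k, fmt.Sprint(j),
				); err != nil {
					log.Print(err)
				}
			}
		}()
	}
	wg.Wait()
}
```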
I did a bit of investigation this morning and managed to reproduce poor latencies using the
@petermattis Forgot to say: it's our benchmark test. In our test we set, and we found the
Getting back to this today, a mutex profile showed high contention for the raftEntryCache mutex. I too have had the observation that latencies seem to be rising faster than I would expect, given a significant amount of idle CPU on my test machines.
Apparently I can't read a profile: both
@petermattis, how do you detect the contention for the raftEntryCache mutex? Through
@a6802739 I set
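(For context, here is a general sketch of how Go's built-in mutex profiling is enabled; the exact mechanism wired into cockroach at the time may differ, and the sampling rate below is arbitrary.)

```go
// Generic sketch of enabling Go mutex profiling (Go 1.8+). This is not
// necessarily how cockroach exposes it; it only shows the standard-library
// pieces involved.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"runtime"
)

func main() {
	// Sample roughly 1 in every 10 mutex contention events.
	runtime.SetMutexProfileFraction(10)

	// The profile can then be pulled with:
	//   go tool pprof http://localhost:6060/debug/pprof/mutex
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```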
@petermattis about
And I couldn't quite understand the debug info when I open it. Thanks very much.
You'll likely need to build
@petermattis thanks very much.
@a6802739 Is your requirement for max latency, 99th-percentile latency, or some other percentile?
I ran the following experiment against a 6-node cluster where I had pre-split the
The SQL latencies started in the 5-6ms range for the 1 and 6 client runs, and rose to 45-50ms for the 96 client run. There is a similar rise in the Exec latency graph. This graph measures the time it takes to perform a

The Exec Success graph shows that the Node operations are fairly evenly distributed across the nodes in the cluster. Our internal measure of network latency calculated using the

CPU usage climbed over the course of the test, but these are 8 CPU machines, so even at peak usage the machines were 50% idle. Network usage is higher than I would expect at 32 MiB/sec on the 96 client run. The corresponding SQL traffic:

That's a hefty network traffic blowup. Are we doing something excessive somewhere? Disk utilization is negligible:

No conclusions from this yet.
Oh, wow! The Network Traffic graph was for a single node. All of the nodes show similar traffic, so the cluster was pumping 180 MiB/sec at the peak.
Ah, that's
@petermattis, you ran the test from
@a6802739 I ran the test with 1-96 clients. Each test always used 6 nodes. The purpose was to see how latency was affected by the number of clients and to try and identify the bottleneck.
FYI, I'm still investigating this. Specifically I'm trying to figure out why increasing load doesn't eventually drive CPU utilization close to 100%. There seems to be something going on with too many Go runtime threads being stuck in Cgo calls. I have an experimental branch which replaces
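(As a rough illustration of how that symptom can be observed, here is a generic diagnostic sketch; it is not the tooling used in this investigation. Goroutines blocked in C code show up with runtime.cgocall frames, and the threadcreate profile counts OS threads.)

```go
// Generic diagnostic sketch: dump the number of OS threads created so far
// and the full goroutine stacks. Goroutines stuck in C code appear with
// runtime.cgocall frames. Alternatively, GODEBUG=schedtrace=1000 prints
// scheduler and thread counts once a second.
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// OS thread creation profile (one record per thread created).
	pprof.Lookup("threadcreate").WriteTo(os.Stderr, 1)

	// Full goroutine stacks; look for goroutines parked in runtime.cgocall.
	pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
}
```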
The first run is using a binary built from my
@petermattis how do you run your test against 6 nodes? I mean, how do you configure haproxy and how do you specify the
We use the following
Then I run the
So
Batch concurrent commits of write-only batches (i.e. most batches) in Go. This gives a 10% performance boost on a write-only workload on my laptop and a 50% performance boost on a write-only workload on a single-node cluster running on Azure. See cockroachdb#13974
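To illustrate the technique named in that commit message, here is a conceptual Go sketch of batching ("grouping") concurrent commits: writers queue their batches and a single goroutine syncs the whole group at once, so one expensive sync is amortized across many batches. This is an illustration only, not the actual cockroach or RocksDB implementation; all names are invented.

```go
// Conceptual sketch of group commit: concurrent writers enqueue their write
// batches, and a single committer goroutine syncs everything queued so far
// in one pass. Types and the fake syncToDisk are illustrative assumptions.
package main

import (
	"fmt"
	"sync"
)

type commitRequest struct {
	data string
	done chan error
}

type groupCommitter struct {
	mu      sync.Mutex
	pending []*commitRequest
	kick    chan struct{}
}

func newGroupCommitter() *groupCommitter {
	g := &groupCommitter{kick: make(chan struct{}, 1)}
	go g.loop()
	return g
}

// Commit enqueues a batch and blocks until the committer goroutine has
// synced it, together with any other batches queued in the meantime.
func (g *groupCommitter) Commit(data string) error {
	req := &commitRequest{data: data, done: make(chan error, 1)}
	g.mu.Lock()
	g.pending = append(g.pending, req)
	g.mu.Unlock()
	select {
	case g.kick <- struct{}{}: // wake the committer
	default: // a wakeup is already pending; piggyback on it
	}
	return <-req.done
}

func (g *groupCommitter) loop() {
	for range g.kick {
		g.mu.Lock()
		batch := g.pending
		g.pending = nil
		g.mu.Unlock()
		if len(batch) == 0 {
			continue
		}
		err := syncToDisk(batch) // one sync covers every queued batch
		for _, req := range batch {
			req.done <- err
		}
	}
}

// syncToDisk stands in for the expensive commit+fsync of the real engine.
func syncToDisk(batch []*commitRequest) error {
	fmt.Printf("synced %d batches in one write\n", len(batch))
	return nil
}

func main() {
	g := newGroupCommitter()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = g.Commit(fmt.Sprintf("batch-%d", i))
		}(i)
	}
	wg.Wait()
}
```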
@petermattis looks like this might have gotten addressed by #14138? I'm going to close for now, but please re-open if this is still an issue.
QUESTION

Here is the `handle raft ready` warning log. We found that `handle raft ready` takes a long time.

Suggestions:
1. Run `handle raft ready` and `proposeRaftRequest` in a concurrent way.
2. In `handle raft ready`, apply the committed raft log in an asynchronous way (see the sketch below).
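To make suggestion 2 concrete, here is a hypothetical Go sketch of the general pattern: the raft-ready loop persists entries synchronously but hands committed entries to a separate goroutine to apply, so applying the state machine no longer blocks the next ready cycle. All types here are invented for illustration; this is not CockroachDB's Replica code or the etcd/raft API.

```go
// Hypothetical sketch of applying committed raft entries asynchronously:
// the ready loop only persists entries and forwards committed ones to an
// apply goroutine. All types are invented for illustration.
package main

import (
	"fmt"
	"sync"
)

// entry stands in for a committed raft log entry.
type entry struct {
	index uint64
	data  string
}

// ready stands in for the per-iteration raft state (new entries to persist,
// entries that have become committed, etc.).
type ready struct {
	toPersist []entry
	committed []entry
}

// handleReadyLoop persists entries synchronously (this must happen before
// acknowledging them), but pushes committed entries onto a channel so the
// next ready iteration is not blocked behind the state machine.
func handleReadyLoop(readyCh <-chan ready, applyCh chan<- entry, wg *sync.WaitGroup) {
	defer wg.Done()
	for rd := range readyCh {
		for _, e := range rd.toPersist {
			persist(e) // durable write to the raft log
		}
		for _, e := range rd.committed {
			applyCh <- e // apply asynchronously
		}
	}
	close(applyCh)
}

// applyLoop applies committed entries to the state machine in log order.
func applyLoop(applyCh <-chan entry, wg *sync.WaitGroup) {
	defer wg.Done()
	for e := range applyCh {
		fmt.Printf("applied entry %d: %s\n", e.index, e.data)
	}
}

// persist is a placeholder for the synchronous raft-log write.
func persist(e entry) {
	_ = e
}

func main() {
	readyCh := make(chan ready)
	applyCh := make(chan entry, 128) // buffer decouples apply from the ready loop

	var wg sync.WaitGroup
	wg.Add(2)
	go handleReadyLoop(readyCh, applyCh, &wg)
	go applyLoop(applyCh, &wg)

	readyCh <- ready{
		toPersist: []entry{{index: 1, data: "INSERT ..."}},
		committed: []entry{{index: 1, data: "INSERT ..."}},
	}
	close(readyCh)
	wg.Wait()
}
```

The real system would also have to keep reads and snapshots consistent with what has actually been applied, which is why decoupling apply from the ready loop is proposed here as a suggestion rather than a trivial change.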
I post the trace for some insert statements:

The schema is:

The insert statement is like: