admission,server: scope store overload admission control to write ope… #67717
Conversation
I owe new unit tests, which I'll do after some initial feedback.
We could also use a multi-store roachtest for testing this -- do we have any existing ones that I could use as code examples?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)
Force-pushed from 6377181 to 0aee0ba
@RaduBerinde @ajwerner any thoughts on this PR?
Sorry, this fell through the cracks.
A bit worried that we keep adding per-request overhead. How hard would it be to make it so that we still use a single queue when we have a single store? (which I believe is the case in most deployments)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
A bit worried that we keep adding per-request overhead. How hard would it be to make it so that we still use a single queue when we have a single store? (which I believe is the case in most deployments)
In the current setup, even with a single store, only KV writes (not reads) go through both queues.
I think we'll need to solve any overhead issues with more code optimizations rather than fusing together queues. The latter would make the code more complicated and less composable. I plan to start running tpcc and kv roachtests with admission control enabled, which should tell us more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
Sounds good, thanks.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @sumeerbhola)
pkg/server/node.go, line 958 at r2 (raw file):
return nil, err
}
if !continueToSecondQ {
[nit] Maybe add a comment explaining that in this case, admission control for KVWork is disabled so there is no point to use the second queue.
Also maybe rename continueToSecondQ to admissionEnabled.
We could also use a multi-store roachtest for testing this -- do we have any existing ones that I could use as code examples?
@ajwerner I could use some suggestions on this
- I see only one roachtest with multiple stores, tpce.go. Confusingly, it specifies the store count both to Cluster.Start (--store-count) and to TestSpec.Cluster. It's not clear to me why it is not sufficient to do the former.
- I'd like to create a kv-like roachtest (single node is fine) with 2 stores, pin tables to the stores to prevent rebalancing, overload one store using a high-concurrency kv0 workload, and have the other store see a kv95 workload that doesn't overload it. Currently workload/kv/kv.go hardcodes the table name, which should be easy to override using a parameter. Also, I suppose it is ok to do two workload run calls (I can use two nodes for the workloads). But I don't know how to pin the tables and prevent rebalancing.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @sumeerbhola)
I see only one roachtest with multiple stores tpce.go. Confusingly, it specifies the store count both to Cluster.Start (--store-count) and to TestSpec.Cluster. It's not clear to me why it is not sufficient to do the former.
The former creates VMs with 2 disks. The latter provides flags to cockroach start to use those disks.
I can use two nodes for the workloads
nit: I don't know that you need this. One workload node to run both workloads seems likely to be sufficient
Currently workload/kv/kv.go hardcodes the table name
While this is true, you can use different databases using the db flag for workload to configure them separately.
But I don't know how to pin the tables and prevent rebalancing.
You'll need to set up zone configs.
I'd like to create a kv-like roachtest (single node is fine) with 2 stores and pin tables to the stores to prevent rebalancing, and overload one store using a high concurrency kv0 workload and have the other store see a kv95 workload that doesn't overload it.
What we're really missing here is that roachprod is not putting attributes on the stores. We'd need to put something like attrs=s%d for the store index, or something around here:
var storeDirs []string
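A sketch of what attaching such per-store attributes could look like when assembling the --store flags (buildStoreFlags and the exact flag shape are assumptions for illustration, not the actual roachprod code):

```go
package main

import "fmt"

// buildStoreFlags sketches how roachprod might attach a per-store
// attribute (attrs=s<i>) when building --store flags from the
// discovered data directories. Hypothetical helper, not real
// roachprod code.
func buildStoreFlags(storeDirs []string) []string {
	var flags []string
	for i, dir := range storeDirs {
		flags = append(flags, fmt.Sprintf("--store=path=%s,attrs=s%d", dir, i+1))
	}
	return flags
}

func main() {
	dirs := []string{"/mnt/data1/cockroach", "/mnt/data2/cockroach"}
	for _, f := range buildStoreFlags(dirs) {
		fmt.Println(f)
	}
	// → --store=path=/mnt/data1/cockroach,attrs=s1
	// → --store=path=/mnt/data2/cockroach,attrs=s2
}
```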
Then we could do something like:
CREATE DATABASE db1;
ALTER DATABASE db1 CONFIGURE ZONE USING constraints = '[+s1]';
CREATE DATABASE db2;
ALTER DATABASE db2 CONFIGURE ZONE USING constraints = '[+s2]';
And then in the two workloads, specify the --db flag.
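The two workload invocations against the pinned databases might then be assembled like this (the workloadCmd helper and the concrete flag values are illustrative, not a prescribed configuration):

```go
package main

import "fmt"

// workloadCmd sketches a `workload run` command line pointed at its
// pinned database via --db. Flags beyond --db are illustrative.
func workloadCmd(kind, db, extra string) string {
	return fmt.Sprintf("./workload run %s --db=%s %s {pgurl:1}", kind, db, extra)
}

func main() {
	// Overload store 1 via db1 with a high-concurrency write-only load...
	fmt.Println(workloadCmd("kv", "db1", "--concurrency=256 --read-percent=0"))
	// ...while db2 on store 2 sees a lighter read-mostly load.
	fmt.Println(workloadCmd("kv", "db2", "--concurrency=16 --read-percent=95"))
}
```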
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
Force-pushed from d3db1fe to b2b1858
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/util/admission/granter_test.go, line 219 at r3 (raw file):
// TestStoreCoordinators tests only the setup of GrantCoordinators per store.
// Testing of IO load functionality happens in TestIOLoadListener.
added a test
@ajwerner Thanks, that is very useful! I have a simple roachtest with 2 stores, but I can't figure out why the overloaded db (db2) is not being placed on store2. I see the following in crdb_internal.zones.txt:
52 0 DATABASE db1 NULL db1 NULL NULL NULL NULL "range_min_bytes: null
range_max_bytes: null
gc: null
global_reads: null
num_replicas: null
num_voters: null
constraints: [+store1]
voter_constraints: []
lease_preferences: []
" "ALTER DATABASE db1 CONFIGURE ZONE USING
constraints = '[+store1]'" 2\x102\f\b\x01\x12\x00\x1a\x06store18\x00P\x00X\x01x\x00 "range_min_bytes: 134217728
range_max_bytes: 536870912
gc:
ttlseconds: 90000
global_reads: null
num_replicas: 3
num_voters: null
constraints: [+store1]
voter_constraints: []
lease_preferences: []
" "ALTER DATABASE db1 CONFIGURE ZONE USING
range_min_bytes = 134217728,
range_max_bytes = 536870912,
gc.ttlseconds = 90000,
num_replicas = 3,
constraints = '[+store1]',
lease_preferences = '[]'"
53 0 DATABASE db2 NULL db2 NULL NULL NULL NULL "range_min_bytes: null
range_max_bytes: null
gc: null
global_reads: null
num_replicas: null
num_voters: null
constraints: [+store2]
voter_constraints: []
lease_preferences: []
" "ALTER DATABASE db2 CONFIGURE ZONE USING
constraints = '[+store2]'" 2\x102\f\b\x01\x12\x00\x1a\x06store28\x00P\x00X\x01x\x00 "range_min_bytes: 134217728
range_max_bytes: 536870912
gc:
ttlseconds: 90000
global_reads: null
num_replicas: 3
num_voters: null
constraints: [+store2]
voter_constraints: []
lease_preferences: []
" "ALTER DATABASE db2 CONFIGURE ZONE USING
range_min_bytes = 134217728,
range_max_bytes = 536870912,
gc.ttlseconds = 90000,
num_replicas = 3,
constraints = '[+store2]',
lease_preferences = '[]'"
And the node was started with the correct store attributes
I210806 22:30:36.721467 9327 util/log/file_sync_buffer.go:238 ⋮ [config] arguments: [‹./cockroach› ‹start› ‹--insecure› ‹--store› ‹path=/mnt/data1/cockroach,attrs=store1› ‹--store› ‹path=/mnt/data2/cockroach,attrs=store2› ‹--log› ‹file-defaults: {dir: 'logs', exit-on-error: false}› ‹--cache=25%› ‹--max-sql-memory=25%› ‹--port=26257› ‹--http-port=26258› ‹--locality=cloud=gce,region=us-east1,zone=us-east1-b› ‹--join=34.73.187.44:26257› ‹--advertise-host› ‹10.142.0.10›]
I didn't see anything in the logs that suggests it is trying to move ranges to conform to these configs. And since these configs were created before the tables, I am assuming that the initial placement of the ranges will be correct.
I couldn't find anything relevant in the db console either. Any suggestions on where I should look?
The code is in
sumeerbhola@4aeb26d
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
Force-pushed from 94c07b0 to 1d5cae8
Force-pushed from 1d5cae8 to e0f1479
Ack. Will look in the next couple of hours.
I'm going to merge this PR and do the roachtest in a followup, since @RaduBerinde approved this and he is out this week.
bors r+
Build succeeded:
@ajwerner any pointers would be much appreciated |
I think that the problem is the inscrutable difference between
I just ran the experiment on
…rations on the overloaded store
Previously KVWork was all admitted through a single WorkQueue, and this
WorkQueue was paired with a GrantCoordinator that took into account
both CPU overload and storage overload. This meant a single overloaded
store (in a multi-store setting) would slow down all KV operations
including reads, and those on stores that were fine.
We now have multiple WorkQueue's, one per store, that KV writes go
through, and then get to the shared WorkQueue that all operations go
through. The per-store WorkQueues are associated with their own
GrantCoordinators that listen to health state for their respective
stores. The per-store queues do not care about CPU and therefore do not
do grant chaining. Since admission through these per-store queues
happens before the shared queue, a store bottleneck will not cause KV
slots in the shared queue to be taken by requests that are still waiting
for admission elsewhere. The reverse can happen, and is considered
acceptable -- per-store tokens, when a store is overloaded, can be used
by requests that are now waiting for admission in the shared WorkQueue
because of a cpu bottleneck.
The code is significantly refactored for the above:
NewGrantCoordinators returns a container called StoreGrantCoordinators
which lazily initializes the relevant per-store GrantCoordinators when
it first fetches Pebble metrics, in addition to the shared
GrantCoordinator. The server code now integrates with both types and
the code in Node.Batch will sometimes subject a request to two
WorkQueues. The PebbleMetricsProvider now includes StoreIDs, and the
periodic ticking that fetches these metrics at 1min intervals, and does
1s ticks, is moved to StoreGrantCoordinators. This simplifies the
ioLoadListener which no longer does the periodic ticking and eliminates
some testing-only abstractions. The per-store WorkQueues share the
same metrics, which represent an aggregate across these queues.
Informs #65957
Release note: None
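The lazy per-store initialization described in the commit message can be sketched roughly as follows (the types and method names below are illustrative stand-ins for the real code in pkg/util/admission, which this sketch does not reproduce):

```go
package main

import (
	"fmt"
	"sync"
)

// grantCoordinator and storeGrantCoordinators are illustrative
// stand-ins for the types named in the commit message.
type grantCoordinator struct{ storeID int32 }

type storeGrantCoordinators struct {
	mu     sync.Mutex
	coords map[int32]*grantCoordinator
}

// onPebbleMetrics sketches the lazy initialization: the first
// metrics fetch that mentions a store creates that store's
// GrantCoordinator; later ticks find it already present.
func (s *storeGrantCoordinators) onPebbleMetrics(storeIDs []int32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.coords == nil {
		s.coords = make(map[int32]*grantCoordinator)
	}
	for _, id := range storeIDs {
		if _, ok := s.coords[id]; !ok {
			s.coords[id] = &grantCoordinator{storeID: id}
			fmt.Printf("initialized GrantCoordinator for store %d\n", id)
		}
	}
}

func main() {
	var s storeGrantCoordinators
	s.onPebbleMetrics([]int32{1, 2})
	s.onPebbleMetrics([]int32{1, 2}) // second tick: nothing new to create
}
```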