-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: pre-split hash sharded indexes on shard boundaries before DELETE_AND_WRITE_ONLY
#74558
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Comments
ajwerner
added
the
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
label
Jan 6, 2022
craig bot
pushed a commit
that referenced
this issue
Jan 24, 2022
74810: kvserver: remove unnecessary special casing of lease for LAI r=nvanbenschoten a=tbg Touches #33007 (this simplifies #71806). We were previously "not" assigning a LeaseAppliedIndex to lease request proposals. But we were doing so very half-heartedly: instead of always assigning zero, we assigned "whatever was assigned to the previous command". This meant that in practice, the first lease on a range would get zero, and any subsequent leases would get some nonzero numbers. This wasn't particularly principled and raises eyebrows. For testing convenience and clarity it is helpful to assign a zero MLAI to leases, which is what this commit does. We then turn to the special-casing in refreshProposalsLocked. `(*Replica).refreshProposalsLocked` previously refused to repropose commands that had a zero `AppliedLeaseIndex` (LAI). This was done on grounds of needing to provide replay protection for these requests. Upon inspection it became clear that a zero MLAI could only ever apply to lease requests; they are the only proposals with an exception, and as we saw above it would not usually result in a zero, but could (and especially in testing, where most leases are the first lease on their respective range). We [discussed] this internally and concluded that leases can in fact be reproposed, since they have their own replay protection mediated through the lease's sequence number. This is great since as stated above, we did in fact repropose (some) leases and have for years. This commit removes the special casing. Fixes #74711. As suggested by @nvanbenschoten I ran the `kvserver` tests with this diff that reproposes every lease request: ```diff diff --git a/pkg/kv/kvserver/replica_proposal_buf.go b/pkg/kv/kvserver/replica_proposal_buf.go index 95765d8efd..1b66745866 100644 --- a/pkg/kv/kvserver/replica_proposal_buf.go +++ b/pkg/kv/kvserver/replica_proposal_buf.go @@ -576,6 +576,11 @@ func (b *propBuf) FlushLockedWithRaftGroup( ents = append(ents, raftpb.Entry{ Data: p.encodedCommand, }) + if p.Request.IsLeaseRequest() { + ents = append(ents, raftpb.Entry{ + Data: p.encodedCommand, + }) + } } } if firstErr != nil { ``` This inspired the subsequent commits. [discussed]: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1641559185021100 Release note: None 74923: sql: pre-split hash sharded indexes ranges before DELETE_AND_WRITE_ONLY r=chengxiong-ruan a=chengxiong-ruan Fixes #74558 Pre-split ranges on shard boundaries before SchemaChanger move new hash sharded indexes from DELETE_ONLY to DELETE_AND_WRITE_ONLY state. Release note (bug fix): None Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
chengxiong-ruan
added a commit
to chengxiong-ruan/cockroach
that referenced
this issue
Jan 26, 2022
Fixes cockroachdb#74558 Pre-split ranges on shard boundaries before SchemaChanger move new hash sharded indexes from DELETE_ONLY to DELETE_AND_WRITE_ONLY state. Release note (bug fix): When creating hash sharded index to an existing table, traffic could hit hard on the single range of the index before it is split into more ranges for shards as range size grows. This change make schema changer able to presplit ranges on shard boundaries before the index becomes writable. `sql.hash_sharded_range_pre_split.max` is the cluster setting added so that users can set the upbound of the amount of ranges to have. If the bucket count of the defined index is less than the cluster setting, the bucket count will be the amount of pre-split ranges.
chengxiong-ruan
added a commit
to chengxiong-ruan/cockroach
that referenced
this issue
Jan 26, 2022
Fixes cockroachdb#74558 Pre-split ranges on shard boundaries before SchemaChanger move new hash sharded indexes from DELETE_ONLY to DELETE_AND_WRITE_ONLY state. Release note (bug fix): When creating hash sharded index to an existing table, traffic could hit hard on the single range of the index before it is split into more ranges for shards as range size grows. This change make schema changer able to presplit ranges on shard boundaries before the index becomes writable. `sql.hash_sharded_range_pre_split.max` is the cluster setting added so that users can set the upbound of the amount of ranges to have. If the bucket count of the defined index is less than the cluster setting, the bucket count will be the amount of pre-split ranges.
chengxiong-ruan
added a commit
to chengxiong-ruan/cockroach
that referenced
this issue
Jan 26, 2022
Fixes cockroachdb#74558 Pre-split ranges on shard boundaries before SchemaChanger move new hash sharded indexes from DELETE_ONLY to DELETE_AND_WRITE_ONLY state. Release note (bug fix): When creating hash sharded index to an existing table, traffic could hit hard on the single range of the index before it is split into more ranges for shards as range size grows. This change make schema changer able to presplit ranges on shard boundaries before the index becomes writable. `sql.hash_sharded_range_pre_split.max` is the cluster setting added so that users can set the upbound of the amount of ranges to have. If the bucket count of the defined index is less than the cluster setting, the bucket count will be the amount of pre-split ranges.
exalate-issue-sync
bot
added
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
and removed
T-sql-schema-deprecated
Use T-sql-foundations instead
labels
May 10, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Is your feature request related to a problem? Please describe.
We're creating a hash sharded index because we know that the write volume will be sequential. As we've see in #62672 (comment) and elsewhere, throughput can crater when we make these new indexes writable because all of the load lands on a single range. Ideally we'd pre-split the new index before turning on traffic.
It's a somewhat tricky problem to figure out, in the general case, where to pre-split these new indexes. Fortunately, for hash sharded indexes, we get to make an assumption that underneath each shard, the data is uniformly distributed. That means we can safely split at each shard boundary and know that it's a smart place to put a split.
Describe the solution you'd like
Before making a hash sharded index
DELETE_AND_WRITE_ONLY
, split it with an expiring split point at each shard boundary.Describe alternatives you've considered
There was tons of discussion of the broader problem in this internal thread.
Additional context
We should also split other secondary indexes at the same time, but the heuristic on how many split and where to put them is less obvious. We'll track that separately.
This change should be backportable.
Epic: CRDB-7363
gz#10941
The text was updated successfully, but these errors were encountered: