storage: don't include RHS data in merge trigger #28661
Conversation
Now that we require ranges to be collocated during a merge and the RHS replicas to be up-to-date before the merge commits, we no longer need to include a snapshot of the RHS in the merge trigger. We know that the copy of the data that already exists in the local store is perfectly up-to-date. So, stop sending the data in the merge trigger.

Release note: None
Force-pushed 329d913 to bb7426a.
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/batcheval/cmd_end_transaction.go, line 1010 at r1 (raw file):
```
// copy from the RHS are the keys in the abort span, and we've already
// accounted for those stats above.
ms.Add(merge.RightMVCCStats)
```
If you fudge up something in these stats computations, which tests start to fail?
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained
bors r=bdarnell,tschottdorf
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/storage/batcheval/cmd_end_transaction.go, line 1010 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
If you fudge up something in these stats computations, which tests start to fail?
`TestStoreRangeMergeStats`, primarily. I haven't concocted a scenario yet that it fails to catch.
23885: kv: evict leaseholder on RPC error r=solongoron a=tschottdorf

This addresses a situation in which we would not evict a stale leaseholder for a long time. Consider the replicas [s1,s2,s3], where s1 is down but is the cached leaseholder, while s2 is the actual leaseholder. The RPC layer will try s1, get an RPC error, try s2, and succeed. Since there is no NotLeaseHolderError involved, the cache would not get updated, and so every request pays the overhead of trying s1 first. WIP because it needs testing. Touches #23601.

Release note (bug fix): Improve request routing during node outages.

28609: opt: Make additional perf improvements r=andy-kimball a=andy-kimball

Make several more fixes:
1. Do not qualify column names in metadata, since that requires expensive string formatting up-front (also clean up the factoring of this code, which had gotten messy).
2. Inline Metadata into Memo.
3. Inline logicalPropsBuilder into the Memo.

Together, these changes improve KV perf from:
```
Phases/kv-read/OptBuild 18.4µs ± 1%
```
to:
```
Phases/kv-read/OptBuild 17.8µs ± 1%
```

28661: storage: don't include RHS data in merge trigger r=bdarnell,tschottdorf a=benesch

Now that we require ranges to be collocated during a merge and the RHS replicas to be up-to-date before the merge commits, we no longer need to include a snapshot of the RHS in the merge trigger. We know that the copy of the data that already exists in the local store is perfectly up-to-date. So, stop sending the data in the merge trigger.

Release note: None

28689: sqlbase: avoid using SERIAL in system tables r=knz a=knz

Needed for #28575. We'll soon want special behavior for SERIAL. We can't afford the definition of system tables to be subject to a discussion about what SERIAL means. So this patch ensures system tables don't use SERIAL.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Build succeeded
28684: storage: make subsumption atomic with merge commit application r=bdarnell,tschottdorf a=benesch

~Please review the first commit separately in #28661.~

During a merge, the subsumed range logically ceases to exist at the moment that the merge transaction commit applies. It's important that we remove the subsumed range from disk in the same batch as the merge commit to protect against an ill-timed crash. Tease apart Replica.destroyRaftMuLocked to make this possible.

Release note: None

28707: server: don't drain when decommissioning r=nvanbenschoten a=tschottdorf

Prior to this commit, a server which entered decommissioning would automatically drain. This wasn't a great idea, since it meant that pointing the decommissioning command at a set of nodes required for the cluster to work would brick that cluster, and it would be difficult to recover from (since the draining state is picked up by the nodes when they start).

Instead, decouple draining from decommissioning. Decommissioning simply tells the allocator to move data off that node; when the node gets shut down cleanly (the operator's responsibility), it will drain. The resulting behavior when trying to decommission a too-large set of nodes is now that the decommission command simply does not finish. For example, starting a three-node cluster and decommissioning all nodes will simply hang (though the nodes will be marked as decommissioning). Add three more nodes to the cluster and replicas will move over to the newly added nodes, and the decommissioning command will finish.

As a bonus, recommissioning a node now doesn't require the target node to restart. As a second bonus, the decommissioning acceptance test now takes around 60% of the previous time (~45s, down from 70+).

One remaining caveat is that users may forget that they attempted to decommission nodes. We need to check that we prominently alert in the UI when nodes are decommissioning. This isn't a new problem.

Fixes #27444. Fixes #27025.

Release note (bug fix): Decommissioning multiple nodes is now possible without posing a risk to cluster health. Recommissioning a node no longer requires a restart of the target node to take effect.

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Prior to cockroachdb#28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. Now that we require ranges to be collocated during merges, we no longer need to include the snapshot, nor declare the span. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release note: None
40261: storage: remove span declaration for merges r=irfansharif a=irfansharif

Prior to #28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. Now that we require ranges to be collocated during merges, we no longer need to include the snapshot, nor declare the span. Also informs #32583. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Prior to cockroachdb#28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. After we required ranges to be collocated during merges, we didn't need to include the snapshot, so we no longer need to declare the span. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release note: None
41036: storage: remove span declaration for merges r=ajwerner a=irfansharif

I'm just re-sending #40261 (reverted in #40446). These changes are intended for 20.1 and will not be merged until after branching.

---

Prior to #28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. Now that we require ranges to be collocated during merges, we no longer need to include the snapshot, nor declare the span. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release justification: N/A

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Now that we require ranges to be collocated during a merge and the RHS
replicas to be up-to-date before the merge commits, we no longer need to
include a snapshot of the RHS in the merge trigger. We know that the
copy of the data that already exists in the local store is perfectly
up-to-date.
So, stop sending the data in the merge trigger.
Release note: None