storage: don't include RHS data in merge trigger #28661
Conversation
Now that we require ranges to be collocated during a merge and the RHS replicas to be up-to-date before the merge commits, we no longer need to include a snapshot of the RHS in the merge trigger. We know that the copy of the data that already exists in the local store is perfectly up-to-date. So, stop sending the data in the merge trigger.

Release note: None
Force-pushed 329d913 to bb7426a.
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/batcheval/cmd_end_transaction.go, line 1010 at r1 (raw file):
```
// copy from the RHS are the keys in the abort span, and we've already
// accounted for those stats above.
ms.Add(merge.RightMVCCStats)
```
If you fudge up something in these stats computations, which tests start to fail?
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained
bors r=bdarnell,tschottdorf
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/storage/batcheval/cmd_end_transaction.go, line 1010 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
If you fudge up something in these stats computations, which tests start to fail?
`TestStoreRangeMergeStats`, primarily. I haven't concocted a scenario yet that it fails to catch.
23885: kv: evict leaseholder on RPC error r=solongoron a=tschottdorf

This addresses a situation in which we would not evict a stale leaseholder for a long time. Consider the replicas [s1,s2,s3], where s1 is down but is the cached leaseholder, while s2 is the actual leaseholder. The RPC layer will try s1, get an RPC error, try s2, and succeed. Since there is no NotLeaseHolderError involved, the cache would not get updated, and so every request pays the overhead of trying s1 first. WIP because it needs testing. Touches #23601.

Release note (bug fix): Improve request routing during node outages.

28609: opt: Make additional perf improvements r=andy-kimball a=andy-kimball

Make several more fixes:
1. Do not qualify column names in metadata, since that requires expensive string formatting up-front (also clean up the factoring of this code, which had gotten messy).
2. Inline Metadata into Memo.
3. Inline logicalPropsBuilder into the Memo.

Together, these changes improve KV perf from:
```
Phases/kv-read/OptBuild 18.4µs ± 1%
```
to:
```
Phases/kv-read/OptBuild 17.8µs ± 1%
```

28661: storage: don't include RHS data in merge trigger r=bdarnell,tschottdorf a=benesch

Now that we require ranges to be collocated during a merge and the RHS replicas to be up-to-date before the merge commits, we no longer need to include a snapshot of the RHS in the merge trigger. We know that the copy of the data that already exists in the local store is perfectly up-to-date. So, stop sending the data in the merge trigger.

Release note: None

28689: sqlbase: avoid using SERIAL in system tables r=knz a=knz

Needed for #28575. We'll soon want special behavior for SERIAL. We can't afford the definition of system tables to be subject to a discussion about what SERIAL means. So this patch ensures system tables don't use SERIAL.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Build succeeded
28684: storage: make subsumption atomic with merge commit application r=bdarnell,tschottdorf a=benesch

~Please review the first commit separately in #28661.~

During a merge, the subsumed range logically ceases to exist at the moment that the merge transaction commit applies. It's important that we remove the subsumed range from disk in the same batch as the merge commit to protect against an ill-timed crash. Tease apart Replica.destroyRaftMuLocked to make this possible.

Release note: None

28707: server: don't drain when decommissioning r=nvanbenschoten a=tschottdorf

Prior to this commit, a server which entered decommissioning would automatically drain. This wasn't a great idea, since it meant that pointing the decommissioning command at a set of nodes required for the cluster to work would brick that cluster, and it would be difficult to recover from (since the draining state is picked up by the nodes when they start).

Instead, decouple draining from decommissioning. Decommissioning simply tells the allocator to move data off that node; when the node gets shut down cleanly (the operator's responsibility), it will drain. The resulting behavior when trying to decommission a too-large set of nodes is now that the decommission command simply does not finish. For example, starting a three-node cluster and decommissioning all nodes will simply hang (though the nodes will be marked as decommissioning). Add three more nodes to the cluster and replicas will move over to the newly added nodes, and the decommissioning command will finish.

As a bonus, recommissioning a node now doesn't require the target node to restart. As a second bonus, the decommissioning acceptance test now takes around 60% of the previous time (~45s, down from 70+).

One remaining caveat is that users may forget that they attempted to decommission nodes. We need to check that we prominently alert in the UI when nodes are decommissioning. This isn't a new problem.

Fixes #27444. Fixes #27025.

Release note (bug fix): Decommissioning multiple nodes is now possible without posing a risk to cluster health. Recommissioning a node no longer requires a restart of the target node to take effect.

Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Prior to cockroachdb#28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. Now that we require ranges to be collocated during merges, we no longer need to include the snapshot, nor declare the span. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release note: None
40261: storage: remove span declaration for merges r=irfansharif a=irfansharif

Prior to #28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. Now that we require ranges to be collocated during merges, we no longer need to include the snapshot, nor declare the span. Also informs #32583. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Prior to cockroachdb#28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. After we required ranges to be collocated during merges, we didn't need to include the snapshot, so we no longer need to declare the span. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release note: None
41036: storage: remove span declaration for merges r=ajwerner a=irfansharif

I'm just re-sending #40261 (reverted in #40446). These changes are intended for 20.1 and will not be merged until after branching.

---

Prior to #28661, we used to include a snapshot of the RHS in the merge trigger. Due to this, we declared a read-only span over the RHS data range. Now that we require ranges to be collocated during merges, we no longer need to include the snapshot, nor declare the span. Orthogonal to this are Subsume requests, which block out concurrent commands to the RHS, with the appropriate span declarations contained therein.

Release justification: N/A

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Now that we require ranges to be collocated during a merge and the RHS
replicas to be up-to-date before the merge commits, we no longer need to
include a snapshot of the RHS in the merge trigger. We know that the
copy of the data that already exists in the local store is perfectly
up-to-date.
So, stop sending the data in the merge trigger.
Release note: None