Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Move range bounds and GC threshold check out of evaluateBatch #14833

Merged
merged 1 commit into from
Apr 13, 2017

Conversation

bdarnell
Copy link
Contributor

@bdarnell bdarnell commented Apr 12, 2017

Performing this check at evaluation time (and only at evaluation time)
required that we declare read access to the descriptor and GC
threshold for every command (leading to stalls when splits,
rebalances, and GCs had to get a write lock on the same keys).

Instead, with this commit, we do this check once at evaluation
time (because some of the evaluation functions don't fail gracefully
if they are run in completely invalid circumstances) and again when
applying the command downstream of raft. By repeating the check, we no
longer need to hold a read lock for the entire duration of the
command.

Note that even with this change, GC requests generated by the gcQueue
will still block everything else because the command is sent with a
very large span, but this can be fixed without backwards-compatibility
problems. All EndTransaction requests need to read the range
descriptor, which means that a rebalance operation will block all
writing transactions (splits also block writing transactions, but note
that a split necessarily blocks all writes to the RHS of the range).

This is a backwards-compatible change for clusters without propEvalKV:
these checks were already happening downstream of raft so there has
been no real change. It is backwards-incompatible for any clusters
running with propEvalKV and they will need to be wiped.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis
Copy link
Collaborator

I think we want to do this. A quick hack shows that not adding those two read-only keys improves performance of simple read-only commands by ~20%.

@bdarnell
Copy link
Contributor Author

OK, I've updated this and it's ready for review. We'll still get that ~20% improvement for reads, but unfortunately it doesn't help as much for transactional writes (which includes all SQL writes): all EndTransaction calls need to depend on the range descriptor to determine which intents are within the range (there are some things we could do to improve this, but they'll be post-1.0. with propEvalKV, we should be able to make the necessary changes in a backwards-compatible way).

@bdarnell bdarnell changed the title DO NOT MERGE: Move range bounds and GC threshold check below raft storage: Move range bounds and GC threshold check out of evaluateBatch Apr 12, 2017
@petermattis
Copy link
Collaborator

Review status: 0 of 8 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2529 at r1 (raw file):

	if err := r.requestCanProceed(rSpan, ba.Timestamp); err != nil {
		return nil, nil, err
	}

Should this check be moved about r.raftMu.Lock()?


pkg/storage/replica_command.go, line 550 at r1 (raw file):

	// All transactions depend on the range descriptor because they need
	// to determine which intents are within the local range.
	spans.Add(SpanReadOnly, roachpb.Span{Key: keys.RangeDescriptorKey(desc.StartKey)})

It's unfortunate that this applies to 1PC transactions as well.

I'm wondering if we could include the range span in the proposal and check that down stream of Raft, failing commands for which the range span has changed. I haven't thought through this at all, so pardon the drive-by comment.


pkg/storage/replica_proposal.go, line 545 at r1 (raw file):

	// The above are always present, so we assert only if there are
	// "nontrivial" actions below.
	shouldAssert = !reflect.DeepEqual(rResult, storagebase.ReplicatedEvalResult{})

That's unfortunate. I think @tschottdorf had worked hard to avoid using reflect.DeepEqual here. I don't have a good suggestion for an alternative, though.


Comments from Reviewable

@spencerkimball
Copy link
Member

Review status: 0 of 8 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/replica_proposal.go, line 545 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

That's unfortunate. I think @tschottdorf had worked hard to avoid using reflect.DeepEqual here. I don't have a good suggestion for an alternative, though.

One option is to not enable this check (and others like it) by default, but only in test clusters.


Comments from Reviewable

@bdarnell
Copy link
Contributor Author

Review status: 0 of 8 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2529 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should this check be moved about r.raftMu.Lock()?

It was happening under r.raftMu before, and it's quick enough that I'd rather not risk additional change here.


pkg/storage/replica_command.go, line 550 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It's unfortunate that this applies to 1PC transactions as well.

I'm wondering if we could include the range span in the proposal and check that down stream of Raft, failing commands for which the range span has changed. I haven't thought through this at all, so pardon the drive-by comment.

Yeah, we could do something like that, but it's a tricky thing to be introducing at this stage. We'd need to introduce new error codes that would be retried at the appropriate level, etc. We'll have more flexibility to handle this later, when propEvalKV is the only way (2PC EndTransactions are always sent alone so we can change the ReplicatedEvalResult RSpan to reflect the intents we resolved. Error retries will still be slightly tricky, but should be manageable).


pkg/storage/replica_proposal.go, line 545 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

One option is to not enable this check (and others like it) by default, but only in test clusters.

Yeah, if this turns out to be a problem we could turn off the check for production use. But it doesn't seem likely to be a significant cost to me. (Looking back at #10327 there was some discussion about the convenience of == but not much about performance)


Comments from Reviewable

@petermattis
Copy link
Collaborator

petermattis commented Apr 13, 2017 via email

Performing this check at evaluation time (and only at evaluation time)
required that we declare read access to the descriptor and GC
threshold for every command (leading to stalls when splits,
rebalances, and GCs had to get a write lock on the same keys).

Instead, with this commit, we do this check once at evaluation
time (because some of the evaluation functions don't fail gracefully
if they are run in completely invalid circumstances) and again when
applying the command downstream of raft. By repeating the check, we no
longer need to hold a read lock for the entire duration of the
command.

Note that even with this change, GC requests generated by the gcQueue
will still block everything else because the command is sent with a
very large span, but this can be fixed without backwards-compatibility
problems. All EndTransaction requests need to read the range
descriptor, which means that a rebalance operation will block all
writing transactions (splits also block writing transactions, but note
that a split necessarily blocks all writes to the RHS of the range).

This is a backwards-compatible change for clusters without propEvalKV:
these checks were already happening downstream of raft so there has
been no real change. It is backwards-incompatible for any clusters
running with propEvalKV and they will need to be wiped.
@bdarnell bdarnell merged commit bc79f9e into cockroachdb:master Apr 13, 2017
@bdarnell bdarnell deleted the check-below-raft branch April 13, 2017 03:53
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Apr 17, 2017
This was meant to go in as a part of cockroachdb#14833.
De-flakes TestUnsplittableRange.

Fixes cockroachdb#14881
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Apr 17, 2017
This was meant to go in as a part of cockroachdb#14833.
De-flakes TestUnsplittableRange.

Fixes cockroachdb#14881
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants