kv: refresh less and retry more
Before this patch, when the DistSender split a batch into multiple
sub-batches and one of the sub-batches failed, it would collect the
responses for the successful ones and return them together with the
error. This used to be pretty important before we had write idempotency,
because it allowed the span refresher to retry only an EndTxn without
also retrying the other writes in that batch (which would have failed).
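
For illustration, a minimal Go sketch of that old contract, using
hypothetical stand-in types rather than the real DistSender/roachpb
APIs: a send that fails partway through still hands back the responses
it managed to collect.

package main

import "fmt"

// Request and Response are hypothetical stand-ins for roachpb requests
// and responses.
type Request struct{ Name string }
type Response struct{ For string }

// sendOld mimics the pre-patch behavior: when a sub-batch fails, the
// responses gathered so far are returned together with the error.
func sendOld(reqs []Request) ([]Response, error) {
	var resps []Response
	for _, r := range reqs {
		if r.Name == "EndTxn" { // pretend only the EndTxn sub-batch fails
			return resps, fmt.Errorf("retryable error on %s", r.Name)
		}
		resps = append(resps, Response{For: r.Name})
	}
	return resps, nil
}

func main() {
	resps, err := sendOld([]Request{{"Put"}, {"Put"}, {"EndTxn"}})
	// The old span refresher would refresh the read spans of resps and
	// then retry only the requests after them (just the EndTxn here).
	fmt.Println(len(resps), err) // 2 retryable error on EndTxn
}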

Since we've gained idempotency in the meantime, we can retry those other
writes. In fact, it's arguably better to do so: there's a tradeoff
between refreshing and retrying. Currently the span refresher needs to
refresh the read spans of the successful sub-batches, and that refresh
is at risk of failing under contention.

This patch makes the span refresher retry the whole batch without
considering partial successes. With this patch, refreshing the partial
successes is no longer needed because we'll retry those requests. In
other words, we'll refresh less and retry more.
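
A sketch of the new flow, again with hypothetical stand-ins rather than
the real txnSpanRefresher code: on a retryable error we refresh only the
spans collected from earlier batches and then resend the entire batch,
leaning on write idempotency.

package main

import "fmt"

// Request stands in for a roachpb request in a batch.
type Request struct{ Name string }

// sendNew mimics the post-patch contract: on error, no partial responses
// are returned; the caller sees either a full result or just the error.
func sendNew(reqs []Request, failAt int) ([]string, error) {
	var resps []string
	for i, r := range reqs {
		if i == failAt {
			return nil, fmt.Errorf("retryable error on %s", r.Name)
		}
		resps = append(resps, r.Name)
	}
	return resps, nil
}

// retryWholeBatch mimics the new refresher behavior: after a retryable
// error it refreshes only the read spans collected from earlier batches
// (elided here) and then resends the entire batch, relying on write
// idempotency to make the re-issued requests safe.
func retryWholeBatch(reqs []Request, failAt int) ([]string, error) {
	if resps, err := sendNew(reqs, failAt); err == nil {
		return resps, nil
	}
	// ...refresh previously collected read spans here...
	return sendNew(reqs, -1) // retry everything, not just the failed suffix
}

func main() {
	resps, err := retryWholeBatch([]Request{{"Put"}, {"Put"}, {"EndTxn"}}, 2)
	fmt.Println(resps, err) // [Put Put EndTxn] <nil>
}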

The existing policy of refreshing more and retrying less will start to be
applied inconsistently with #44654, where we start refreshing when the
client sees a WriteTooOld flag - but there we're forced to refresh the
whole batch.

Besides the rationalizations above, this patch allows us to simplify the
code by not having to deal with both responses and errors. We can thus
get rid of the enthralling comment on client.Sender.Send() stating:
"
// The contract about whether both a response and an error can be
// returned varies between layers.
"

Release note: None
andreimatei committed Feb 3, 2020
1 parent 60d40b8 commit a57b729
Showing 3 changed files with 8 additions and 30 deletions.
2 changes: 0 additions & 2 deletions pkg/internal/client/sender.go
@@ -53,8 +53,6 @@ const (
 // storage.Store, storage.Replica.
 type Sender interface {
 	// Send sends a batch for evaluation.
-	// The contract about whether both a response and an error can be
-	// returned varies between layers.
 	//
 	// The caller retains ownership of all the memory referenced by the
 	// BatchRequest; the callee is not allowed to hold on to any parts
14 changes: 5 additions & 9 deletions pkg/kv/dist_sender.go
@@ -640,14 +640,6 @@ func splitBatchAndCheckForRefreshSpans(
 //
 // When the request spans ranges, it is split by range and a partial
 // subset of the batch request is sent to affected ranges in parallel.
-//
-// Note that on error, this method will return any batch responses for
-// successfully processed batch requests. This allows the caller to
-// deal with potential retry situations where a batch is split so that
-// EndTxn is processed alone, after earlier requests in the batch
-// succeeded. Where possible, the caller may be able to update spans
-// encountered in the transaction and retry just the EndTxn request to
-// avoid client-side serializable txn retries.
 func (ds *DistSender) Send(
 	ctx context.Context, ba roachpb.BatchRequest,
 ) (*roachpb.BatchResponse, *roachpb.Error) {
@@ -747,6 +739,10 @@ func (ds *DistSender) Send(
 		parts = parts[1:]
 	}
 
+	if pErr != nil {
+		return nil, pErr
+	}
+
 	var reply *roachpb.BatchResponse
 	if len(rplChunks) > 0 {
 		reply = rplChunks[0]
@@ -759,7 +755,7 @@
 		reply.BatchResponse_Header = lastHeader
 	}
 
-	return reply, pErr
+	return reply, nil
 }
 
 type response struct {
22 changes: 3 additions & 19 deletions pkg/kv/txn_interceptor_span_refresher.go
@@ -238,25 +238,9 @@ func (sr *txnSpanRefresher) maybeRetrySend(
 		return nil, pErr, hlc.Timestamp{}
 	}
 
-	// If a prefix of the batch was executed, collect refresh spans for
-	// that executed portion, and retry the remainder. The canonical
-	// case is a batch split between everything up to but not including
-	// the EndTxn. Requests up to the EndTxn succeed, but the EndTxn
-	// fails with a retryable error. We want to retry only the EndTxn.
 	ba.UpdateTxn(retryTxn)
-	retryBa := ba
-	if br != nil {
-		doneBa := ba
-		doneBa.Requests = ba.Requests[:len(br.Responses)]
-		log.VEventf(ctx, 2, "collecting refresh spans after partial batch execution of %s", doneBa)
-		if err := sr.appendRefreshSpans(ctx, doneBa, br); err != nil {
-			return nil, roachpb.NewError(err), hlc.Timestamp{}
-		}
-		retryBa.Requests = ba.Requests[len(br.Responses):]
-	}
-
 	log.VEventf(ctx, 2, "retrying %s at refreshed timestamp %s because of %s",
-		retryBa, retryTxn.ReadTimestamp, pErr)
+		ba, retryTxn.ReadTimestamp, pErr)
 
 	// Try updating the txn spans so we can retry.
 	if ok := sr.tryUpdatingTxnSpans(ctx, retryTxn); !ok {
@@ -267,14 +251,14 @@
 	// newBa.Txn.ReadTimestamp to the current timestamp. Submit the
 	// batch again.
 	retryBr, retryErr, retryLargestRefreshTS := sr.sendLockedWithRefreshAttempts(
-		ctx, retryBa, maxRefreshAttempts-1,
+		ctx, ba, maxRefreshAttempts-1,
 	)
 	if retryErr != nil {
 		log.VEventf(ctx, 2, "retry failed with %s", retryErr)
 		return nil, retryErr, hlc.Timestamp{}
 	}
 
-	log.VEventf(ctx, 2, "retry successful @%s", retryBa.Txn.WriteTimestamp)
+	log.VEventf(ctx, 2, "retry successful @%s", ba.Txn.WriteTimestamp)
 	sr.autoRetryCounter.Inc(1)
 	retryTxn.ReadTimestamp.Forward(retryLargestRefreshTS)
 