Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: integrate Concurrency Manager into Replica request path #45482

Merged
merged 10 commits into from
Mar 3, 2020

Conversation

nvanbenschoten
Copy link
Member

Related to #41720.
Related to #44976.

This commit integrates the new concurrency package into the storage package. Each Replica is given a concurrency manager, which replaces its existing latch manager and txn wait queue. The change also uses the concurrency manager to simplify the role of the intent resolver. The intent resolver no longer directly handles WriteIntentErrors. As a result, we are able to delete the contention queue entirely.

With this change, all requests are now sequenced through the concurrency manager. When sequencing, latches are acquired and conflicting locks are detected. If any locks are found, the requests wait in lock wait-queues for the locks to be resolved. This is a major deviation from how things currently work because today, even with the contention queue, requests end up waiting for conflicting transactions to commit/abort in the txnWaitQueue after at least one RPC. Now, requests wait directly next to the intents/locks that they are waiting on and react directly to the resolution of these intents/locks.

Once requests are sequenced by the concurrency manager, they are theoretically fully isolated from all conflicting requests. However, this is not strictly true today because we have not yet pulled all replicated locks into the concurrency manager's lock table. We will do so in a future change. Until then, the concurrency manager maintains a notion of "intent discovery", which is integrated into the Replica-level concurrency retry loop.

Performance numbers will be published shortly. This will be followed by performance numbers using the SELECT FOR UPDATE locking (#40205) improvements that this change enables.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/concPkgInteg branch 2 times, most recently from 414a4c2 to 53393f1 Compare February 27, 2020 08:05
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good -- only some minor comments.

Reviewed 12 of 12 files at r1, 12 of 12 files at r2, 3 of 3 files at r3, 40 of 40 files at r4, 30 of 30 files at r5, 1 of 1 files at r6, 3 of 3 files at r7, 1 of 1 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/storage/replica.go, line 1153 at r5 (raw file):

	// Release the latches acquired by the request back to the spanlatch
	// manager. Must be done AFTER the timestamp cache is updated. ec.g is only
	// when the Raft proposal has assumed responsibility for releasing latches.

word missing after "only"?


pkg/storage/replica_read.go, line 112 at r5 (raw file):

	if lResult.WrittenIntents != nil {
		// These will all be unreplicated locks.

are reads that acquire unreplicated locks only in a read-only batch?

And how does this relate to the TODO earlier in this file about releasing latches early. If the read wants to acquire unreplicated locks it can't release latches, yes? So that TODO only applies to non-locking reads?


pkg/storage/replica_send.go, line 134 at r5 (raw file):

// 2. is continuing to execute asynchronously and needs to maintain isolation
//    from conflicting requests throughout the lifetime of its asynchronous
//    processing.

what is an example of this asynchronous processing?


pkg/storage/batcheval/result/result.go, line 35 at r2 (raw file):

	// to asynchronous intent processing on the proposer, so that an attempt to
	// resolve them is made.
	EncounteredIntents []roachpb.Intent

I got mired in plumbing code and couldn't find where these are actually used -- could you point me to it?
And why are we trying to do anything with intents that it did not conflict with -- isn't it wasteful given that those transactions may be ongoing and there is no conflict?


pkg/storage/batcheval/result/result.go, line 39 at r2 (raw file):

	WrittenIntents []roachpb.LockUpdate
	// ResolvedIntents stores any resolved intent spans, either with finalized
	// or pending statuses. Unlike WrittenIntents and EncounteredIntents, values

what exactly does "finalized or pending statuses" mean in this case -- is it referring to the TransactionStatus?


pkg/storage/concurrency/concurrency_control.go, line 318 at r4 (raw file):

	// The maximal set of spans that the request will access. Latches
	// will be acquired for these spans.
	// TODO(nvanbenschoten): don't allocate these SpanSet objects.

how will this "don't allocate" be accomplished?


pkg/storage/concurrency/concurrency_manager.go, line 344 at r7 (raw file):

	// state transition, it is safer to simply clear the lockTable and rebuild
	// it from persistent intent state.
	m.lt.Clear()

The concurrency manager exists only on the leaseholder for the range, yes?
Is this the case where the raft leader is at a different node?


pkg/storage/concurrency/lock_table_waiter.go, line 161 at r1 (raw file):

				// and quickly resolve abandoned intents then we might be able to
				// get rid of this state.
				delay := LockTableLivenessPushDelay.Get(&w.st.SV)

Doesn't the comment about waiting "for each" contradict this change to add a delay?


pkg/storage/concurrency/lock_table_waiter.go, line 245 at r3 (raw file):

		obsTS, ok := h.Txn.GetObservedTimestamp(w.nodeID)
		if !ok {
			// This was set earlier, so it's completely unexpected to

why is !ok not an error anymore?


pkg/storage/concurrency/lock_table_waiter_test.go, line 67 at r1 (raw file):

	ir := &mockIntentResolver{}
	st := cluster.MakeTestingClusterSettings()
	LockTableLivenessPushDelay.Override(&st.SV, 1*time.Millisecond)

is there a reason this is overridden to 1ms and not 0ms in tests?


pkg/storage/intentresolver/intent_resolver.go, line 277 at r5 (raw file):

// MaybePushTransactions tries to push the conflicting transaction(s):
// either moving their timestamp forward on a read/write conflict, abortint

s/abortint/aborting/

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/concPkgInteg branch from 53393f1 to f6761da Compare February 28, 2020 01:34
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/storage/replica.go, line 1153 at r5 (raw file):

Previously, sumeerbhola wrote…

word missing after "only"?

Done.


pkg/storage/replica_read.go, line 112 at r5 (raw file):

are reads that acquire unreplicated locks only in a read-only batch?

Yes, although they will acquire write latches to synchronize properly with other SFU reads. This is a little strange, given that we're in executeReadOnlyBatch. I think we need to reframe this function and executeWriteBatch to make it clear that the most fundamental distinction between them is that the latter can propose to Raft and the former cannot.

And how does this relate to the TODO earlier in this file about releasing latches early. If the read wants to acquire unreplicated locks it can't release latches, yes? So that TODO only applies to non-locking reads?

Great point, I added to the comment.


pkg/storage/replica_send.go, line 134 at r5 (raw file):
Added to the comment:

The most prominent example of asynchronous processing is
with requests that have the "async consensus" flag set. A more subtle
case is with requests that are acknowledged by the Raft machinery after
their Raft entry has been committed but before it has been applied to
the replicated state machine. In all of these cases, responsibility
for releasing the concurrency guard is handed to Raft.


pkg/storage/batcheval/result/result.go, line 35 at r2 (raw file):

could you point me to it?

They are passed to the IntentResolver here.

And why are we trying to do anything with intents that it did not conflict with -- isn't it wasteful given that those transactions may be ongoing and there is no conflict?

CleanupIntentsAsync uses a PUSH_TOUCH push RPC, so it simply checks whether the intents transaction is alive or not. If it is, it doesn't wait around for it to finish like a PUSH_ABORT style push does.


pkg/storage/batcheval/result/result.go, line 39 at r2 (raw file):

is it referring to the TransactionStatus?

Yes, and we call committed or aborted statuses "finalized". See TransactionStatus.IsFinalized.


pkg/storage/concurrency/concurrency_control.go, line 318 at r4 (raw file):

Previously, sumeerbhola wrote…

how will this "don't allocate" be accomplished?

I just don't want to store them by reference on this object. I'd rather store them by value.


pkg/storage/concurrency/concurrency_manager.go, line 344 at r7 (raw file):

The concurrency manager exists only on the leaseholder for the range, yes?

Well, it exists on all replicas, but it's drained on all replicas but the leaseholder.

Is this the case where the raft leader is at a different node?

Yes, exactly. It's rare but fundamentally possible for the leaseholder to receive a Raft snapshot.


pkg/storage/concurrency/lock_table_waiter.go, line 161 at r1 (raw file):

Previously, sumeerbhola wrote…

Doesn't the comment about waiting "for each" contradict this change to add a delay?

The "for each" part of this is referring to waiting out the longer deadlock detection delay. I made this more explicit.


pkg/storage/concurrency/lock_table_waiter.go, line 245 at r3 (raw file):

Previously, sumeerbhola wrote…

why is !ok not an error anymore?

It wasn't worth asserting this and breaking a number of tests.


pkg/storage/concurrency/lock_table_waiter_test.go, line 67 at r1 (raw file):

Previously, sumeerbhola wrote…

is there a reason this is overridden to 1ms and not 0ms in tests?

No reason. Done.


pkg/storage/intentresolver/intent_resolver.go, line 277 at r5 (raw file):

Previously, sumeerbhola wrote…

s/abortint/aborting/

Done.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/concPkgInteg branch from f6761da to 7b0c15a Compare February 28, 2020 05:21
@tbg tbg requested a review from sumeerbhola February 28, 2020 16:28
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if the EndTxn is in a batch with a Get request, that
Get request does still want to be isolated.

To be fair, it couldn't (we don't support mixed batches) but it doesn't change your point (use a CPut).

split read and write timestamp when scanning lockTable

Could you spell out the example in detail (including how the txn's read and write ts drift)? That test runs inserts only, right? Am morally right that readtimestamp=writetimestamp+maxoffset in practice, even though the code suggests that transactions may do the snapshot thing of reading at T and writing at a later timestamp?

I'm very excited to see this land. Consider this basically an LGTM except I should look at it a little more after you've addressed feedback. I support merging this quickly so that you don't catch skew.

Reviewed 3 of 12 files at r1, 69 of 69 files at r9, 12 of 12 files at r10, 3 of 3 files at r11, 40 of 40 files at r12, 30 of 30 files at r13, 1 of 1 files at r14, 3 of 3 files at r15, 1 of 1 files at r16, 4 of 4 files at r17.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/storage/replica.go, line 224 at r13 (raw file):

	leaseHistory *leaseHistory

	// concMgr sequences incoming requests and provides isoaltion between

isolation


pkg/storage/replica_raft.go, line 150 at r13 (raw file):

	quotaSize := uint64(proposal.command.Size())
	if maxSize := uint64(MaxCommandSize.Get(&r.store.cfg.Settings.SV)); quotaSize > maxSize {
		return nil, nil, 0, g, roachpb.NewError(errors.Errorf(

Is it important that g is returned in all of these error cases? I sure hope not... definitely the method comment says it shouldn't. If it is necessary (it has to be, why else would you have done this) return a type errWithGuard struct { *roachpb.Error \n *roachpb.Guard } instead of individual values.

See also

// Returns exactly one of a response, an error or re-evaluation reason.
for another rotted comment

Ok, I see now that this is a general pattern:

br, g, pErr = fn(r, ctx, ba, g)
switch t := pErr.GetDetail().(type) {
case nil:
// Success.
return br, nil
case *roachpb.WriteIntentError:
// Drop latches, but retain lock wait-queues.
g.AssertLatches()

I really dislike us returning errors along with nonzero values. Not for this PR, but I strongly encourage you to consider the suggestion above (or any other one that avoids this problem).


pkg/storage/replica_read.go, line 112 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

are reads that acquire unreplicated locks only in a read-only batch?

Yes, although they will acquire write latches to synchronize properly with other SFU reads. This is a little strange, given that we're in executeReadOnlyBatch. I think we need to reframe this function and executeWriteBatch to make it clear that the most fundamental distinction between them is that the latter can propose to Raft and the former cannot.

And how does this relate to the TODO earlier in this file about releasing latches early. If the read wants to acquire unreplicated locks it can't release latches, yes? So that TODO only applies to non-locking reads?

Great point, I added to the comment.

I had always assumed that having split up the read/rw paths as we do today was mostly an artifact of the past, and that we could send reads into the write path. Is this fundamentally not true? You could imagine wanting to lift the restriction that mixed batches aren't allowed inside of KV, without implementing splitting at the Replica level - mixed batches would have to hit the rw path, so that better be the only path.


pkg/storage/replica_send.go, line 139 at r13 (raw file):

//    their Raft entry has been committed but before it has been applied to
//    the replicated state machine. In all of these cases, responsibility
//    for releasing the concurrency guard is handed to Raft.

ditto about returning both a guard and an error


pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r10 (raw file):

						log.Fatalf(ctx, "expected resolve allowance to be exactly 0 resolving %s; got %d", intent.Span, resolveAllowance)
					}
					intent.EndKey = resumeSpan.Key

Does this case have a test?


pkg/storage/batcheval/cmd_resolve_intent_range.go, line 59 at r10 (raw file):

	reply.NumKeys = numKeys
	if resumeSpan != nil {
		update.EndKey = resumeSpan.Key

ditto


pkg/storage/batcheval/result/intent.go, line 41 at r10 (raw file):

// FromWrittenIntents creates a Result communicating that the intents were
// written or re-written by the given request and should be handled.
func FromWrittenIntents(txn *roachpb.Transaction, keys []roachpb.Key) Result {

Why do you have two of these methods and not just a single variadic one? Perf?


pkg/storage/batcheval/result/result.go, line 35 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

could you point me to it?

They are passed to the IntentResolver here.

And why are we trying to do anything with intents that it did not conflict with -- isn't it wasteful given that those transactions may be ongoing and there is no conflict?

CleanupIntentsAsync uses a PUSH_TOUCH push RPC, so it simply checks whether the intents transaction is alive or not. If it is, it doesn't wait around for it to finish like a PUSH_ABORT style push does.

For additional color, encountering but not conflicting with intents basically only (?) happens when doing inconsistent reads. I don't remember where exactly we use those but I think in the past we used them on the meta ranges (do we still?) and this code makes sure intents are cleaned up quickly.


pkg/storage/batcheval/result/result.go, line 40 at r10 (raw file):

	// ResolvedIntents stores any resolved intent spans, either with finalized
	// or pending statuses. Unlike WrittenIntents and EncounteredIntents, values
	// in this slice can represent spans of intents that were resolved.

will, not can, right? If I'm only looking at WrittenIntents and ResolvedIntents I might wonder why a PENDING intent could be in both.


pkg/storage/concurrency/concurrency_control.go, line 276 at r15 (raw file):

	// OnRangeSnapshot informs the concurrency manager that its replica has
	// received a snapshot from another replica in its range.
	OnRangeSnapshot()

nit: why Range and not Replica? Something like OnReplicaAppliedSnapshot()


pkg/storage/concurrency/concurrency_manager.go, line 344 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The concurrency manager exists only on the leaseholder for the range, yes?

Well, it exists on all replicas, but it's drained on all replicas but the leaseholder.

Is this the case where the raft leader is at a different node?

Yes, exactly. It's rare but fundamentally possible for the leaseholder to receive a Raft snapshot.

Mind adding that to the comment here?

How does the "rebuilding" work? We start discovering the replicated locks again as they are being read, and we simply lost all the unreplicated locks. This could go in the comment as well.


pkg/storage/concurrency/lock_table.go, line 32 at r9 (raw file):

// Default upper bound on the number of locks in a lockTable.
const defaultLockTableSize = 10000

What happens above this? Also, shouldn't this be bounding the number of bytes?


pkg/storage/concurrency/lock_table_waiter.go, line 32 at r9 (raw file):

	"kv.lock_table.coordinator_liveness_push_delay",
	"the delay before pushing in order to detect coordinator failures of conflicting transactions",
	10*time.Millisecond,

Isn't this too low? I'd expect it would be ~txn timeout.


pkg/storage/concurrency/lock_table_waiter.go, line 40 at r9 (raw file):

	"kv.lock_table.deadlock_detection_push_delay",
	"the delay before pushing in order to detect dependency cycles between transactions",
	100*time.Millisecond,

Some justification for this number would be good, even if it's just a guess.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 69 files at r9, 6 of 30 files at r13, 4 of 4 files at r17.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @sumeerbhola, and @tbg)


pkg/storage/concurrency/lock_table.go, line 32 at r9 (raw file):

Also, shouldn't this be bounding the number of bytes?

Yes, that is a TODO.
Do we have a general library that does this via reflection, or do we hand-roll the memory accounting for each data-structure? And are there existing examples in the codebase to look at?


pkg/storage/concurrency/lock_table.go, line 849 at r17 (raw file):

		}
		// Locked by some other txn.
		if g.readTS.Less(waitForTs) {

can you add a TODO somewhere that we need to add tests to test when readTS != writeTS?


pkg/storage/concurrency/lock_table.go, line 1121 at r17 (raw file):

	}

	switch sa {

I liked this change -- made it easier to understand.


pkg/storage/concurrency/lock_table_waiter.go, line 245 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It wasn't worth asserting this and breaking a number of tests.

I'm confused -- some tests take the !ok path? Is that risky (wrt tests not finding bugs)?

This commit replaces the strategy for injecting dependencies into the
concurrency and txnwait packages from custom interfaces to custom configuration
structs. This is more consistent with our standard approach for dependency
injection. It also makes the two packages cleaner and easier to test.

While here, the commit pulls out two cluster settings:
- kv.lock_table.coordinator_liveness_push_delay: the delay before pushing in
order to detect coordinator failures of conflicting transactions
- kv.lock_table.deadlock_detection_push_delay: the delay before pushing in order
to detect dependency cycles between transactions

The implication of the first setting is that, by default, even a distinguished
waiter will wait 10ms before pushing a conflicting transaction and potentially
discovering that it is dead. This will slow down recovery from failed
transactions, but this seems well worth it given that we will now frequently
avoid pushing at all in the happy path.
This allows the handling of added, updated, and resolved intents to
become more reactive. It mirrors our handling of UpdatedTxns and their
interaction with the TxnWaitQueue.
This hasn't been used since d41dc72, which was
made on Aug 7, 2016. Our understanding of how to resolve intents without blocking
conflicting transactions has dramatically evolved since then.
Up until now, we had been assuming that we could use the spans that a
batch request declared to determine which conflicting latches and locks
to wait on in the concurrency manager. We then tried to determine at the
batch level which batches should skip a lockTable scan (see shouldWaitOnConflicts).

This turned out to be a fraught effort. We need to make decisions about
whether or not to scan the lockTable on a per-request basis, not a per-batch
basis. This is because the desire or lack of desire for isolation during
evaluation is a trait of each request type and not a trait of its batch.
For an example of this, imagine an EndTxn request with a Split trigger,
which causes the request to declare a latch span over the entire RHS of
its proposed split. The EndTxn request does not want to wait on all locks
on the RHS. However, if the EndTxn is in a batch with a CPut request, that
CPut request does still want to be isolated. It was unclear how to reconcile
these two desires before - should a batch that wants any isolation get full
isolation (shouldWaitOnConflicts == true) or should a batch that requires
less isolation get no isolation (shouldWaitOnConflicts == false)?

Clearly, conflating the spans that a request wants to latch and the spans
that it wants full transactional isolation over is a bad idea. This commit
addresses this by allowing requests to declare which spans to scan the
lockTable with individually.
Related to cockroachdb#41720.
Related to cockroachdb#44976.

This commit integrates the new concurrency package into the storage
package. Each Replica is given a concurrency manager, which replaces
its existing latch manager and txn wait queue. The change also uses
the concurrency manager to simplify the role of the intent resolver.
The intent resolver no longer directly handles WriteIntentErrors. As
a result, we are able to delete the contention queue entirely.

With this change, all requests are now sequenced through the concurrency
manager. When sequencing, latches are acquired and conflicting locks are
detected. If any locks are found, the requests wait in lock wait-queues
for the locks to be resolved. This is a major deviation from how things
currently work because today, even with the contention queue, requests
end up waiting for conflicting transactions to commit/abort in the
txnWaitQueue after at least one RPC. Now, requests wait directly next
to the intents/locks that they are waiting on and react directly to the
resolution of these intents/locks.

Once requests are sequenced by the concurrency manager, they are
theoretically fully isolated from all conflicting requests. However,
this is not strictly true today because we have not yet pulled all
replicated locks into the concurrency manager's lock table. We will
do so in a future change. Until then, the concurrency manager maintains
a notion of "intent discovery", which is integrated into the Replica-level
concurrency retry loop.

Performance numbers will be published shortly. This will be followed
by performance numbers using the SELECT FOR UPDATE locking (cockroachdb#40205)
improvements that this change enables.
From `btree.MakeIter`'s documentation:

> It is not safe to continue using an iterator after modifications are
> made to the tree. If modifications are made, create a new iterator.

This was occasionally causing a nil pointer dereference because the
tree's root was being released back into a memory pool while still
referenced by the iterator.
This commit hooks the concurrency manager against range snapshot events.
The hazard this is protecting against is that a snapshot can cause
discontinuities in raft entry application. The lockTable expects to
observe all lock state transitions on the range through LockManager
listener methods. If there's a chance it missed a state transition, it
is safer to clear the lockTable and let it be rebuilt from persistent
state as intents are discovered.

I have not actually seen this cause problems and this seems like it
would be extremely rare because it's quite hard to create situations
where the leaseholder replica receives a snapshot, but it does seem
possible so it's better to be safe here.
This prevents false detection of stalls in a few cases, like
when receiving on a buffered channel that already has an element
in it.

Before this, the first few iterations of `testutils.SucceedsSoon`
were so quick that `TestConcurrencyManagerBasic/basic` saw cases
where the wrong channel select was detected as a stall point. This
occurred ~1/1,000 times. With this change, the test has stood up
to over 10,000 iterations of stress.
…kTable

This resolves an infinite loop that was possible and somewhat common in
`TestMonotonicInserts`. The loop occurred because read requests were discovering
intents at timestamps above their read timestamp but within their uncertainty
interval. This is essential to avoid stale reads. However, it was confusing the
lockTable. This commit addresses this issue by considering a transaction's
uncertainty interval when scanning the lockTable.

A new assertion was added that would detect this infinite loop condition.
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! After we get to a state where we like this, I'm going to wait to merge this until #45567 is also ready to go in, then I'll merge them together.

Performance numbers will be published shortly.

I benchmarked this with a suite of microbenchmarks and full workload benchmarks. The full workload benchmarks look fine:

tpccbench/nodes=3/cpu=16 before tpccbench/nodes=3/cpu=16 after
1895 1913
1913 1913
1885 1913

The pkg/sql/tests microbenchmarks took a little bit of a hit. Here's the benchdiff output from a few days ago: https://docs.google.com/spreadsheets/d/1W0PFHOw08NXBWgLHyhGc5Zky2oFpOHXCz5ZTV-Qm60o/edit#gid=4. The Bank workload's massive drop in performance is actually a deadlock that was being broken accidentally because we don't launch heartbeat loops for 1PC txns so a txn in the deadlock was expiring. I have a PR to update the kv/contention roachtest to disable this "optimization" so that we never accidentally allow such deadlocks to be resolved - we'd rather them deadlock indefinitely so we can catch the underlying issue, at least in tests. The issue had to do with a CPut forgetting to include WrittenIntents on an error path. The error was being consumed (see #45562) and the CPut was succeeding but the lockTable was not being told about this. This led to a case where a lock was being discovered and added to a lockState that contained the same transaction as a queuedWriter. We weren't handling this case correctly. I fixed the WrittenIntents bug in the second commit and have a PR for that coming for the AddDiscoveredLock issue, which may or may not still be possible to hit.

I also have a series of micro-optimizations around the lockTable and concurrency manager to pool memory. This closes the gap on a lot of the other benchmarks. I'll wait until this PR goes in to continue tuning.

To be fair, it couldn't (we don't support mixed batches) but it doesn't change your point (use a CPut).

Done.

Could you spell out the example in detail (including how the txn's read and write ts drift)? That test runs inserts only, right?

No, the test runs a series of reads and writes in a transaction, all with a fixed commit timestamp: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/tests/monotonic_insert_test.go#L138

Am morally right that readtimestamp=writetimestamp+maxoffset in practice, even though the code suggests that transactions may do the snapshot thing of reading at T and writing at a later timestamp?

In that test, that's exactly correct because the commit timestamp is fixed at the beginning of the transaction due to the use of cluster_logical_timestamp().

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)


pkg/storage/replica.go, line 224 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

isolation

Done.


pkg/storage/replica_raft.go, line 150 at r13 (raw file):
Yeah, I don't love this either, but I had a hard time figuring out how to conditionally pass ownership of the concurrency guard (and the responsibility of freeing it at the right time) without this structure. We want Raft to assume ownership of the guard if the proposal makes it to Raft, but we want to continue holding on to the guard if the proposal doesn't make it to Raft. I played around with a few strategies here with communicating back to the caller of evalAndPropose whether it still had ownership of the guard - such as passing a bool to signify this or trying to correlate the error value with whether the guard still needed to be freed or not. Unless I'm missing something, the errWithGuard idea would run into the same kinds of issues. In the end, I decided that trying to hide ownership of the guard just made things more complex.

I'm very open to improving things, but I think we'll need to more aggressively restructure this logic to make any meaningful improvements. For instance, if we split evaluation and Raft proposal into two separate stages from the perspective of executeWriteBatch then some of the complexity around no-op proposals and evaluation errors and how that maps to guard ownership would go away.

for another rotted comment

Done.


pkg/storage/replica_read.go, line 112 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I had always assumed that having split up the read/rw paths as we do today was mostly an artifact of the past, and that we could send reads into the write path. Is this fundamentally not true? You could imagine wanting to lift the restriction that mixed batches aren't allowed inside of KV, without implementing splitting at the Replica level - mixed batches would have to hit the rw path, so that better be the only path.

It should be true and we're continuing to narrow the gap between these two code paths. I think the largest remaining difference between them is that we use a more efficient LSM iterator on the read path instead of a full write batch. There's also some complexity around which kinds of retries are appropriate for reads vs. writes because writes can arbitrarily increase their timestamps without violating their latches while reads have to be more careful about doing so. Finally, I think we'd want to pull the evaluation/proposal split (currently in evalAndPropose) up a level if we were to do so.

All of these would be good changes though, so I'm in support of such a refactor.


pkg/storage/replica_send.go, line 139 at r13 (raw file):

Previously, tbg (Tobias Grieger) wrote…

ditto about returning both a guard and an error

Let's discuss in the previous comment.


pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Does this case have a test?

No, not yet. It's not actually necessary for correctness, just performance. I'd like to add a lot more testing around batch evaluation and the result.Result that we get out of it, but I'd rather not include that testing in this already very bloated PR.


pkg/storage/batcheval/cmd_resolve_intent_range.go, line 59 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

ditto

Same response as above.


pkg/storage/batcheval/result/intent.go, line 41 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Why do you have two of these methods and not just a single variadic one? Perf?

No reason, escape analysis can prove that the array passed to the variadic function won't escape when given a constant number of args, so using variadics wouldn't force any new heap allocations or anything. Done.


pkg/storage/batcheval/result/result.go, line 35 at r2 (raw file):

do we still?

We scan using a READ_UNCOMMITTED isolation level, which as the same behavior.


pkg/storage/batcheval/result/result.go, line 40 at r10 (raw file):

Previously, tbg (Tobias Grieger) wrote…

will, not can, right? If I'm only looking at WrittenIntents and ResolvedIntents I might wonder why a PENDING intent could be in both.

Done. "Can" was referring to the possibility of them being spans, not of them being resolved. But the wording was confusing.


pkg/storage/concurrency/concurrency_control.go, line 276 at r15 (raw file):

Previously, tbg (Tobias Grieger) wrote…

nit: why Range and not Replica? Something like OnReplicaAppliedSnapshot()

Done.


pkg/storage/concurrency/concurrency_manager.go, line 344 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Mind adding that to the comment here?

How does the "rebuilding" work? We start discovering the replicated locks again as they are being read, and we simply lost all the unreplicated locks. This could go in the comment as well.

Done.


pkg/storage/concurrency/lock_table.go, line 32 at r9 (raw file):

Do we have a general library that does this via reflection, or do we hand-roll the memory accounting for each data-structure?

We typically hand-roll memory accounting for each data structure. git grep 'unsafe.Sizeof' pkg/storage will give you some examples of this approach in our codebase, such as the timestamp cache and the raftentry cache. This isn't too burdensome in practice, but I can't say we're particularly in love with the approach either, so if you've used nicer patterns in the past, this might be a good time to experiment with introducing them to the codebase.


pkg/storage/concurrency/lock_table.go, line 849 at r17 (raw file):

Previously, sumeerbhola wrote…

can you add a TODO somewhere that we need to add tests to test when readTS != writeTS?

Done.


pkg/storage/concurrency/lock_table_waiter.go, line 245 at r3 (raw file):

Previously, sumeerbhola wrote…

I'm confused -- some tests take the !ok path? Is that risky (wrt tests not finding bugs)?

Some tests are below the level where transactions get assigned observed timestamps and so they hit the !ok path. I wouldn't say that's particularly risky, just that it's a bit unrealistic. But this is simply a heuristic to avoid extra transaction retries, so it's not very concerning either.

Still, I'll update this to handle the missing observed timestamp case more gracefully, which also helps drive home the point of the heuristic. Done in new commit.


pkg/storage/concurrency/lock_table_waiter.go, line 32 at r9 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Isn't this too low? I'd expect it would be ~txn timeout.

This is discussed in the comment on case waitForDistinguished. If we had a cache for aborted txn IDs then I'd feel better about bumping this up, but we want to avoid waiting out a long delay on each abandoned intent, so for now, we keep it fairly short.

I also have an upcoming change that explains this even further.

EDIT: actually, I added a commit to explain this here as well.


pkg/storage/concurrency/lock_table_waiter.go, line 40 at r9 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Some justification for this number would be good, even if it's just a guess.

This is what the number is today in the contentionQueue.

Added a commit explaining this and the previous value.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 2, 2020
…odes=4

I noticed when debugging issues in cockroachdb#45482 that unhandled deadlocks occasionally
resolved themselves because txns would eventually time out. This was because
we don't start the txn heartbeat loop for 1PC txns. In this kind of test, we
want any unhandled deadlocks to be as loud as possible, so just like we set
a very long txn expiration, we also enable the txn heartbeat loop for all
txns, even those that we expect will be 1PC.

This commit also drops the kv.lock_table.deadlock_detection_push_delay down
for the test, since it's already touching this code.
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR to update the kv/contention roachtest to disable this "optimization" so that we never accidentally allow such deadlocks to be resolved - we'd rather them deadlock indefinitely so we can catch the underlying issue, at least in tests.

This is #45568.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)

This commit adds commentary around the default values for:
- kv.lock_table.coordinator_liveness_push_delay
- kv.lock_table.deadlock_detection_push_delay
@craig
Copy link
Contributor

craig bot commented Mar 2, 2020

Canceled (will resume)

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 2, 2020
This commit updates the AddDiscoveredLock state transition in lockTableImpl
to release all waiting writers that are part of the same transaction as a
discovered lock.

This caused issues in cockroachdb#45482, where we saw txn deadlocks in the
`pkg/sql/tests.Bank` benchmark. This issue was triggered because we were
forgetting to inform the lockTable of a lock acquisition in a subtle case.
That is now fixed and it's not clear that we can actually hit the bug here
anymore given the current policy on how wait-queues form in the lock-table.
Regardless, this is worth handling correctly in case things ever change.
@craig
Copy link
Contributor

craig bot commented Mar 2, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Mar 3, 2020

Build succeeded

@craig craig bot merged commit cf8cd92 into cockroachdb:master Mar 3, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 3, 2020
…odes=4

I noticed when debugging issues in cockroachdb#45482 that unhandled deadlocks occasionally
resolved themselves because txns would eventually time out. This was because
we don't start the txn heartbeat loop for 1PC txns. In this kind of test, we
want any unhandled deadlocks to be as loud as possible, so just like we set
a very long txn expiration, we also enable the txn heartbeat loop for all
txns, even those that we expect will be 1PC.

This commit also drops the kv.lock_table.deadlock_detection_push_delay down
for the test, since it's already touching this code.
craig bot pushed a commit that referenced this pull request Mar 3, 2020
45551: ui: standardize the display of node names in node lists r=dhartunian a=petermattis

Standardize the display of node names in node lists. The nodes overview
list was displaying nodes as `N<id> <ip-address>` while graphs display
`<ip-address> (n<id>)`. Standardize on the latter format. A similar
problem existed on the statement details page which was displaying
`N<id> <ip-address> (n<id>)`. The node id was being displayed twice!

Release note: None

45567: storage/concurrency: push reservation holders to detect deadlocks r=nvanbenschoten a=nvanbenschoten

This is a partial reversion of #45420.

It turns out that there are cases where a reservation holder is a link in a dependency cycle. This can happen when a reservation holder txn is holding on to one reservation while blocking on a lock on another key. If any txns queued up behind the reservation did not push _someone_, they wouldn't detect the deadlock.

```
     range A      .     range B
                  .
       txnB       .       txnC              txnA
        |         .        |  ^________      |
        v         .        v            \    v
 [lock X: (txnA)] .  [lock Y: (txnB)]  [res Z: (txnC)]
```

It turns out that this segment of the dependency cycle is always local to a single concurrency manager, so it could potentially forward the push through the reservation links to shorten the cycle and prevent non-lock holders from ever being the victim of a deadlock abort. This is tricky though, so for now, we just push.

To address the issue that motivated #45420, we perform this form of push asynchronously while continuing to listen to state transitions in the lockTable. If the pusher is unblocked (see #45420 for an example of when that can happen), it simply cancels the push and proceeds with navigating the lockTable.

This PR also adds a set of datadriven deadlock tests to the concurrency manager test suite: [`testdata/concurrency_manager/deadlocks`](https://github.com/cockroachdb/cockroach/pull/45567/files#diff-5934754d5a8f1086698cdbab628ee1b5). These tests construct deadlocks due to lock ordering and request ordering and demonstrate how the deadlocks are resolved.

45568: roachtest: enable txn heartbeat loops for 1PC txns in kv/contention/nodes=4 r=nvanbenschoten a=nvanbenschoten

I noticed when debugging issues in #45482 that unhandled deadlocks occasionally
resolved themselves because txns would eventually time out. This was because
we don't start the txn heartbeat loop for 1PC txns. In this kind of test, we
want any unhandled deadlocks to be as loud as possible, so just like we set
a very long txn expiration, we also enable the txn heartbeat loop for all
txns, even those that we expect will be 1PC.

This commit also drops the kv.lock_table.deadlock_detection_push_delay down
for the test, since it's already touching this code.

Co-authored-by: Peter Mattis <petermattis@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 3, 2020
Closes cockroachdb#40786.
Closes cockroachdb#44336.

This commit resolves a bug in distributed deadlock detection that would
allow a deadlock between transactions to go undetected, stalling the
workload indefinitely.

The issue materialized as follows:
1. two transactions would deadlock and each enter a txnwait queue
2. they would poll their pushees record along with their own
3. deadlock detection would eventually pick this up and abort one of the txns
   using the pusher's copy of the txn's proto
4. however, the aborted txn has since restarted and bumped it epoch
5. the aborted txn continued to query its record, but failed to ingest any
   updates from it because the record was at a lower epoch than its own
   copy of its txn proto. So it never noticed that it was ABORTED
6. all other txns in the system including the original contending txn
   piled up behind the aborted txn in the contention queue, waiting for
   it to notice it was aborted and exit the queue
7. deadlock!

I'm optimistically closing the two `kv/contention/nodes=4` issues both
because I hope this is the cause of their recent troubles and also because
I've been spending a lot of time with the test recently in light of cockroachdb#45482
and plan to stabilize it fully.

I plan to backport this to release-19.2. This doesn't need to go all the
way back to release-19.1 because this was introduces in aed892a.
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/concPkgInteg branch March 3, 2020 05:55
craig bot pushed a commit that referenced this pull request Mar 3, 2020
45595: sql: make cleanup jobs spawned by alter primary key not cancelable r=spaskob a=rohany

Fixes #45500.

This PR makes the job spawned by ALTER PRIMARY KEY that cleans
up indexes be uncancelable.

This PR additionally fixes a related bug where ALTER PRIMARY KEY
would spawn a job when it didn't actually enqueue any mutations
on the table descriptor, causing future schema changes on the
table to hang.

Release note (sql change): This PR makes the cleanup job spawned
by ALTER PRIMARY KEY in some cases uncancelable.

45603: storage/txnwait: terminate push when pusher aborted at lower epoch r=nvanbenschoten a=nvanbenschoten

Closes #40786.
Closes #44336.

This commit resolves a bug in distributed deadlock detection that would
allow a deadlock between transactions to go undetected, stalling the
workload indefinitely.

The issue materialized as follows:
1. two transactions would deadlock and each enter a txnwait queue
2. they would poll their pushees record along with their own
3. deadlock detection would eventually pick this up and abort one of the txns
   using the pusher's copy of the txn's proto
4. however, the aborted txn has since restarted and bumped it epoch
5. the aborted txn continued to query its record, but failed to ingest any
   updates from it because the record was at a lower epoch than its own
   copy of its txn proto. So it never noticed that it was ABORTED
6. all other txns in the system including the original contending txn
   piled up behind the aborted txn in the contention queue, waiting for
   it to notice it was aborted and exit the queue
7. deadlock!

I'm optimistically closing the two `kv/contention/nodes=4` issues both
because I hope this is the cause of their recent troubles and also because
I've been spending a lot of time with the test recently in light of #45482
and plan to stabilize it fully.

I plan to backport this to release-19.2. This doesn't need to go all the
way back to release-19.1 because this was introduces in aed892a.

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 3, 2020
This commit updates the AddDiscoveredLock state transition in lockTableImpl
to release all waiting writers that are part of the same transaction as a
discovered lock.

This caused issues in cockroachdb#45482, where we saw txn deadlocks in the
`pkg/sql/tests.Bank` benchmark. This issue was triggered because we were
forgetting to inform the lockTable of a lock acquisition in a subtle case.
That is now fixed and it's not clear that we can actually hit the bug here
anymore given the current policy on how wait-queues form in the lock-table.
Regardless, this is worth handling correctly in case things ever change.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 4, 2020
This commit updates the AddDiscoveredLock state transition in lockTableImpl
to release all waiting writers that are part of the same transaction as a
discovered lock.

This caused issues in cockroachdb#45482, where we saw txn deadlocks in the
`pkg/sql/tests.Bank` benchmark. This issue was triggered because we were
forgetting to inform the lockTable of a lock acquisition in a subtle case.
That is now fixed and it's not clear that we can actually hit the bug here
anymore given the current policy on how wait-queues form in the lock-table.
Regardless, this is worth handling correctly in case things ever change.
craig bot pushed a commit that referenced this pull request Mar 4, 2020
45601: storage/concurrency: release reqs from same txn as discovered lock r=nvanbenschoten a=nvanbenschoten

This commit updates the AddDiscoveredLock state transition in lockTableImpl to release all waiting writers that are part of the same transaction as a discovered lock.

This caused issues in #45482, where we saw txn deadlocks in the `pkg/sql/tests.Bank` benchmark. This issue was triggered because we were forgetting to inform the lockTable of a lock acquisition in a subtle case. That is now fixed and it's not clear that we can actually hit the bug here anymore given the current policy on how wait-queues form in the lock-table. Regardless, this is worth handling correctly in case things ever change.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
koorosh added a commit to koorosh/cockroach that referenced this pull request Mar 6, 2020
commit feb3d45
Author: David Hartunian <davidh@cockroachlabs.com>
Date:   Mon Mar 2 22:53:52 2020 -0500

    sql: add API for accessing statement diagnostics

    Added 3 new RPC endpoints under `_status` for manipulating Statement Diagnostics
    Requests:

    1. `CreateStatementDiagnosticsRequest`
    Creates a new request for a given statement fingerprint

    2. `StatementDiagnosticsRequests`
    Fetches all statement diagnostics requests

    3. `StatementDiagnostics`
    Fetches a specific statement diagnostics request that has been completed and
    includes the JSON payload of the trace

    Basic implementations for these endpoints have been added to the
    `apiReducers.tsx` and `api.ts` files for use by the Admin UI.

    A minor bug where completed requests did not have `completedAt` timestamp set
    was fixed.

commit 9d684ea
Merge: 2e621c7 62c8667
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Wed Mar 4 02:09:24 2020 +0000

    Merge cockroachdb#45575

    45575: cli: reveal SQL error details/hints in CLI commands r=ajwerner a=knz

    Found while investigating cockroachdb#43114. First commit from that PR.

    Previously, when a CLI command that uses SQL under the hood
    encountered an error, only the main error message was displayed.

    This patch changes it to reveal the other additional user-visible
    fields.

    Release note (cli change): Certain CLI command now provide more
    details and/or a hint when they encounter an error.

    Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

commit 2e621c7
Merge: 7b64b0b 3daeced
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Wed Mar 4 00:26:33 2020 +0000

    Merge cockroachdb#45520

    45520: opt: FK checks for Upsert r=RaduBerinde a=RaduBerinde

    These commit add support for opt-driven FK checks for UPSERT and INSERT ON CONFLICT DO UPDATE. This is a simplified and cleaned up reimplementation of cockroachdb#45095.

    There aren't a lot of logictests around FKs and upsert. I plan to try to build a randomized testing facility for FK checks, where we generate randomized mutations and cross-check FK violations or lack there of against a separate instance of the database without FK relationships.

    #### Revert "opt: add first half of upsert FK checks"

    This reverts commit 96447c8.

    We are reconsidering the direction of the upsert FK checks: instead of filtering
    inserted vs updated rows, we can use the return columns to get the "new" values.
    This change also made some things harder to understand: we used to build the FK
    check input using just the columns corresponding to the FK instead of all
    columns.

    Release note: None

    #### opt: refactor optbuilder FK check code

    The optbuilder FK check code has become unnecessarily complicated:
     - there are two ways of creating a WithScan (`makeFKInputScan` and
       `projectOrdinals`) which are similar but take different kinds of information
       as input.
     - there are various slices of column ordinals or column IDs that are threaded
       through long parts of code, making it hard to follow.

    This change cleans this up by creating a fkCheckHelper which contains the
    metadata related to FK table ordinals and is capable of creating a WithScan with
    either the "new" or the "fetched" values.

    The new code should generate the same effective plans; the differences are:
     - we now always create the WithScan before the other table Scan so some column IDs in the plan changed;
     - we no longer have an unnecessary projection for Update (it was used to
       renumber the columns coming out of WithScan).

    Release note: None

    #### opt: add outbound FK checks for upsert

    This is a re-implementation of Justin's change 3d1dd0f, except that we now
    perform the insertion check for all new rows instead of just the inserted rows.
    We do this by utilizing the same column that would be used for a RETURNING
    clause, which in some cases is of the form
    `CASE WHEN canary IS NULL col1 ELSE col2`.

    Release note: None

    #### opt: add inbound FK checks for upsert

    This change adds inbound FK checks for upsert and switches execution over to the
    new style FK checks for upsert.

    Similar to UPDATE, the inbound FK checks run on the set difference between "old"
    values for the FK columns and "new" values.

    Release note (performance improvement): improved execution plans of foreign key
    checks for UPSERT and INSERT ON CONFLICT in some cases (in particular
    multi-region).

    Co-authored-by: Radu Berinde <radu@cockroachlabs.com>

commit 3daeced
Author: Radu Berinde <radu@cockroachlabs.com>
Date:   Tue Mar 3 16:23:50 2020 -0800

    opt: add inbound FK checks for upsert

    This change adds inbound FK checks for upsert and switches execution over to the
    new style FK checks for upsert.

    Similar to UPDATE, the inbound FK checks run on the set difference between "old"
    values for the FK columns and "new" values.

    Release note (performance improvement): improved execution plans of foreign key
    checks for UPSERT and INSERT ON CONFLICT in some cases (in particular
    multi-region).

commit b8ebc3a
Author: Radu Berinde <radu@cockroachlabs.com>
Date:   Tue Mar 3 16:23:50 2020 -0800

    opt: add outbound FK checks for upsert

    This is a re-implementation of Justin's change 3d1dd0f, except that we now
    perform the insertion check for all new rows instead of just the inserted rows.
    We do this by utilizing the same column that would be used for a RETURNING
    clause, which in some cases is of the form
    `CASE WHEN canary IS NULL col1 ELSE col2`.

    Release note: None

commit 75e6009
Author: Radu Berinde <radu@cockroachlabs.com>
Date:   Tue Mar 3 16:23:50 2020 -0800

    opt: refactor optbuilder FK check code

    The optbuilder FK check code has become unnecessarily complicated:
     - there are two ways of creating a WithScan (`makeFKInputScan` and
       `projectOrdinals`) which are similar but take different kinds of information
       as input.
     - there are various slices of column ordinals or column IDs that are threaded
       through long parts of code, making it hard to follow.

    This change cleans this up by creating a fkCheckHelper which contains the
    metadata related to FK table ordinals and is capable of creating a WithScan with
    either the "new" or the "fetched" values.

    The new code should generate the same effective plans; the differences are:
     - we now always create the WithScan before the other table Scan so some column IDs in the plan changed;
     - we no longer have an unnecessary projection for Update (it was used to
       renumber the columns coming out of WithScan).

    Release note: None

commit aa2295d
Author: Radu Berinde <radu@cockroachlabs.com>
Date:   Tue Mar 3 16:23:50 2020 -0800

    Revert "opt: add first half of upsert FK checks"

    This reverts commit 96447c8.

    We are reconsidering the direction of the upsert FK checks: instead of filtering
    inserted vs updated rows, we can use the return columns to get the "new" values.
    This change also made some things harder to understand: we used to build the FK
    check input using just the columns corresponding to the FK instead of all
    columns.

    Release note: None

commit 7b64b0b
Merge: ede97bc 5b1598d
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 23:33:09 2020 +0000

    Merge cockroachdb#45513

    45513: sql: disable primary key changes when a schema change is in progress r=lucy-zhang a=rohany

    Fixes cockroachdb#45362.

    Release note (sql change): This PR disables primary key changes
    when a concurrent schema change is executing on the same table,
    or if a schema change has been started on the same table in
    the current transaction.

    Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>

commit ede97bc
Merge: bbbb26b 7383e9d
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 22:44:11 2020 +0000

    Merge cockroachdb#45597

    45597: opt: extend WithUses and improve NeededMutationCols r=RaduBerinde a=RaduBerinde

    This change extends the WithUses rule prop to keep track of the columns actually
    used by `WithScan`s.

    This set is then used by NeededMutationCols to make sure we never prune mutation
    input columns that are used by FK checks. Currently this never happens, but
    because of fairly subtle reasons: FKs happen to require indexes on both sides,
    and those indexes force the mutation operator to require values for those
    columns. This won't be the case anymore with upsert FK checks, for which this
    fix is required.

    The new information will also be used (in a future change) to prune unused
    columns of `With` itself.

    Release note: None

    Co-authored-by: Radu Berinde <radu@cockroachlabs.com>

commit bbbb26b
Merge: e1e1f2c d80411f
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 21:58:08 2020 +0000

    Merge cockroachdb#45397

    45397: sql: disallow other schema changes during an in progress primary key change r=lucy-zhang a=rohany

    Fixes cockroachdb#45363.

    Release note (sql change): This commit disables the use of schema
    changes when a primary key change is in progress. This includes
    the following:
    * running a primary key change in a transaction, and then starting
      another schema change in the same transaction.
    * starting a primary key change on one connection, and then starting
      a schema change on the same table on another connection while
      the initial primary key change is currently executing.

    Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>

commit e1e1f2c
Merge: 064c4ea f38087b 66d502f 2b7aa46 35095d9
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 20:36:07 2020 +0000

    Merge cockroachdb#45472 cockroachdb#45607 cockroachdb#45609 cockroachdb#45613

    45472: storage/engine: Teeing engine fixes r=itsbilal a=itsbilal

    This change introduces some misc teeing engine fixes, such as proper
    cache initialization, copying of SSTs before ingestions (now that
    pebble also deletes SSTs after an Ingest), and better null msg
    handling in GetProto. The cache part fixes a segfault in
    GetStats.

    Fixes cockroachdb#42654 .

    Release note: None.

    45607: sql: fix usages of ActiveVersionOrEmpty in {create, alter}_table.go r=irfansharif a=rohany

    Fixes cockroachdb#45519.

    This PR removes an incorrect usage of ActiveVersionOrEmpty in
    alter_table.go, and updates a comment in create_table.go detailing its
    usage.

    Release note: None

    45609: sql: support star-expanding label-less tuples + numeric tuple indexing r=RaduBerinde a=knz

    Requested by @RaduBerinde  / @ajwerner .

    Previously, it was impossible to use `(t).*` to expand a single
    tuple-typed column into multiple projections if the tuple was not
    labeled to start with.

    This limitation was caused by another limitation, that it was not
    possible to refer to a single column in a tuple if the tuple was not
    already labeled.

    This patch addresses both limitations.

    Release note (sql change): CockroachDB now supports expanding all
    columns of a tuple using the `.*` notation, for example `SELECT (t).*
    FROM (SELECT (1,'b',2.3) AS t)`. This is a CockroachDB-specific extension.

    Release note (sql change): CockroachDB now supports accessing the Nth
    column in a column with tuple type using the syntax `(...).@N`, for
    example `SELECT (t).@2 FROM (SELECT (1,'b',2.3) AS t)`. This is a
    CockroachDB-specific extension.

    45613: pgwire: fix decimal decoding with trailing zeroes r=jordanlewis a=otan

    Resolves cockroachdb#25460

    In addition to the release note, I have also modified the docker runfile
    to support elixir tests, and also updated the elixir tests to be run.

    Release note (bug fix): In previous PRs, drivers that do not truncate
    trailing zeroes for decimals in the binary format end up having
    inaccuracies of up to 10^4 during the decode step. In this PR, we fix
    the error by truncating the trailing zeroes as appropriate. This fixes
    known incorrect decoding cases with Postgrex in Elixir.

    Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
    Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
    Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
    Co-authored-by: Oliver Tan <otan@cockroachlabs.com>

commit 35095d9
Author: Oliver Tan <otan@cockroachlabs.com>
Date:   Mon Mar 2 15:40:45 2020 -0800

    pgwire: fix decimal decoding with trailing zeroes

    In addition to the release note, I have also modified the docker runfile
    to support elixir tests, and also updated the elixir tests to be run.

    Release note (bug fix): In previous PRs, drivers that do not truncate
    trailing zeroes for decimals in the binary format end up having
    inaccuracies of up to 10^4 during the decode step. In this PR, we fix
    the error by truncating the trailing zeroes as appropriate. This fixes
    known incorrect decoding cases with Postgrex in Elixir.

commit 064c4ea
Merge: ff2b605 6653127
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 19:08:46 2020 +0000

    Merge cockroachdb#45630

    45630: storageccl: rework TestRandomKeyAndTimestampExport to be shorter r=pbardea a=ajwerner

    The test was taking a very long time due to the tiny page sizes. This commit
    changes the test to scale the total number of keys based on the page size.

    Before:

    ```
    --- PASS: TestRandomKeyAndTimestampExport (19.06s)
    ```

    After:
    ```
    --- PASS: TestRandomKeyAndTimestampExport (2.30s)
    ```

    Release note: None

    Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>

commit f38087b
Author: Bilal Akhtar <bilal@cockroachlabs.com>
Date:   Wed Feb 26 15:45:56 2020 -0500

    storage/engine: Teeing engine fixes

    This change introduces some misc teeing engine fixes, such as proper
    cache initialization, copying of SSTs before ingestions (now that
    pebble also deletes SSTs after an Ingest), and better null msg
    handling in GetProto. The cache part fixes a segfault in
    GetStats.

    It also unifies file handling between in-memory and on-disk
    teeing engines, by ensuring we write to files in each engine's
    aux directory if we're writing to one engine's aux directory.

    Fixes cockroachdb#42654 .

    Release note: None.

commit 5b1598d
Author: Rohan Yadav <rohany@alumni.cmu.edu>
Date:   Thu Feb 27 12:42:24 2020 -0500

    sql: disable primary key changes when a schema change is in progress

    Fixes cockroachdb#45362.

    Release note (sql change): This PR disables primary key changes
    when a concurrent schema change is executing on the same table,
    or if a schema change has been started on the same table in
    the current transaction.

commit d80411f
Author: Rohan Yadav <rohany@alumni.cmu.edu>
Date:   Tue Feb 25 10:59:17 2020 -0500

    sql: disallow other schema changes an in progress primary key change

    Fixes cockroachdb#45363.

    Release note (sql change): This commit disables the use of schema
    changes when a primary key change is in progress. This includes
    the following:
    * running a primary key change in a transaction, and then starting
      another schema change in the same transaction.
    * starting a primary key change on one connection, and then starting
      a schema change on the same table on another connection while
      the initial primary key change is currently executing.

commit ff2b605
Merge: aeb41dc 9dcab01 cee9355
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 16:59:33 2020 +0000

    Merge cockroachdb#45347 cockroachdb#45502

    45347: sql: make secondary indexes not write empty k/v's + bugfixes for primary key changes r=jordanlewis a=rohany

    Fixes cockroachdb#45223.

    Depends on cockroachdb#45226 to land first.

    This PR fixes many bugs around secondary index encodings
    and CRUD operations k/v reads and writes.

    * Fixes a problem secondary indexes would write empty
      k/v's if it contained a family that had all null values.
    * Fix multiple bugs where updates to a table during an online
      primary key change could result an inconsistent
      new primary key.
    * Fix an assumption in the updater that assumed indexes
      always had the same number of k/v's. The logic has been
      updated to perform a sort of merge operation to decide
      what k/v's to insert/delete during the update operation.
    * Increased testing around secondary indexes k/vs and
      schema change operations.

    The release note is None because these are all bugs
    introduced in 20.1.

    Release note: None

    45502: sql: allow rename database for sequences without a db name reference r=rohany a=otan

    Resolves immediate concern from cockroachdb#45411
    Refs: cockroachdb#34416

    See release note for description. This PR should be included ahead of
    the more "general" fix of changing the DefaultExpr with the new database
    as it unblocks people using
    `experimental_serial_normalization=sql_sequence` from using the database
    rename feature.

    Release note (sql change): Previously, when we renamed a database, any
    table referencing a sequence would be blocked from being able to rename
    the table. This is to block cases where if the table's reference to the
    sequence contains the database name, and the database name changes, we
    have no way of overwriting the table's reference to the sequence in the
    new database. However, if no database name is included in the sequence
    reference, we should continue to allow the database to rename, as is
    implemented with this change.

    Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
    Co-authored-by: Oliver Tan <otan@cockroachlabs.com>

commit 6653127
Author: Andrew Werner <ajwerner@cockroachlabs.com>
Date:   Tue Mar 3 00:33:17 2020 -0500

    storageccl: rework TestRandomKeyAndTimestampExport to be shorter

    The test was taking a very long time due to the tiny page sizes. This commit
    changes the test to scale the total number of keys based on the page size.

    Before:

    ```
    --- PASS: TestRandomKeyAndTimestampExport (19.06s)
    ```

    After:
    ```
    --- PASS: TestRandomKeyAndTimestampExport (2.30s)
    ```

    Release note: None

commit 66d502f
Author: Rohan Yadav <rohany@alumni.cmu.edu>
Date:   Mon Mar 2 18:01:59 2020 -0500

    sql: fix usages of ActiveVersionOrEmpty in {create, alter}_table.go

    Fixes cockroachdb#45519.

    This PR removes an incorrect usage of ActiveVersionOrEmpty in
    alter_table.go, and updates a comment in create_table.go detailing its
    usage.

    Release note: None

commit aeb41dc
Merge: ada086e 3d7aeb3 0a1f251
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 16:10:07 2020 +0000

    Merge cockroachdb#45595 cockroachdb#45603

    45595: sql: make cleanup jobs spawned by alter primary key not cancelable r=spaskob a=rohany

    Fixes cockroachdb#45500.

    This PR makes the job spawned by ALTER PRIMARY KEY that cleans
    up indexes be uncancelable.

    This PR additionally fixes a related bug where ALTER PRIMARY KEY
    would spawn a job when it didn't actually enqueue any mutations
    on the table descriptor, causing future schema changes on the
    table to hang.

    Release note (sql change): This PR makes the cleanup job spawned
    by ALTER PRIMARY KEY in some cases uncancelable.

    45603: storage/txnwait: terminate push when pusher aborted at lower epoch r=nvanbenschoten a=nvanbenschoten

    Closes cockroachdb#40786.
    Closes cockroachdb#44336.

    This commit resolves a bug in distributed deadlock detection that would
    allow a deadlock between transactions to go undetected, stalling the
    workload indefinitely.

    The issue materialized as follows:
    1. two transactions would deadlock and each enter a txnwait queue
    2. they would poll their pushees record along with their own
    3. deadlock detection would eventually pick this up and abort one of the txns
       using the pusher's copy of the txn's proto
    4. however, the aborted txn has since restarted and bumped it epoch
    5. the aborted txn continued to query its record, but failed to ingest any
       updates from it because the record was at a lower epoch than its own
       copy of its txn proto. So it never noticed that it was ABORTED
    6. all other txns in the system including the original contending txn
       piled up behind the aborted txn in the contention queue, waiting for
       it to notice it was aborted and exit the queue
    7. deadlock!

    I'm optimistically closing the two `kv/contention/nodes=4` issues both
    because I hope this is the cause of their recent troubles and also because
    I've been spending a lot of time with the test recently in light of cockroachdb#45482
    and plan to stabilize it fully.

    I plan to backport this to release-19.2. This doesn't need to go all the
    way back to release-19.1 because this was introduces in aed892a.

    Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
    Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

commit ada086e
Merge: b0be21a 80894c3 4caee85 7e0ba7c
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 15:26:58 2020 +0000

    Merge cockroachdb#44130 cockroachdb#45589 cockroachdb#45604

    44130: sql: postgresql dollar-quoted string support r=knz a=damienhollis

    Added support for postgresql dollar-quoted strings in the scanner. A
    dollar-quoted string constant consists of a dollar sign ($), an
    optional "tag" of zero or more characters, another dollar sign, an
    arbitrary sequence of characters that makes up the string content, a
    dollar sign, the same tag that began this dollar quote, and a final
    dollar sign.

    The scanner uses the existing token type of SCONST for dollar-quoted
    strings. As a result, when the AST is formatted as a string, there is
    no knowledge that the original input was dollar-quoted and it is
    therefore formatted as either a plain string or an escaped
    string (depending on the content).

    Fixes cockroachdb#41777.

    Release Note (sql change): CockroachDB now supports string and byte
    array literals using the dollar-quoted notation, as documented here:
    https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

    45589: sql: update error message for primary key change on an interleave parent r=otan a=rohany

    Fixes cockroachdb#45537.

    This PR updates the error message when trying to perform a primary
    key change on an interleaved parent to include the name of the
    table as well as the names of the interleaved children.

    Release note: None

    45604: opt: factor limit hints into scan and lookup join costs r=rytaft a=rytaft

    This PR is a continuation of cockroachdb#43415. The first commit is copied directly
    from that PR, and the second commit includes a minor fix as well as a
    number of test cases.

    Fixes cockroachdb#34811; the example query in this issue now chooses a lookup join
    as desired. The coster now takes limit hints into account when costing
    scans and lookup joins, and propagates limit hints through lookup joins.

    Release note (sql change): The optimizer now considers the likely number
    of rows an operator will need to provide, and might choose query plans
    based on this. In particular, the optimizer might prefer lookup joins
    over alternatives in some situations where all rows of the join will
    probably not be needed.

    Co-authored-by: damien.hollis <damien.hollis@unimarket.com>
    Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
    Co-authored-by: Céline O'Neil <celineloneil@gmail.com>
    Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

commit 7e0ba7c
Author: Rebecca Taft <becca@cockroachlabs.com>
Date:   Mon Mar 2 15:35:36 2020 -0600

    opt: make lookup join limit hint consistent with coster, add tests

    This commit adds a new helper function called lookupJoinInputLimitHint, which
    is called by both the coster and the physical props code for limit hint
    calculation. This helper function ensures that both places take into account
    the batch size of 100, and require that the calculated limit hint is a
    multiple of this batch size.

    This commit also adds more tests related to costing of limit hints for
    scans and lookup joins.

    Release note: None

commit 9dcab01
Author: Rohan Yadav <rohany@alumni.cmu.edu>
Date:   Thu Feb 20 15:37:29 2020 -0500

    sql: make secondary indexes not write empty k/v's + bugfixes for primary
    key changes

    Fixes cockroachdb#45223.

    Depends on cockroachdb#45226 to land first.

    This PR fixes many bugs around secondary index encodings
    and CRUD operations k/v reads and writes.

    * Fixes a problem secondary indexes would write empty
      k/v's if it contained a family that had all null values.
    * Fix multiple bugs where updates to a table during an online
      primary key change could result an inconsistent
      new primary key.
    * Fix an assumption in the updater that assumed indexes
      always had the same number of k/v's. The logic has been
      updated to perform a sort of merge operation to decide
      what k/v's to insert/delete during the update operation.
    * Increased testing around secondary indexes k/vs and
      schema change operations.

    The release note is None because these are all bugs
    introduced in 20.1.

    Release note: None

commit 80894c3
Author: damien.hollis <damien.hollis@unimarket.com>
Date:   Sun Jan 19 21:16:13 2020 +1300

    sql: support postgresql dollar-quoted strings

    Added support for postgresql dollar-quoted strings in the scanner. A
    dollar-quoted string constant consists of a dollar sign ($), an
    optional "tag" of zero or more characters, another dollar sign, an
    arbitrary sequence of characters that makes up the string content, a
    dollar sign, the same tag that began this dollar quote, and a final
    dollar sign.

    The scanner uses the existing token type of SCONST for dollar-quoted
    strings. As a result, when the AST is formatted as a string, there is
    no knowledge that the original input was dollar-quoted and it is
    therefore formatted as either a plain string or an escaped
    string (depending on the content).

    Fixes cockroachdb#41777.

    Release Note (sql change): CockroachDB now supports string and byte
    array literals using the dollar-quoted notation, as documented here:
    https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

commit 3d7aeb3
Author: Rohan Yadav <rohany@alumni.cmu.edu>
Date:   Mon Mar 2 13:46:35 2020 -0500

    sql: make cleanup jobs spawned by alter primary key not cancelable

    Fixes cockroachdb#45500.

    This PR makes the job spawned by ALTER PRIMARY KEY that cleans
    up indexes be uncancelable.

    This PR additionally fixes a related bug where ALTER PRIMARY KEY
    would spawn a job when it didn't actually enqueue any mutations
    on the table descriptor, causing future schema changes on the
    table to hang.

    Release note (sql change): This PR makes the cleanup job spawned
    by ALTER PRIMARY KEY in some cases uncancelable.

commit b0be21a
Merge: 8ffade8 bbac108
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 14:48:15 2020 +0000

    Merge cockroachdb#45621

    45621: sql: allow inverted indexes on mixed-case cols r=jordanlewis a=jordanlewis

    Closes cockroachdb#42944.

    Previously, a bug prevented creation of inverted indexes on columns with
    mixed-case identifiers. Now, the bug is fixed.

    Release note (bug fix): it is now possible to create inverted indexes on
    columns whose names are mixed-case.

    Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>

commit 2b7aa46
Author: Raphael 'kena' Poss <knz@thaumogen.net>
Date:   Tue Mar 3 00:38:55 2020 +0100

    sql: support numeric tuple indexing

    Previously, it was impossible to use `(t).*` to expand a single
    tuple-typed column into multiple projections if the tuple was not
    labeled to start with.

    This limitation was caused by another limitation, that it was not
    possible to refer to a single column in a tuple if the tuple was not
    already labeled.

    This patch addresses both limitations.

    Release note (sql change): CockroachDB now supports expanding all
    columns of a tuple using the `.*` notation, for example `SELECT (t).*
    FROM (SELECT (1,'b',2.3) AS t)`. This is a CockroachDB-specific extension.

    Release note (sql change): CockroachDB now supports accessing the Nth
    column in a column with tuple type using the syntax `(...).@N`, for
    example `SELECT (t).@2 FROM (SELECT (1,'b',2.3) AS t)`. This is a
    CockroachDB-specific extension.

commit 8ffade8
Merge: 120d53f 79ca338
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 14:08:12 2020 +0000

    Merge cockroachdb#45139

    45139: intervalccl: optimize OverlapCoveringMerge r=dt a=C0rWin

    Change OverlapCoveringMerge instead of iterating over coverings for each
    input range to flatten coverings, sort results and produce ranges with
    one pass. Reducing total complexity from O(nm) to O(n*log(n)).

    Signed-off-by: Artem Barger <bartem@il.ibm.com>

    Release note: none

    Co-authored-by: Artem Barger <bartem@il.ibm.com>

commit 120d53f
Merge: 0414b69 e2da8bd
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 11:44:30 2020 +0000

    Merge cockroachdb#45587

    45587: log: secondary logger fixes r=ajwerner a=knz

    Needed by / will be exercised by  cockroachdb#45193

    Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

commit 0414b69
Merge: 47259c2 3433fbd
Author: craig[bot] <bors@cockroachlabs.com>
Date:   Tue Mar 3 09:35:26 2020 +0000

    Merge cockroachdb#45549

    45549: changefeedccl: pick up TargetBytes for backfill ScanRequest r=ajwerner a=ajwerner

    Now that cockroachdb#44925 has merged, we can use a byte limit rather than a row limit.

    Release note: None

    Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>

commit 0a1f251
Author: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Date:   Mon Mar 2 16:26:24 2020 -0500

    storage/txnwait: terminate push when pusher aborted at lower epoch

    Closes cockroachdb#40786.
    Closes cockroachdb#44336.

    This commit resolves a bug in distributed deadlock detection that would
    allow a deadlock between transactions to go undetected, stalling the
    workload indefinitely.

    The issue materialized as follows:
    1. two transactions would deadlock and each enter a txnwait queue
    2. they would poll their pushees record along with their own
    3. deadlock detection would eventually pick this up and abort one of the txns
       using the pusher's copy of the txn's proto
    4. however, the aborted txn has since restarted and bumped it epoch
    5. the aborted txn continued to query its record, but failed to ingest any
       updates from it because the record was at a lower epoch than its own
       copy of its txn proto. So it never noticed that it was ABORTED
    6. all other txns in the system including the original contending txn
       piled up behind the aborted txn in the contention queue, waiting for
       it to notice it was aborted and exit the queue
    7. deadlock!

    I'm optimistically closing the two `kv/contention/nodes=4` issues both
    because I hope this is the cause of their recent troubles and also because
    I've been spending a lot of time with the test recently in light of cockroachdb#45482
    and plan to stabilize it fully.

    I plan to backport this to release-19.2. This doesn't need to go all the
    way back to release-19.1 because this was introduces in aed892a.

commit bbac108
Author: Jordan Lewis <jordanthelewis@gmail.com>
Date:   Mon Mar 2 23:16:33 2020 -0500

    sql: allow inverted indexes on mixed-case cols

    Previously, a bug prevented creation of inverted indexes on columns with
    mixed-case identifiers. Now, the bug is fixed.

    Release note (bug fix): it is now possible to create inverted indexes on
    columns whose names are mixed-case.

commit cee9355
Author: Oliver Tan <otan@cockroachlabs.com>
Date:   Thu Feb 27 10:11:31 2020 -0800

    sql: allow rename database for sequences without a db name reference

    See release note for description. This PR should be included ahead of
    the more "general" fix of changing the DefaultExpr with the new database
    as it unblocks people using
    `experimental_serial_normalization=sql_sequence` from using the database
    rename feature.

    Release note (sql change): Previously, when we renamed a database, any
    table referencing a sequence would be blocked from being able to rename
    the table. This is to block cases where if the table's reference to the
    sequence contains the database name, and the database name changes, we
    have no way of overwriting the table's reference to the sequence in the
    new database. However, if no database name is included in the sequence
    reference, we should continue to allow the database to rename, as is
    implemented with this change.

commit 7383e9d
Author: Radu Berinde <radu@cockroachlabs.com>
Date:   Mon Mar 2 11:15:12 2020 -0800

    opt: extend WithUses and improve NeededMutationCols

    This change extends the WithUses rule prop to keep track of the columns actually
    used by `WithScan`s.

    This set is then used by NeededMutationCols to make sure we never prune mutation
    input columns that are used by FK checks. Currently this never happens, but
    because of fairly subtle reasons: FKs happen to require indexes on both sides,
    and those indexes force the mutation operator to require values for those
    columns. This won't be the case anymore with upsert FK checks, for which this
    fix is required.

    The new information will also be used (in a future change) to prune unused
    columns of `With` itself.

    Release note: None

commit e2da8bd
Author: Raphael 'kena' Poss <knz@thaumogen.net>
Date:   Mon Mar 2 17:04:52 2020 +0100

    log: ensure that the test log scopes works with secondary loggers

    This patch ensures that the file redirection performed by log.Scope /
    log.ScopeWithoutShowLogs is effective with secondary loggers.

    Release note: None

commit 3433fbd
Author: Andrew Werner <ajwerner@cockroachlabs.com>
Date:   Sat Feb 29 13:39:17 2020 -0500

    changefeedccl: pick up TargetBytes for backfill ScanRequest

    Now that cockroachdb#44925 has merged, we can use a byte limit rather than a row limit.

    Release note: None

commit 589cfa1
Author: Raphael 'kena' Poss <knz@thaumogen.net>
Date:   Mon Mar 2 17:03:18 2020 +0100

    log: ensure secondary loggers get cleaned up

    Previously, if two consecutive tests in a package were using secondary
    loggers, the second test would see the secondary loggers of the first
    test in its logger registry.

    This was problematic because it really made the log files/messages of
    the first test available to the logging file retrieval API.

    This patch cleans this up by de-registering secondary loggers when
    their context is cancelled, which maps to stoppers going down.

    Release note: None

commit 4caee85
Author: Rohan Yadav <rohany@alumni.cmu.edu>
Date:   Mon Mar 2 11:46:04 2020 -0500

    sql: update error message for primary key change on an interleave parent

    Fixes cockroachdb#45537.

    This PR updates the error message when trying to perform a primary
    key change on an interleaved parent to include the name of the
    table as well as the names of the interleaved children.

    Release note: None

commit 9ac5dbc
Author: Céline O'Neil <celineloneil@gmail.com>
Date:   Thu Dec 19 14:02:53 2019 -0500

    opt: factor limit hints into scan and lookup join costs

    Fixes cockroachdb#34811; the example query in this issue now chooses a lookup join
    as desired. The coster now takes limit hints into account when costing
    scans and lookup joins, and propagates limit hints through lookup joins.

    Release note (sql change): The optimizer now considers the likely number
    of rows an operator will need to provide, and might choose query plans
    based on this. In particular, the optimizer might prefer lookup joins
    over alternatives in some situations where all rows of the join will
    probably not be needed.

commit 62c8667
Author: Raphael 'kena' Poss <knz@thaumogen.net>
Date:   Mon Mar 2 14:35:30 2020 +0100

    cli: reveal SQL error details/hints in CLI commands

    Previously, when a CLI command that uses SQL under the hood
    encountered an error, only the main error message was displayed.

    This patch changes it to reveal the other additional user-visible
    fields.

    Release note (cli change): Certain CLI command now provide more
    details and/or a hint when they encounter an error.

commit f978076
Author: Raphael 'kena' Poss <knz@thaumogen.net>
Date:   Wed Dec 11 16:59:31 2019 +0100

    *: allow periods (.) in usernames

    Requested by a user:

    > Currently, there is a restriction for the database username which
    > will limit the certificate-based authentication. It's very common to
    > include .local (e.g.: internal-service2.local) in the CN (Common Name)
    > of a certificate.  The AWS Certificate Manager (ACM) won't even issue
    > a certificate if the "dot" (.) is not present.

    Release note (sql change): Usernames can now contain periods, for
    compatibility with certificate managers that require domain names to
    be used as usernames.

    Release note (security update): The validation rule for principal
    names was extended to support periods and thus allow domainname-like
    principals. For reference, the validation rule is currently the regular
    expression `^[\p{Ll}0-9_][---\p{Ll}0-9_.]+$` and a size limit of 63
    UTF-8 bytes (larger usernames are rejected with an error); for
    comparison, PostgreSQL allows many more characters and truncates at 63
    chacters silently.

commit 79ca338
Author: Artem Barger <bartem@il.ibm.com>
Date:   Sun Feb 16 17:28:58 2020 +0200

    intervalccl: optimize OverlapCoveringMerge

    Change OverlapCoveringMerge instead of iterating over coverings for each
    input range to flatten coverings, sort results and produce ranges with
    one pass. Reducing total complexity from O(nm) to O(n*log(n)).

    Signed-off-by: Artem Barger <bartem@il.ibm.com>

    Release note: none

Release note (<category, see below>): <what> <show> <why>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants