[DNM] RFCS: draft of transactional schema changes #8
Conversation
This is a space for note-taking on TODOs on the RFC prior to publishing it for more public consumption.
Just a quick first pass.
1. When a user transaction commits, its DMLs all commit but its DDLs are
merely queued. It's possible that the async parts of its DDLs will fail and
This is a minor point, but there are some DDL statements (e.g., renaming a table column) that are committed in the transaction. One relatively small aspect of our problems is that it's probably not obvious to users which DDLs exhibit the non-transactional behavior and which ones don't, and which DDLs do or don't get rolled back if there are multiple within a transaction and one of the async ones fails.
Added to the first issue.
# Motivation
The lack of transactional schema changes boils down to two fundamental problems:
I think the effects of constraints being added on other writes is also worth mentioning here (either as a third distinct problem or a sub-problem of the first).
Added to the first
the physical layout has not changed but the logical layout has. When version 3
(`DELETE_ONLY`) comes into effect, the physical layout may differ from
version 1. It's critical that by the time the physical layout begins to change
we know that no transactions are in version 1.
Might be good to explicitly define "physical layout" and "logical layout" up front.
(Roughly speaking, physical layout refers to table data being stored in KV; logical layout refers to metadata indicating how to read/write that data to KV. The idea here is that we can't update the physical layout until the logical layout catches up.)
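To make that distinction concrete, here's a minimal sketch in Go (hypothetical, simplified types, not the actual catalog code). The descriptor is the logical layout; the encoded KV rows are the physical layout:

```go
package main

import "fmt"

// ColumnState mirrors the schema-change states discussed in the RFC.
type ColumnState int

const (
	DeleteOnly ColumnState = iota
	WriteOnly
	Public
)

// TableDescriptor is the *logical* layout: metadata telling SQL how to
// read and write the table's KV data. (Hypothetical, simplified shape.)
type TableDescriptor struct {
	Version int
	Columns map[string]ColumnState
}

// The *physical* layout is the encoded row data actually stored in KV.
type kvPair struct{ key, value string }

func main() {
	// Versions 1 and 2 can share one physical layout: only the metadata
	// changed (e.g. a column moved to WRITE_ONLY).
	v1 := TableDescriptor{Version: 1, Columns: map[string]ColumnState{"c1": Public}}
	v2 := TableDescriptor{Version: 2, Columns: map[string]ColumnState{"c1": WriteOnly}}
	rows := []kvPair{{"/t/1/1", "(c1=10)"}} // unchanged across v1 -> v2
	fmt.Println(v1, v2, rows)
	// The physical layout may only begin to change once no transaction
	// can still be operating against version 1's logical layout.
}
```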
Added a sentence
and populates its target by synthesizing SSTs using the `bulkAdder`. All rows
written by the index backfill have a timestamp that precedes any write due to
transactional traffic, thus preserving subsequent writes as they are.
I don't think this is quite true? IIUC we just pick `Clock.Now()` as the read and write timestamp and basically overwrite our existing writes. There's also a step where we do a KV scan at that timestamp over the index span to resolve intents and populate the timestamp cache to avoid writing under the SSTs. That's about as much as I know.
Right, I just learned about this this week (and it turned out to be sort of buggy: cockroachdb#54891).
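For readers following along, the sequence described in these two comments is roughly the following (a schematic sketch with hypothetical stand-in types, not the real backfill/bulk-ingest code):

```go
package main

import (
	"fmt"
	"time"
)

// kvClient is a hypothetical stand-in for the KV pieces involved.
type kvClient struct{}

// scanAt models the pre-ingest scan over the index span: it resolves any
// intents it finds and leaves the timestamp cache populated at ts, so that
// later transactions cannot write underneath the ingested SSTs.
func (kvClient) scanAt(span string, ts time.Time) { fmt.Println("scan", span, "@", ts) }

// ingestSSTs models bulk-adding the index entries, all stamped with ts.
func (kvClient) ingestSSTs(span string, ts time.Time) { fmt.Println("ingest", span, "@", ts) }

func main() {
	var c kvClient
	// Per the comment above: pick a current clock timestamp as both the
	// read and the write timestamp, rather than a timestamp preceding all
	// transactional traffic.
	ts := time.Now()
	c.scanAt("/Table/52/2/...", ts) // resolve intents, bump timestamp cache
	c.ingestSSTs("/Table/52/2/...", ts)
}
```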
* This transaction will create a job that then builds the relevant indexes
but does not make them public.
* If the job fails, the statement returns an error
* Rollback should happen under the job too. We’ll wait for terminal.
* If it succeeds the user transaction may proceed.
What I'm getting here is that the job (1) only builds the index, unlike today's schema change jobs; and (2) is essentially a black box from the perspective of the schema change state machine, in the sense that we'll just wait for the job to reach a terminal state before proceeding, sort of like today's schema change jobs in the `connExecutor`. Is that right?
yes, in fact it's not clear whether "job" is the right term.
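Whatever it ends up being called, the contract sketched in this thread is just "run the thing to a terminal state, then proceed or fail". A minimal illustration (hypothetical names, not the jobs API):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type status int

const (
	running status = iota
	succeeded
	failed
)

// waitForTerminal treats the index build as a black box: the statement
// blocks until it reaches a terminal state. On success the user transaction
// proceeds; on failure (after rollback runs under the job) the statement
// returns an error.
func waitForTerminal(poll func() status) error {
	for {
		switch poll() {
		case succeeded:
			return nil
		case failed:
			return errors.New("index build failed")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	n := 0
	err := waitForTerminal(func() status {
		n++
		if n < 3 {
			return running
		}
		return succeeded
	})
	fmt.Println("proceed:", err == nil)
}
```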
* Ensure that the user transaction is able to drop the leases to the old
table descriptor so that we don’t prevent version bumps.
What does this mean? What exactly happens if we already did some writes in the transaction with an older leased version of the table?
it's fine if we did some writes with an older leased version of the table because we're going to backfill those writes into new indexes and the physical layout of the table can't change no matter how many versions we bump it through. I added a comment here.
* This is aided by the fact that we’re not re-writing rows.
* The pessimistic mode interactions here are important
* The backfiller should be able to read ignoring the parent’s locks.
* What does it mean to re-issue previous writes to tables with new indexes?
Probably want to clarify eventually that this refers to writes in the same transaction, as distinct from intents from other writing transactions.
Done.
schema changes.
* Should jobs be used in any way for dealing with backfills and validations
underneath a user transaction?
* What are the pros? Lots of cons.
I think the pros are basically just that we get all the jobs observability "for free." But I guess we could hook backfills into the query progress somehow. It's unclear whether being able to pause these jobs is a good idea, but I think probably not.
Added.
### Child Transactions
The primary new KV concept required by this proposal is the child transaction. |
Is KV already planning to do this? Are we planning to?
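As a reference point for the discussion, the primitive being requested might have a shape like this (a hypothetical interface sketch under the RFC's assumptions, not an existing KV API):

```go
package main

import (
	"context"
	"fmt"
)

// Txn is a stand-in for a KV transaction handle.
type Txn struct{ name string }

// ChildTxn runs fn in a transaction derived from parent. The child can
// commit before (and independently of) the parent; how it interacts with
// the parent's intents is one of the open questions in this RFC.
func ChildTxn(ctx context.Context, parent *Txn, fn func(context.Context, *Txn) error) error {
	child := &Txn{name: parent.name + "/child"}
	return fn(ctx, child)
}

func main() {
	parent := &Txn{name: "user"}
	_ = ChildTxn(context.Background(), parent, func(ctx context.Context, child *Txn) error {
		// e.g. write and commit a new descriptor version while the user
		// transaction remains open.
		fmt.Println("committing from", child.name)
		return nil
	})
}
```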
When the above transaction commits, `foo` is in version 2 and has pending
mutations to drop the columns c1 and c2. One thing to note is that transactions
which happen after the above transaction may still observe those columns due to
This would be something that would be resolved with the proposed future work, right?
Are you referring to "One thing to note is that transactions which happen after the above transaction may still observe those columns"? It's true that with the proposed work, we'd no longer commit the transaction at this point; the transaction would be committed only when the entire column schema change is done. However, it's still true in general that because we lease/cache table descriptors, all descriptor changes still take place asynchronously across the cluster. We provide a guarantee that we don't return results to the client until the change has taken effect on all nodes. We can elaborate more on this offline. I'm not sure if I've answered your question.
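The "change has taken effect on all nodes" guarantee corresponds to the usual wait after each version bump. Schematically (hypothetical helper, simplified to a version check):

```go
package main

import (
	"fmt"
	"time"
)

// waitForOneVersion blocks until every node's leased copy of the descriptor
// is at version v, i.e. no transaction anywhere can still be using v-1.
// Only then is the next schema-change step taken (and only then are results
// returned to the client).
func waitForOneVersion(v int, leasedVersions func() []int) {
	for {
		ok := true
		for _, lv := range leasedVersions() {
			if lv < v {
				ok = false // some node still leases an older version
			}
		}
		if ok {
			return
		}
		time.Sleep(time.Second)
	}
}

func main() {
	calls := 0
	waitForOneVersion(2, func() []int {
		calls++
		if calls < 3 {
			return []int{1, 2, 2} // one node still at version 1
		}
		return []int{2, 2, 2}
	})
	fmt.Println("all nodes caught up")
}
```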
The upshot of this design is that we can achieve compatibility with postgres
and meet user expectations for what is possible within a SQL transaction. This
proposal is not, however, without its risks. In order to regain
transactionality, in some cases, we'll need to actually perform the work of
Why is this not in all cases? Won't we always need to perform the schema change work within the user's transaction?
Andrew may be referring to something else, but here's one way to think of it: Currently all schema change work is performed in the user transaction and in a set of transactions after the user transaction commits. (A few simple schema changes like RENAME COLUMN are performed entirely in the user transaction.) The present proposal is to perform all the work in the user transaction, and (if needed) in a set of child transactions that commit before the user transaction commits.
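Restating that comparison as pseudocode (purely illustrative; the step names are made up):

```go
package main

import "fmt"

// today: the user transaction commits first; the O(rows) work runs in
// follow-up transactions after the fact (the non-transactional part).
func today() {
	for _, step := range []string{
		"user txn: queue descriptor mutations; commit",
		"async: backfill, validate, bump versions, finalize",
	} {
		fmt.Println(step)
	}
}

// proposed: all work happens in the user transaction plus child
// transactions, each of which commits before the user transaction does.
func proposed() {
	for _, step := range []string{
		"user txn begins",
		"  child txn: bump descriptor state; commit; wait for leases",
		"  child txns: coordinate backfill/validation; commit",
		"user txn: commit (schema change already complete)",
	} {
		fmt.Println(step)
	}
}

func main() { today(); proposed() }
```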
Once the side schema changes and has evidence that all leases are up-to-date,
the statement execution can continue. At this point, a new child transaction can
write and commit a descriptor of the table descriptor which has the column in
`WRITE_ONLY` with a pending mutation to perform a backfill that it
What does `WRITE_ONLY` status indicate to other transactions?
For indexes, it means that transactions will keep the index updated on writes, but not use the index for reads. For columns it's a bit more complicated, but essentially it's the same: we write to the column as though it were a normal column (with computed values, defaults, etc.) but it's not available for reads by the user.
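In code form, the rule in that reply is roughly (a simplified sketch of read/write index filtering):

```go
package main

import "fmt"

type indexState int

const (
	deleteOnly indexState = iota
	writeOnly
	public
)

type index struct {
	name  string
	state indexState
}

// writableIndexes: WRITE_ONLY (and public) indexes are kept up to date on
// every write, so they are consistent by the time they become public.
func writableIndexes(idxs []index) (out []index) {
	for _, ix := range idxs {
		if ix.state >= writeOnly {
			out = append(out, ix)
		}
	}
	return
}

// readableIndexes: only public indexes may serve reads.
func readableIndexes(idxs []index) (out []index) {
	for _, ix := range idxs {
		if ix.state == public {
			out = append(out, ix)
		}
	}
	return
}

func main() {
	idxs := []index{{"primary", public}, {"new_idx", writeOnly}}
	fmt.Println("write to:", writableIndexes(idxs))
	fmt.Println("read from:", readableIndexes(idxs))
}
```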
## Non-atomic DDLs ([#42061])
Many schema changes require `O(rows)` work. Generally we've wanted to avoid |
What is `O(rows)` work?
It means the work required is proportional to the number of rows in the table (e.g., for an index backfill, the more rows, the longer the backfill runs)
This RFC utilizes the language of
[20151014_online_schema_change.md](./20151014_online_schema_change.md).

* Child Transaction - a new concept in this RFC. A Child transaction is a
This is different than nested transactions with savepoints?
Right, this is a separate internal concept.
* Why is this backfill different from all other backfills?
* It needs to behave differently in the face of intents due to the user
transaction.
* What should it do when it encounters an intent from the user transaction?
Intents are from the previous statements within the same transaction?
Yeah, these are write intents from writes we did earlier
descriptor with the constraint as `PUBLIC` using a synthetic descriptor
(but don't write descriptors until the end).
* For concurrent transactions which see the constraint in `ADDING`
* Evaluate the constraint. If failed:
If a concurrent txn inserts new rows AND the constraint still passes, the index backfill for the `ALTER TABLE` would ignore these additional rows?
Are you asking about a case where we're adding an index and later adding a constraint in the same transaction? At this point the index backfill has already finished, and the index would be in the write-only state, so we'd just write to that index normally if the constraint isn't violated. Does that answer your question?
* Straw man: write to the table descriptor marking the constraint as
having failed.
* One problem I see is that the side-key gave us some assumption that there were not concurrent modifications from user transactions to the table descriptor | ||
* Straw man 2: write to some other key that the user transaction making the constraint public will need to read prior to committing. |
Wouldn't all concurrent transactions need to read before committing to check for any conflicts with newly committed constraints?
We'd be getting this information off the leased/cached copy of the descriptor (as usual), so it wouldn't be a KV read. I think we'd refresh the leased version if we detected a possible violation.
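Putting the straw men and this reply together, the concurrent transaction's write path might look roughly like this (a hedged sketch; whether a violating write is rejected or merely poisons the in-progress constraint is exactly what the straw men debate):

```go
package main

import (
	"errors"
	"fmt"
)

type constraintState int

const (
	adding constraintState = iota // being added by an open user transaction
	public
)

// checkOnWrite is evaluated against the leased/cached descriptor, so the
// happy path needs no extra KV read.
func checkOnWrite(st constraintState, holds bool, recordViolation func()) error {
	if holds {
		return nil
	}
	if st == public {
		return errors.New("constraint violation")
	}
	// Constraint still ADDING: per straw man 2, leave a marker on a side
	// key that the transaction adding the constraint must read before it
	// commits, so it cannot make the constraint public.
	recordViolation()
	return nil // the write itself proceeds
}

func main() {
	err := checkOnWrite(adding, false, func() { fmt.Println("wrote violation side key") })
	fmt.Println("write allowed:", err == nil)
}
```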
How do we expect users to use this ASYNC behavior?

* This isn’t for the tools, this is for today’s users of schema changes who
want their schema changes to happen if their connection disconnects.
Is this the same behavior as bulk io operations?
Yeah, basically - we'd probably want to use jobs for this
* What should the semantics be when adding a column with a default value and using the `ASYNC` keywords?
* Seems like ASYNC operations should not take effect at all.
* How do we deal with DMLs and DDLs interleaved within a transaction to a table.
Why is this different than the interleaving that is happening in your examples above?
Taking a first pass at responding to some of @vy-ton's comments.
instead we use a separate key to coordinate changes
* Switch to pessimistic mode so that refreshes succeed (optional, see later).
* Start a child transaction to add the column
* What if we run into a failed, previous schema change?
This is if we started a schema change earlier and it never got properly cleaned up for some reason.
* Created in child txn which writes column as delete only.
* First step is make column write-only and wait for that before beginning
backfill. This also adds the indexes which will be backfilled by the job.
* Backfill all new relevant indexes.
The backfill operation is mostly non-transactional (it involves just doing a bunch of bulk writes to KV) but it's coordinated from child transactions, not the user (parent) transaction.
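The division of labor described there, sketched (hypothetical names; the ingestion itself bypasses transactions entirely):

```go
package main

import "fmt"

// ingestSpan stands in for the non-transactional bulk work: building and
// adding SSTs for one chunk of the index span.
func ingestSpan(span string) { fmt.Println("bulk ingest", span) }

// runBackfill shows the coordination shape: the row data is written via
// non-transactional bulk ingestion, while progress bookkeeping is done in
// child transactions rather than in the user (parent) transaction.
func runBackfill(spans []string, childTxn func(func() error) error) error {
	for _, sp := range spans {
		ingestSpan(sp) // non-transactional bulk writes
		if err := childTxn(func() error {
			fmt.Println("checkpoint", sp) // record progress transactionally
			return nil
		}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	_ = runBackfill([]string{"a-m", "m-z"}, func(fn func() error) error { return fn() })
}
```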
This RFC puts forth a proposal for making schema changes in transactions behave more like PostgreSQL and meet the expectations of clients and ORMs. Release note: None
Many details left to be sussed out.
TODOs:
Release note: None