[DNM] RFCS: draft of transactional schema changes #8
Conversation
This is a space for note-taking on TODOs on the RFC prior to publishing it for more public consumption.
Just a quick first pass.
1. When a user transaction commits, its DMLs all commit but its DDLs are
merely queued. It's possible that the async parts of its DDLs will fail and
This is a minor point, but there are some DDL statements (e.g., renaming a table column) that are committed in the transaction. One relatively small aspect of our problems is that it's probably not obvious to users which DDLs exhibit the non-transactional behavior and which ones don't, and which DDLs do or don't get rolled back if there are multiple within a transaction and one of the async ones fails.
Added to the first issue.
# Motivation
The lack of transactional schema changes boils down to two fundamental problems:
I think the effects of constraints being added on other writes is also worth mentioning here (either as a third distinct problem or a sub-problem of the first).
Added to the first
the physical layout has not changed but the logical layout has. When version 3
(`DELETE_ONLY`) comes into effect, the physical layout may differ from
version 1. It's critical that by the time the physical layout begins to change
we know that no transactions are in version 1.
Might be good to explicitly define "physical layout" and "logical layout" up front.
(Roughly speaking, physical layout refers to table data being stored in KV; logical layout refers to metadata indicating how to read/write that data to KV. The idea here is that we can't update the physical layout until the logical layout catches up.)
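To make that distinction concrete, here's a minimal sketch in Go (hypothetical, simplified types, not the actual catalog code). The descriptor is the logical layout; the encoded KV rows are the physical layout:

```go
package main

import "fmt"

// ColumnState mirrors the schema-change states discussed in the RFC.
type ColumnState int

const (
	DeleteOnly ColumnState = iota
	WriteOnly
	Public
)

// TableDescriptor is the *logical* layout: metadata telling SQL how to
// read and write the table's KV data. (Hypothetical, simplified shape.)
type TableDescriptor struct {
	Version int
	Columns map[string]ColumnState
}

// The *physical* layout is the encoded row data actually stored in KV.
type kvPair struct{ key, value string }

func main() {
	// Versions 1 and 2 can share one physical layout: only the metadata
	// changed (e.g. a column moved to WRITE_ONLY).
	v1 := TableDescriptor{Version: 1, Columns: map[string]ColumnState{"c1": Public}}
	v2 := TableDescriptor{Version: 2, Columns: map[string]ColumnState{"c1": WriteOnly}}
	rows := []kvPair{{"/t/1/1", "(c1=10)"}} // unchanged across v1 -> v2
	fmt.Println(v1, v2, rows)
	// The physical layout may only begin to change once no transaction
	// can still be operating against version 1's logical layout.
}
```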
Added a sentence
and populates its target by synthesizing SSTs using the `bulkAdder`. All rows
written by the index backfill have a timestamp that precedes any write due to
transactional traffic, thus preserving subsequent writes as they are.
I don't think this is quite true? IIUC we just pick `Clock.Now()` as the read and write timestamp and basically overwrite our existing writes. There's also a step where we do a KV scan at that timestamp over the index span to resolve intents and populate the timestamp cache to avoid writing under the SSTs. That's about as much as I know.
Right, I just learned about this this week (and it turned out to be sort of buggy: cockroachdb#54891).
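For readers following along, the sequence described in these two comments is roughly the following (a schematic sketch with hypothetical stand-in types, not the real backfill/bulk-ingest code):

```go
package main

import (
	"fmt"
	"time"
)

// kvClient is a hypothetical stand-in for the KV pieces involved.
type kvClient struct{}

// scanAt models the pre-ingest scan over the index span: it resolves any
// intents it finds and leaves the timestamp cache populated at ts, so that
// later transactions cannot write underneath the ingested SSTs.
func (kvClient) scanAt(span string, ts time.Time) { fmt.Println("scan", span, "@", ts) }

// ingestSSTs models bulk-adding the index entries, all stamped with ts.
func (kvClient) ingestSSTs(span string, ts time.Time) { fmt.Println("ingest", span, "@", ts) }

func main() {
	var c kvClient
	// Per the comment above: pick a current clock timestamp as both the
	// read and the write timestamp, rather than a timestamp preceding all
	// transactional traffic.
	ts := time.Now()
	c.scanAt("/Table/52/2/...", ts) // resolve intents, bump timestamp cache
	c.ingestSSTs("/Table/52/2/...", ts)
}
```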
* This transaction will create a job that then builds the relevant indexes
but does not make them public.
* If the job fails, the statement returns an error
* Rollback should happen under the job too. We’ll wait for terminal.
* If it succeeds the user transaction may proceed.
What I'm getting here is that the job (1) only builds the index, unlike today's schema change jobs; and (2) is essentially a black box from the perspective of the schema change state machine, in the sense that we'll just wait for the job to reach a terminal state before proceeding, sort of like today's schema change jobs in the `connExecutor`. Is that right?
yes, in fact it's not clear whether "job" is the right term.
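Whatever it ends up being called, the contract sketched in this thread is just "run the thing to a terminal state, then proceed or fail". A minimal illustration (hypothetical names, not the jobs API):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type status int

const (
	running status = iota
	succeeded
	failed
)

// waitForTerminal treats the index build as a black box: the statement
// blocks until it reaches a terminal state. On success the user transaction
// proceeds; on failure (after rollback runs under the job) the statement
// returns an error.
func waitForTerminal(poll func() status) error {
	for {
		switch poll() {
		case succeeded:
			return nil
		case failed:
			return errors.New("index build failed")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	n := 0
	err := waitForTerminal(func() status {
		n++
		if n < 3 {
			return running
		}
		return succeeded
	})
	fmt.Println("proceed:", err == nil)
}
```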
* Ensure that the user transaction is able to drop the leases to the old
table descriptor so that we don’t prevent version bumps.
What does this mean? What exactly happens if we already did some writes in the transaction with an older leased version of the table?
it's fine if we did some writes with an older leased version of the table because we're going to backfill those writes into new indexes and the physical layout of the table can't change no matter how many versions we bump it through. I added a comment here.
* This is aided by the fact that we’re not re-writing rows.
* The pessimistic mode interactions here are important
* The backfiller should be able to read ignoring the parent’s locks.
* What does it mean to re-issue previous writes to tables with new indexes?
Probably want to clarify eventually that this refers to writes in the same transaction, as distinct from intents from other writing transactions.
Done.
schema changes.
* Should jobs be used in any way for dealing with backfills and validations
underneath a user transaction?
* What are the pros? Lots of cons.
I think the pros are basically just that we get all the jobs observability "for free." But I guess we could hook backfills into the query progress somehow. It's unclear whether being able to pause these jobs is a good idea, but I think probably not.
Added.
### Child Transactions
The primary new KV concept required by this proposal is the child transaction. |
Is KV already planning to do this? Are we planning to?
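As a reference point for the discussion, the primitive being requested might have a shape like this (a hypothetical interface sketch under the RFC's assumptions, not an existing KV API):

```go
package main

import (
	"context"
	"fmt"
)

// Txn is a stand-in for a KV transaction handle.
type Txn struct{ name string }

// ChildTxn runs fn in a transaction derived from parent. The child can
// commit before (and independently of) the parent; how it interacts with
// the parent's intents is one of the open questions in this RFC.
func ChildTxn(ctx context.Context, parent *Txn, fn func(context.Context, *Txn) error) error {
	child := &Txn{name: parent.name + "/child"}
	return fn(ctx, child)
}

func main() {
	parent := &Txn{name: "user"}
	_ = ChildTxn(context.Background(), parent, func(ctx context.Context, child *Txn) error {
		// e.g. write and commit a new descriptor version while the user
		// transaction remains open.
		fmt.Println("committing from", child.name)
		return nil
	})
}
```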
When the above transaction commits, `foo` is in version 2 and has pending
mutations to drop the columns c1 and c2. One thing to note is that transactions
which happen after the above transaction may still observe those columns due to
This would be something that would be resolved with the proposed future work, right?
Are you referring to "One thing to note is that transactions which happen after the above transaction may still observe those columns"? It's true that with the proposed work, we'd no longer commit the transaction at this point; the transaction would be committed only when the entire column schema change is done. However, it's still true in general that because we lease/cache table descriptors, all descriptor changes still take place asynchronously across the cluster. We provide a guarantee that we don't return results to the client until the change has taken effect on all nodes. We can elaborate more on this offline. I'm not sure if I've answered your question.
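The "change has taken effect on all nodes" guarantee corresponds to the usual wait after each version bump. Schematically (hypothetical helper, simplified to a version check):

```go
package main

import (
	"fmt"
	"time"
)

// waitForOneVersion blocks until every node's leased copy of the descriptor
// is at version v, i.e. no transaction anywhere can still be using v-1.
// Only then is the next schema-change step taken (and only then are results
// returned to the client).
func waitForOneVersion(v int, leasedVersions func() []int) {
	for {
		ok := true
		for _, lv := range leasedVersions() {
			if lv < v {
				ok = false // some node still leases an older version
			}
		}
		if ok {
			return
		}
		time.Sleep(time.Second)
	}
}

func main() {
	calls := 0
	waitForOneVersion(2, func() []int {
		calls++
		if calls < 3 {
			return []int{1, 2, 2} // one node still at version 1
		}
		return []int{2, 2, 2}
	})
	fmt.Println("all nodes caught up")
}
```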
The upshot of this design is that we can achieve compatibility with postgres
and meet user expectations for what is possible within a SQL transaction. This
proposal is not, however, without its risks. In order to regain
transactionality, in some cases, we'll need to actually perform the work of
Why is this not in all cases? Won't we always need to perform the schema change work within the user's transaction?
Andrew may be referring to something else, but here's one way to think of it: Currently all schema change work is performed in the user transaction and in a set of transactions after the user transaction commits. (A few simple schema changes like RENAME COLUMN are performed entirely in the user transaction.) The present proposal is to perform all the work in the user transaction, and (if needed) in a set of child transactions that commit before the user transaction commits.
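Restating that comparison as pseudocode (purely illustrative; the step names are made up):

```go
package main

import "fmt"

// today: the user transaction commits first; the O(rows) work runs in
// follow-up transactions after the fact (the non-transactional part).
func today() {
	for _, step := range []string{
		"user txn: queue descriptor mutations; commit",
		"async: backfill, validate, bump versions, finalize",
	} {
		fmt.Println(step)
	}
}

// proposed: all work happens in the user transaction plus child
// transactions, each of which commits before the user transaction does.
func proposed() {
	for _, step := range []string{
		"user txn begins",
		"  child txn: bump descriptor state; commit; wait for leases",
		"  child txns: coordinate backfill/validation; commit",
		"user txn: commit (schema change already complete)",
	} {
		fmt.Println(step)
	}
}

func main() { today(); proposed() }
```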
Once the side schema changes and has evidence that all leases are up-to-date,
the statement execution can continue. At this point, a new child transaction can
write and commit a descriptor of the table descriptor which has the column in
`WRITE_ONLY` with a pending mutation to perform a backfill that it
What does `WRITE_ONLY` status indicate to other transactions?
For indexes, it means that transactions will keep the index updated on writes, but not use the index for reads. For columns it's a bit more complicated, but essentially it's the same: we write to the column as though it were a normal column (with computed values, defaults, etc.) but it's not available for reads by the user.
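In code form, the rule in that reply is roughly (a simplified sketch of read/write index filtering):

```go
package main

import "fmt"

type indexState int

const (
	deleteOnly indexState = iota
	writeOnly
	public
)

type index struct {
	name  string
	state indexState
}

// writableIndexes: WRITE_ONLY (and public) indexes are kept up to date on
// every write, so they are consistent by the time they become public.
func writableIndexes(idxs []index) (out []index) {
	for _, ix := range idxs {
		if ix.state >= writeOnly {
			out = append(out, ix)
		}
	}
	return
}

// readableIndexes: only public indexes may serve reads.
func readableIndexes(idxs []index) (out []index) {
	for _, ix := range idxs {
		if ix.state == public {
			out = append(out, ix)
		}
	}
	return
}

func main() {
	idxs := []index{{"primary", public}, {"new_idx", writeOnly}}
	fmt.Println("write to:", writableIndexes(idxs))
	fmt.Println("read from:", readableIndexes(idxs))
}
```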
## Non-atomic DDLs ([#42061])
Many schema changes require `O(rows)` work. Generally we've wanted to avoid |
What is `O(rows)` work?
It means the work required is proportional to the number of rows in the table (e.g., for an index backfill, the more rows, the longer the backfill runs)
This RFC utilizes the language of
[20151014_online_schema_change.md](./20151014_online_schema_change.md).

* Child Transaction - a new concept in this RFC. A Child transaction is a
This is different than nested transactions with savepoints?
Right, this is a separate internal concept.
* Why is this backfill different from all other backfills?
* It needs to behave differently in the face of intents due to the user
transaction.
* What should it do when it encounters an intent from the user transaction?
Intents are from the previous statements within the same transaction?
Yeah, these are write intents from writes we did earlier
descriptor with the constraint as `PUBLIC` using a synthetic descriptor
(but don't write descriptors until the end).
* For concurrent transactions which see the constraint in `ADDING`
* Evaluate the constraint. If failed:
If a concurrent txn inserts new rows AND the constraint still passes, the index backfill for the `ALTER TABLE` would ignore these additional rows?
Are you asking about a case where we're adding an index and later adding a constraint in the same transaction? At this point the index backfill has already finished, and the index would be in the write-only state, so we'd just write to that index normally if the constraint isn't violated. Does that answer your question?
* Straw man: write to the table descriptor marking the constraint as
having failed.
* One problem I see is that the side-key gave us some assumption that there were not concurrent modifications from user transactions to the table descriptor | ||
* Straw man 2: write to some other key that the user transaction making the constraint public will need to read prior to committing. |
Wouldn't all concurrent transactions need to read before committing to check for any conflicts with newly committed constraints?
We'd be getting this information off the leased/cached copy of the descriptor (as usual), so it wouldn't be a KV read. I think we'd refresh the leased version if we detected a possible violation.
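Putting the straw men and this reply together, the concurrent transaction's write path might look roughly like this (a hedged sketch; whether a violating write is rejected or merely poisons the in-progress constraint is exactly what the straw men debate):

```go
package main

import (
	"errors"
	"fmt"
)

type constraintState int

const (
	adding constraintState = iota // being added by an open user transaction
	public
)

// checkOnWrite is evaluated against the leased/cached descriptor, so the
// happy path needs no extra KV read.
func checkOnWrite(st constraintState, holds bool, recordViolation func()) error {
	if holds {
		return nil
	}
	if st == public {
		return errors.New("constraint violation")
	}
	// Constraint still ADDING: per straw man 2, leave a marker on a side
	// key that the transaction adding the constraint must read before it
	// commits, so it cannot make the constraint public.
	recordViolation()
	return nil // the write itself proceeds
}

func main() {
	err := checkOnWrite(adding, false, func() { fmt.Println("wrote violation side key") })
	fmt.Println("write allowed:", err == nil)
}
```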
How do we expect users to use this ASYNC behavior?

* This isn’t for the tools, this is for today’s users of schema changes who
want their schema changes to happen if their connection disconnects.
Is this the same behavior as bulk io operations?
Yeah, basically - we'd probably want to use jobs for this
* What should the semantics be when adding a column with a default value and using the `ASYNC` keywords?
* Seems like ASYNC operations should not take effect at all.
* How do we deal with DMLs and DDLs interleaved within a transaction to a table.
Why is this different than the interleaving that is happening in your examples above?
Taking a first pass at responding to some of @vy-ton's comments.
instead we use a separate key to coordinate changes
* Switch to pessimistic mode so that refreshes succeed (optional, see later).
* Start a child transaction to add the column
* What if we run into a failed, previous schema change?
This is if we started a schema change earlier and it never got properly cleaned up for some reason.
* Created in child txn which writes column as delete only.
* First step is make column write-only and wait for that before beginning
backfill. This also adds the indexes which will be backfilled by the job.
* Backfill all new relevant indexes.
The backfill operation is mostly non-transactional (it involves just doing a bunch of bulk writes to KV) but it's coordinated from child transactions, not the user (parent) transaction.
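The division of labor described there, sketched (hypothetical names; the ingestion itself bypasses transactions entirely):

```go
package main

import "fmt"

// ingestSpan stands in for the non-transactional bulk work: building and
// adding SSTs for one chunk of the index span.
func ingestSpan(span string) { fmt.Println("bulk ingest", span) }

// runBackfill shows the coordination shape: the row data is written via
// non-transactional bulk ingestion, while progress bookkeeping is done in
// child transactions rather than in the user (parent) transaction.
func runBackfill(spans []string, childTxn func(func() error) error) error {
	for _, sp := range spans {
		ingestSpan(sp) // non-transactional bulk writes
		if err := childTxn(func() error {
			fmt.Println("checkpoint", sp) // record progress transactionally
			return nil
		}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	_ = runBackfill([]string{"a-m", "m-z"}, func(fn func() error) error { return fn() })
}
```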
This RFC puts forth a proposal for making schema changes in transactions behave more like PostgreSQL and meet the expectations of clients and ORMs. Release note: None
Many details left to be sussed out.
TODOs:
Release note: None