
*: long-running migrations #39182

Closed
danhhz opened this issue Jul 30, 2019 · 25 comments
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@danhhz
Contributor

danhhz commented Jul 30, 2019

Backward compatibility is a necessary part of storage infrastructure, but we'd like to minimize the associated technical debt. Initially our migrations were one-off, required complicated reasoning, and were under-tested (example: the format version machinery that was added for column families and interleaved tables). Over time, we've added frameworks to help with this, but there's one notable gap in our story: long-running data migrations.

An example that's come up a number of times is rewriting all table descriptors in kv. An extreme example that we may have to tackle one day is rewriting all data on disk.

Note that the format versions mentioned above do a mini-migration on each table descriptor as it enters the node (from kv, gossip, etc). Nothing guarantees that all table descriptors eventually get rewritten. So even though this has been around since before 1.0, the format version migrations have to stay around as permanent technical debt. The FK migration currently underway will have a similar problem.

The format version code is a small amount of debt, but it'd be nice to get rid of it. Other examples are not so simple. The learner replicas work in 19.2 replaces preemptive snapshots, but after we stop using preemptive snapshots, we need to completely flush them out of the system before the code can be deleted. One such remnant is an interim state between when a preemptive snapshot has been written to rocksdb and when raft has caught up enough to turn it into a real replica. To flush these out, after we stop sending preemptive snapshots, we'll need to iterate over each range descriptor in kv and ensure that each such replica is either finished being added or is removed.

More examples from @tbg:

  • GenerationComparable (make sure that 20.1 can assume that there are no legacy generations out there) - not yet sure how exactly this is achieved but should be doable
  • the various Raft migrations (RangeAppliedState, unreplicated truncated state, etc) which all boil down to "run something through Raft until they're all active on all ranges but 99.99% sure they're all active already anyway"

We currently have two main frameworks for migrations. They go by different names in various places, but I'll call them startup migrations and cluster versions.

Startup Migrations

Startup migrations (in package pkg/sqlmigrations and called the "Cluster Upgrade Tool" by the RFC) are used to ensure that some backward-compatible hook has run before the code that depends on it runs. An example of this is adding a new system table. A node that doesn't know about a system table simply doesn't use it, so it's safe to add one in a mixed-version cluster.

When the first node of the new version starts up and tries to join the cluster, it notices that the migration hasn't run, runs it, and saves the result durably in kv. Any nodes that start afterward see that the migration has already run and skip it.

If a second node of the new version tries to join the cluster before the migration has completed, it blocks until the migration finishes. This means that startup migrations need to be relatively quick. In an ideal world, every user would do version upgrades with a rolling restart, waiting for each node to be healthy and settled before moving on. But it's not Making Data Easy if a non-rolling restart blocks every node on some extremely long startup migration, causing a total outage.

On the other hand, by running at startup, all code in the new version can assume that startup migrations have been run. This means there doesn't need to be a fallback path and that any old code can be deleted immediately.
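For illustration, here's a minimal Go sketch of this pattern. The names (startupMigration, runStartupMigrations, completedStore) are hypothetical; the real framework lives in pkg/sqlmigrations and additionally handles leasing so that only one node runs a given hook while other new-version nodes wait.

```go
package main

import (
	"context"
	"fmt"
)

// completedStore stands in for the kv keys where migration completion is
// recorded durably.
type completedStore map[string]bool

// startupMigration is a hypothetical descriptor for a quick, idempotent,
// backward-compatible hook that must run before new code relies on it.
type startupMigration struct {
	name string
	hook func(ctx context.Context) error
}

// runStartupMigrations runs every hook that hasn't been recorded as complete
// and records completion so nodes that start later skip it.
func runStartupMigrations(ctx context.Context, done completedStore, ms []startupMigration) error {
	for _, m := range ms {
		if done[m.name] {
			continue // an earlier new-version node already ran this
		}
		if err := m.hook(ctx); err != nil {
			return err
		}
		done[m.name] = true // persisted durably in kv in the real system
	}
	return nil
}

func main() {
	done := completedStore{}
	ms := []startupMigration{{
		name: "add system.new_table",
		hook: func(ctx context.Context) error {
			fmt.Println("creating new system table")
			return nil
		},
	}}
	if err := runStartupMigrations(context.Background(), done, ms); err != nil {
		panic(err)
	}
}
```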

Cluster Versions

Cluster versions are a special case of our gossip-based configuration settings. They allow for backward-incompatible behaviors to be gated by a promise from the user that they'll never downgrade. An example of this is adding a field to an existing rpc request protobuf. A node doesn't want to send such a request until it is sure that it will be handled correctly on the other end. Because of protobuf semantics, if it went to an old node, the field would be silently ignored.

Cluster versions tie together two concepts. First, a promise to the user that we don't have to worry about rpc or disk compatibility with old versions anymore. Second, a feature gate that is unlocked by that promise. There is an ordered list of these, one for each feature being gated.

Because the feature gate is initially off when a cluster is first rolled onto a new version, each check of the gate needs a fallback. New features can return an error informing the user to finish the upgrade and bump the version, but other uses need to keep the old code around for a release cycle.
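As a sketch of the gate-plus-fallback pattern (hypothetical names; the real check goes through the cluster settings machinery):

```go
package main

import (
	"context"
	"fmt"
)

// featureGate is a hypothetical stand-in for an entry in the ordered list of
// cluster versions.
type featureGate int

const versionRequestHasNewField featureGate = iota

// clusterVersion is a toy view of the gossiped cluster version setting.
type clusterVersion struct{ active map[featureGate]bool }

func (cv clusterVersion) IsActive(f featureGate) bool { return cv.active[f] }

type request struct {
	payload  string
	newField string // silently dropped by old nodes due to protobuf semantics
}

func send(ctx context.Context, req request) error {
	fmt.Printf("sending %+v\n", req)
	return nil
}

// sendRequest only uses the new field once the gate guarantees every node in
// the cluster understands it; otherwise it falls back to the old behavior.
// The fallback branch is the code that has to stay around for a release cycle.
func sendRequest(ctx context.Context, cv clusterVersion, req request) error {
	if cv.IsActive(versionRequestHasNewField) {
		req.newField = "populated"
	}
	return send(ctx, req)
}

func main() {
	cv := clusterVersion{active: map[featureGate]bool{versionRequestHasNewField: true}}
	_ = sendRequest(context.Background(), cv, request{payload: "hello"})
}
```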

Aside: To make it easier for users that don't want the control, the cluster version is automatically bumped some period of time after all nodes have been upgraded. A setting is available to disable this for users that want to keep the option to roll back until they've manually signed off on the new version.

Data Migrations

Summary: I propose a partial unification of these two systems. To avoid ending up with three migration frameworks, the new framework will be based on and replace cluster versions. The two ClusterVersion concepts described above are separated so that we can execute potentially long-running hooks in between.

The interface presented to the user is essentially the same, but slightly expanded. After rolling onto a new version of cockroach, some features are not available until the cluster version is bumped. Either the user does this manually or the auto-upgrade does. Instead of the version bump finishing immediately, a system job is created to do the upgrade, enabling the feature gates as it progresses. This is modeled as a system job both to ensure only one coordinator is running and to expose progress to the user. Once the job finishes (technically, as each step finishes), the gated features become available to the user.

The interface presented to CockroachDB developers is also mostly the same. Each version in the list is now given the option of including a hook, which is allowed to take a while (TODO be more specific) to finish; a rough sketch of how this fits together follows the list below.

Details:

  • The hook, if given, runs after the feature gate is enabled. This simple building block can be used to build more complex migrations by using multiple gate+hook features.
  • To make it simpler to reason about the code that goes in the hook, the system guarantees that the gate has been pushed to every node in the system. That is, every IsActive(feature) check will always return true on every node from now on. This was not previously guaranteed, and the implementation is tricky mostly around handling nodes that are unavailable when the gate is pushed.
  • Like startup migrations, the hook must be idempotent.
  • Complications such as flushing caches, etc. are left to the hooks. If any compelling patterns emerge, they could be baked into the framework in the future.
  • Startup migrations then conceptually become very similar to this. They currently implement their own coordinator leasing system and there is an opportunity for code unification by moving them to this new system job type.
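The sketch below shows roughly how the pieces above could fit together, with hypothetical names; gate activation, persistence of job progress, and retries are heavily simplified.

```go
package main

import (
	"context"
	"fmt"
)

// version is a hypothetical stand-in for an entry in the ordered cluster
// version list.
type version string

// migration pairs a feature gate with an optional long-running hook. Hooks
// must be idempotent so the job can be resumed after a failure.
type migration struct {
	v    version
	hook func(ctx context.Context) error // may be nil
}

// cluster is a toy stand-in for the pieces the real coordinator would use.
type cluster struct {
	active map[version]bool
}

// activateEverywhere models pushing the gate to every node and waiting until
// every IsActive(v) check is guaranteed to return true cluster-wide.
func (c *cluster) activateEverywhere(v version) {
	c.active[v] = true
}

// runUpgradeJob is the coordinator loop run by the system job: for each
// version step, activate the gate everywhere, run the hook if any, then
// record progress (persistence is omitted here).
func runUpgradeJob(ctx context.Context, c *cluster, steps []migration) error {
	for _, m := range steps {
		c.activateEverywhere(m.v)
		if m.hook != nil {
			if err := m.hook(ctx); err != nil {
				return err
			}
		}
		fmt.Printf("finished %s\n", m.v)
	}
	return nil
}

func main() {
	c := &cluster{active: map[version]bool{}}
	steps := []migration{
		{v: "19.1-1"},
		{v: "19.1-2", hook: func(ctx context.Context) error {
			fmt.Println("rewriting all table descriptors") // potentially long-running
			return nil
		}},
	}
	if err := runUpgradeJob(context.Background(), c, steps); err != nil {
		panic(err)
	}
}
```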

Side note: This is all very nearly the long-term vision laid out in https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20161024_cluster_upgrade_tool.md

Note to self: https://reviewable.io/reviews/cockroachdb/cockroach/38932#-LlaULyp9sd2JS_tyaKi:-LlaULyp9sd2JS_tyaKj:bfh2kib

@danhhz danhhz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv Anything in KV that doesn't belong in a more specific category. labels Jul 30, 2019
@danhhz danhhz self-assigned this Jul 30, 2019
@danhhz danhhz changed the title CockroachDB Data Migrations *: long-running migrations Jul 30, 2019
@danhhz
Contributor Author

danhhz commented Jul 30, 2019

@tbg Does this accurately reflect our conversation? @bdarnell This is still pretty handwave-y, but I'd love some initial feedback if you have a minute.

@tbg
Member

tbg commented Jul 31, 2019

@danhhz yes it does!
Not sure what your thinking here was, but it feels like there doesn't have to be anything special about "major" versions (those corresponding to releases), except that by convention the major version usually doesn't have a hook associated with it (because this hook won't have run by the time the major version is active, so it's better put in an earlier hook). For example, if we have the versions

19.1 (active)
19.1-1
19.1-2
19.1-3

(i.e. we're in the 19.2 cycle, and some migrations exist, but no 19.2 release version yet) then the data migration mechanism would move from 19.1-0 (the initial version) to 19.1-1, 19.1-2, 19.1-3 as you'd expect. Later, if that cluster got restarted into a binary that also has the release version 19.2 (=19.2-0), it'd move to that. I'm not sure whether the automatic upgrade mechanism does this today, but it should (and I hope it already does).

Were you planning to have an individual job for each transition (19.1-1 to 19.1-2 for example, and we only use jobs for transitions that have a hook)? This would also let us account for the duration of each step and make it obvious to the user where we're at. This seems both most straightforward and conceptually easiest to me, better than trying to group multiple steps into one job.

As an aside, we're also planning to move to "negative" unstable versions at some point, so the above history would have taken the form

19.2-(-999)
19.2-(-998)
19.2-(-997)
19.2 (=19.2-0)

instead, so that a 19.2-alpha won't pretend to be compatible with a 19.1 release. I don't think that affects your plans here at all, since in both cases you just move "step by step" through the unstable versions in the order in which they appear (and sort).
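To make the ordering concrete, here's a small sketch with a hypothetical version triple (the real type lives in the cockroach codebase); the negative unstable versions sort between the previous release and the new release, so the migration job still just walks the sorted list:

```go
package main

import (
	"fmt"
	"sort"
)

// version is a hypothetical "major.minor-unstable" triple.
type version struct {
	major, minor, unstable int
}

func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.unstable < o.unstable
}

func main() {
	vs := []version{
		{19, 2, 0},    // 19.2 release (=19.2-0)
		{19, 2, -998}, // 19.2-(-998)
		{19, 1, 0},    // 19.1 release
		{19, 2, -999}, // 19.2-(-999): first unstable version of the 19.2 cycle
	}
	sort.Slice(vs, func(i, j int) bool { return vs[i].less(vs[j]) })
	// The migration job walks this sorted list step by step:
	// {19 1 0} {19 2 -999} {19 2 -998} {19 2 0}
	fmt.Println(vs)
}
```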

@bdarnell
Contributor

The hook, if given, runs after the feature gate is enabled.

What happens at the end of the hook? That is, how does the code and/or the admin learn that everything is finished and it's possible to move on to the next version? Do we now gossip two cluster versions, one tracking the feature gates and one for the migration completion?

@danhhz
Contributor Author

danhhz commented Jul 31, 2019

The admin runs one command that triggers a job for all necessary migrations. I've been imagining that the job steps through each version. For each step, it first communicates the version to the code, unblocking feature gates (I'd like to have stronger guarantees about this than we've had in the past; how this works is the part that is currently most handwave-y), then runs the hook, then moves on to the next one. Progress would be communicated to the user via the job. I don't think two versions are necessary.

@bdarnell
Contributor

So then the new step in the upgrade process will be something like "check the jobs page to make sure the migration job from the previous upgrade has finished before starting a new upgrade"? We should also consider scripting this for embedded users who need to do upgrades in a hands-off way (and might bundle multiple upgrades in a relatively short time frame, and/or have pathological conditions that could cause migrations to get hung for a very long time).

@danhhz
Contributor Author

danhhz commented Jul 31, 2019

Given that this unblocks the shiny new features as it goes, I was imagining the upgrade docs would have a step at the end directing the user to monitor the job until it finishes. But you're right that we should mention it at the beginning, too, just in case.

I'm in favor of making it scriptable. There's some bikeshedding to be done about whether it's one SQL command that blocks for the entire time or one to kick off the job and one to wait on it (IIRC the bulk io team has been favoring the latter recently).

One blocking command (strawman, ignore the syntax):

> COMMIT TO UPGRADE;
<waits a while>

Two commands (strawman, ignore the syntax):

> COMMIT TO UPGRADE;
job_id
12345
> BLOCK ON JOB 12345;
<waits a while>

We could also wrap this SQL in a cli tool if it makes it easier for embedded uses.
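For scripted/embedded use, the wait could be as simple as polling the job's status over SQL. A hedged sketch: it assumes the migration job is visible via SHOW JOBS with a status column, and reuses the strawman job ID from above.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq" // any Postgres-wire driver works against cockroach
)

// waitForMigrationJob polls until the given job reaches a terminal state.
// The job ID would come from whatever command kicks off the upgrade.
func waitForMigrationJob(db *sql.DB, jobID int64) error {
	for {
		var status string
		err := db.QueryRow(
			`SELECT status FROM [SHOW JOBS] WHERE job_id = $1`, jobID,
		).Scan(&status)
		if err != nil {
			return err
		}
		switch status {
		case "succeeded":
			return nil
		case "failed", "canceled":
			return fmt.Errorf("migration job %d ended with status %s", jobID, status)
		}
		time.Sleep(10 * time.Second)
	}
}

func main() {
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := waitForMigrationJob(db, 12345); err != nil {
		log.Fatal(err)
	}
}
```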

I share your concerns about multiple upgrades in a short time frame. In fact, this design was largely worked backward by starting with these assumptions: a) we'll need to build long-running migrations at some point b) we should make it really hard to mess up when upgrading one version and c) there should be some sane story for upgrading multiple versions.

I'm not entirely happy with how much this achieves (c), but at some point we fundamentally have to choose between blocking on startup if they're going too quickly or requiring long-running migrations to work while spanning multiple major versions (which makes them less useful for eliminating tech debt). I should make this more explicit above, but my current thinking is that if you start on major version X, roll to Y, and then start rolling to Z without the Y upgrade being committed, the Z nodes would block on startup.

Hung migrations are a good concern; I hadn't thought of that. I'm not sure what we could do besides putting something in whatever alerting we build and your idea to document checking in on it at the beginning of the next upgrade. Any opinions here?

@danhhz
Contributor Author

danhhz commented Jul 31, 2019

Nathan pointed out in our 1:1 that my thinking around nodes rejoining after being partitioned off when a feature gate bump is pushed out is probably too harsh. @nvanbenschoten do you mind writing up your concern here so I'm sure I have it right?

@bdarnell
Contributor

bdarnell commented Aug 1, 2019

I like the two command "block on job" variant because it's extensible to other kinds of jobs. (syntax strawman: SHOW UPGRADE STATUS instead of COMMIT TO UPGRADE. Don't use the verb COMMIT for something other than transaction commits)

then start rolling to Z without the Y upgrade being committed, then the Z nodes would block on startup.

Maybe the Z nodes should crash instead of blocking, which is more likely to trigger rolling downgrades in the deployment tooling. Either crashing or blocking is probably the best we can do here. Blocking is only better than crashing if we can be confident that some of the Y nodes will stick around to complete the process.

I'm not sure what we could do besides putting something in whatever alerting we build and your idea to document checking in on it at the beginning of the next upgrade. Any opinions here?

I don't think there's much we can do besides documenting it and providing tools to check the status and wait.

@danhhz
Contributor Author

danhhz commented Aug 15, 2019

I whiteboarded the details of this with @tbg today and we realized that a lot of the scope I was hoping to cut to get this into 19.2 can't actually be cut. So I now think this is a long shot. I'm going to keep working on the prototype and see where I get.

@danhhz
Contributor Author

danhhz commented Sep 24, 2019

Note to self: another possible use of this from @ajkr

Also I forgot to mention this but Pebble might need to be compatible with multiple RocksDB versions, simply because upgrading to 20.1 doesn't necessarily mean the data was rewritten with Pebble by the time a user upgrades to 20.2. A full compaction would be needed to guarantee that. In fact maybe they turn on Pebble in 20.1 and it sees data written by whatever RocksDB version we were using in 1.0. Is there anything preventing this?

@irfansharif
Contributor

Another possible use case for this would be the raft storage engine (#7807). If we're introducing a dedicated storage engine for "raft stuff", there has to be a cutover point for each node running a crdb version with this dedicated engine code to scoop up all existing raft data from the existing storage engine and funnel it into the new one. I had put down some thoughts two years ago here on how this could be done.

@nvanbenschoten
Member

@irfansharif that can't just be done at startup? It's not clear from the RFC why this needs to be coordinated across replicas, given that the Raft log is currently stored in the unreplicated keyspace.

@irfansharif
Contributor

It can/should be; by "cutover point" I was taking a node-centric view, not a cluster-wide one. I mentioned it here because the proposal above talks about unifying startup migrations with cluster versions.

@danhhz
Contributor Author

danhhz commented Nov 25, 2019

Can we do it at startup though? Some of the other motivating examples could be done at startup, but we avoid long-running migrations at startup to protect against users rolling onto a version too fast. I'd be uncomfortable reading all raft data from one engine and writing it into another at startup, if that's what's being discussed here.

@irfansharif
Contributor

to protect against users rolling onto a version too fast.

Do you mind elaborating on this? I don't quite follow. We are discussing reading all raft data from one engine and writing it into another at startup.

@danhhz
Contributor Author

danhhz commented Nov 25, 2019

If you're doing a vX to vX+1 migration, you roll each node onto the next version. Ideally, you roll one node, wait for it to health check, and then roll the next node, repeating until you've done them all.

The worrisome case is a user that rolls them without the health checks. If we do too much work when first starting a node on the new version, then rolling them too fast will result in a cluster that is down for as long as the migrations take. Obviously, the user should be doing the right thing here, but extended total unavailability when they mess up is pretty bad.

Another thing to consider here is that it's likely difficult to maintain the ability to roll back to vX with an at-startup migration to a new raft log engine.

tbg added a commit to tbg/cockroach that referenced this issue Dec 2, 2019
TODO: see if this migration is actually "short-running". That is,
in a sufficiently large cluster, does this cause significant load?

----

This is a baby version of cockroachdb#39182 that handles only short-running
migrations but is useful in itself because it allows us to migrate
fully into the following two KV below-Raft migrations:

1. use the RangeAppliedState on all ranges
2. use the unreplicated TruncatedState on all ranges

These two migrations have been around for a while and they have been
getting in the way of things. While ideally we'd introduce cockroachdb#39182 in the
near future and use it to address a larger class of migration concerns,
this PR serves as a starting point and work done on cockroachdb#39182 should be
able to absorb this PR with relative ease.

With this PR, legacy code related to 1) and 2) above can be removed from
`master` once the 20.1 branch is cut, i.e. roughly in April 2020.

The main ingredients in this PR are

a) introduce a hook that is called during `SET CLUSTER SETTING version`,
   after the change has been validated but before it is made.
b) introduce a KV-level `Migrate` ranged write command that triggers
   the migrations for 1) and 2) above. It is called from the hook for
   all of the keyspace.
c) before returning to the client, `Migrate` waits for the command to
   be durably applied on all followers.

Trying to get this 100% correct has proven tricky, perhaps foreshadowing
similar issues that await us once we try cockroachdb#39182 in earnest. For one,
the mechanism only reaches replicas that are members of the raft group,
that is, it won't touch replicas which are gc'able. For the migrations at
hand this means that in 20.2 there may - in theory - still be replicas
that have a replicated truncated state. I believe that our recent
efforts around not re-using replicas across replicaIDs has made sure
that this isn't an issue for this particular migration, but in general
it will have to remain on the radar. Similarly, it seems hard to prove
conclusively that no snapshot is in-flight that would set up a new
follower with a state predating the explicit migration, though it would
be exceptionally rare in practice.

Release note (general change): version upgrades now perform maintenance
duties that may slightly delay the `SET CLUSTER SETTING version` command
and may cause a small amount of additional load on the cluster while an
upgrade is being finalized.
@rohany
Contributor

rohany commented Dec 18, 2019

Is there any more discussion about this? This would be very useful for the FK migration work right now (we want to upgrade all table descriptors as a migration).

@danhhz
Contributor Author

danhhz commented Dec 18, 2019

I have it on my back burner to finish the prototype, but no one should block on me for anything. @irfansharif expressed an interest in this area at one point; dunno if anything has changed there.

@irfansharif
Contributor

The issue with table descriptors (for my own reference).

There's a new FK representation (as of 19.2) and we want to make sure all the table descriptors have it in 20.1. If old table descriptors (the 19.1 representation) are left lying around, we have to keep maintaining the "upgrade on read" code path introduced in 19.2. For a running cluster, there's currently the possibility that certain tables have not been read since 19.1 and thus would still be using the old representation.

Having the migration story here ironed out would reduce this maintenance burden + build confidence in table descriptor version upgrades.

@spaskob
Contributor

spaskob commented Jan 10, 2020

Handling jobs from 2.0 clusters may also be relevant:

if log.V(2) {

It would be good to remove this case from the codebase.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Feb 26, 2020
We split off ClusterVersion out of pkg/settings/cluster into a dedicated
pkg/clusterversion. This is to pave the way for cockroachdb#39182 where we
introduce long running version migration infrastructure.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
craig bot pushed a commit that referenced this issue Feb 28, 2020
45143: ui: Release notes signup r=dhartunian a=koorosh

Resolves: #43912
Depends on: #44744, #44821, #44856 

- [x] WIP. Current branch has two branches merged which haven't been approved/merged in master branch yet.

- [x] rebase branch to remove merge commits from branches other than master.

Add Release notes signup form to Cluster Overview page
right after page title.
Release Notes signup view is created in `src/views/dashboard`
directory because it will be the main page where we display
this view. And Cluster Overview page is a temporary place
while Dashboard view doesn't exist.

These changes integrate three main parts: ReleaseNotesForm,
AlertNotification component and custom analytics saga.



45426: coldata: minor tweak of flat bytes r=yuzefovich a=yuzefovich

This commit changes `maybeBackfillOffsets` to update `maxSetIndex`
accordingly (this might be a minor performance improvement). In a sense,
when we're backfilling offsets, we *are* setting indices to point to
empty `[]byte` slices. Also, the logic for `Set` method is slightly
refactored.

Release note: None

45455: clusterversion: significantly rework cluster version handling r=tbg a=irfansharif

We split off ClusterVersion out of pkg/settings/cluster into a dedicated
pkg/clusterversion. This is to pave the way for #39182 where we
introduce long running version migration infrastructure.

We then introduce clusterversion.Handle as a read only view to the
active cluster version and this binary's version details. We similarly
fold in the actual global cluster version setting into
pkg/clusterversion, and enforce all external usage to go through
clusterversion.Handle. We can also hide away external usage of the baked
in cluster.Binary{,MinimumSupported}Version constants. Instead we have
the implementation of clusterversion.Handle allow for tests to override
binary and minimum supported versions as needed.

Along the way we clean up Settings.Initialized, as it is not really
used. We document all the various "versions" in play ("binary version",
    "minimum supported version", "active version") and change usage of what
was previously "serverVersion" to simply be "binaryVersion", because
that's what it is. We also clean up the Settings constructors into
Test/Non-Test types that differ in cluster version setting
initialization behaviour.

---

For reviewers: It's probably easiest to start with what 
pkg/settings/cluster/cluster_settings.go looks like, then following into
pkg/clusterversion/cluster_version.go and then pkg/clusterversion/setting.go.

---

I still don't like the following detail about our pkg structure:

- pkg/settings/cluster depends on its "parent" pkg, pkg/settings
- pkg/settings/cluster depends on pkg/clusterversion, which in turn depends
on pkg/settings

Naively, pkg/settings/cluster should be a top level package, but I'm not
making that change in this PR. For now I've renamed the settings.go file
to cluster_settings.go.

Release note: None


45535: Revert "storage,libroach: update MVCCIncrementalIterator to look at every updated key" r=pbardea a=pbardea

Reverts #45163

To stop the errors we're seeing on #45524. Will investigate further once it's off master.

Co-authored-by: Andrii Vorobiov <and.vorobiov@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Paul Bardea <pbardea@gmail.com>
craig bot pushed a commit that referenced this issue Mar 2, 2020
45455: clusterversion: significantly rework cluster version handling r=irfansharif a=irfansharif



45515: sql: rationalize some output events from the connection state machine r=andreimatei a=andreimatei

See individual commits.

Release note: None

45582: colexec: plan disk-spilling enabled operators when vectorize=auto r=yuzefovich a=asubiotto

Each commit turns on one of the HashRouter, Sorter, and HashJoiner and includes the relevant test changes (all plan changes).

The only thing of note is that explain analyze plans include both row and column stats for wrapped operators. It seems like this is expected though (according to `vectorize_local`) and changing this naively results in other failures (`unexpectedly not collecting stats`), so will leave this up to discussion of whether we want to change this. Regardless, I think it is out of scope for this PR.

Closes #45172 

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com>
@irfansharif irfansharif self-assigned this Mar 31, 2020
@irfansharif
Contributor

RFC here: #48843.

irfansharif pushed a commit to irfansharif/cockroach that referenced this issue Aug 18, 2020
@irfansharif
Contributor

the various Raft migrations (RangeAppliedState, unreplicated truncated state, etc) which all boil down to "run something through Raft until they're all active on all ranges but 99.99% sure they're all active already anyway"

#58088 uses the long-running migrations infrastructure proposed in #48843 (x-ref linked PRs in #56107) to onboard exactly the above.
