migration: add scaffolding for the migrations manager #56368
Conversation
This callback will be called after validating a `SET CLUSTER SETTING version` but before executing it. It will be used in future PRs to execute arbitrary migrations, allowing us to eventually remove code supporting legacy behavior. This diff was pulled out of the long-running migrations prototype (cockroachdb#56107). For more details, see the RFC (cockroachdb#48843). Release note: None
Force-pushed from 864c7c2 to f18ffa7
Package migration captures the facilities needed to define and execute migrations for a crdb cluster. These migrations can be arbitrarily long-running, are free to send out arbitrary requests cluster-wide, change internal DB state, and much more. They're typically reserved for crdb-internal operations and state. Each migration is idempotent in nature, is associated with a specific cluster version, and is executed when that cluster version is made active on every node in the cluster. Examples of migrations that would apply include moving all raft state from one storage engine to another, or purging all usage of the replicated truncated state in KV. A "sister" package of interest is pkg/sqlmigrations. --- This commit only introduces the basic scaffolding and wiring from existing functionality. We'll flesh out the missing bits in future commits. Release note: None
Force-pushed from f18ffa7 to 2850729
Reviewed 3 of 3 files at r1, 5 of 5 files at r2, 3 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @knz)
pkg/migration/manager.go, line 46 at r2 (raw file):
// Registry. Be sure to key it in with the new cluster version we just added.
// During cluster upgrades, once the operator is able to set a cluster version
// setting that's past the version that was introduced (typically the major
at or past, right? If I attach a migration to 21.1.0-185, it will be run at the step going from 21.1.0-184 to 21.1.0-185. If I introduce VersionRemoveLegacyTombstone, I want to have removed legacy tombstones when that version is active.
pkg/migration/manager.go, line 79 at r2 (raw file):
// TODO(irfansharif): Should we inject every ctx here with specific labels
// for each migration, so they log distinctly? Do we need an AmbientContext?
_ = logtags.AddTag(ctx, "migration-mgr", nil)
I like the idea of annotating the context with the current work item. I don't think an AmbientContext is needed; the caller can pass an already-populated context.
pkg/migration/manager.go, line 111 at r2 (raw file):
// TODO(irfansharif): We'll also want a testing override here to be able
// to stub out migrations as needed.
_ = Registry[version]
I think the most canonical way to get this is to return &Manager{registry: Registry} in NewManager, and in tests you simply put in another registry. No need for knobs at all.
I really like the idea of populating intermediate commits with API stubs and "TODO" comments. This makes the review much easier. I will reuse your idea.
Reviewed 3 of 3 files at r1, 4 of 5 files at r2, 3 of 3 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
The migration manager will need all of the above in order to execute migrations. It'll need:
- A `Dialer`, to send out migration RPCs to individual nodes in the cluster
- A handle on `Liveness`, to figure out which nodes are part of the cluster
- An `Executor`, for migrations to inspect/set internal SQL state, and to log progress into a dedicated system table
- A `kv.DB`, for migrations to inspect/set internal KV state, and to send out Migrate requests to ranges to execute below-Raft migrations

For more details, see the RFC (cockroachdb#48843). The fully fleshed-out version of this manager was originally prototyped in cockroachdb#56107. This PR is simply pulling out the boring bits from there to move things along. Release note: None
Force-pushed from 2850729 to 53d8746
> I will reuse your idea.
Heh, I remember getting this idea from you during a 1:1. Thanks for the reviews!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz and @tbg)
pkg/migration/manager.go, line 46 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
> at or past, right? If I attach a migration to 21.1.0-185, it will be run at the step going from 21.1.0-184 to 21.1.0-185. If I introduce VersionRemoveLegacyTombstone, I want to have removed legacy tombstones when that version is active.
Yep, the thing I was getting at here was that the user would only be able to set major versions as the active cluster version (e.g. 21.1.0), not the intermediate versions directly. I still need to do some work to properly introduce these fence versions, and the semantics of wanting to have removed legacy tombstones once VersionRemoveLegacyTombstone is active is also what I'm aiming for. TODO.
Build succeeded.
56476: server: introduce the `Migration` service r=irfansharif a=irfansharif

The upcoming migration manager (prototyped in #56107) will want to execute a few known RPCs on every node in the cluster. As part of being the "migration infrastructure", we also want authors of individual migrations to be able to define arbitrary node-level operations to execute on each node in the system. To this end we introduce a `Migration` service, and populate it with the two known RPCs the migration manager will want to depend on:
- ValidateTargetClusterVersion: used to verify that the target node is running a binary that's able to support the given cluster version.
- BumpClusterVersion: used to inform the target node about a (validated) cluster version bump.

Neither of these RPCs is currently wired up to anything, and BumpClusterVersion will be fleshed out just a tiny bit further in a future PR, but they'll both be used to propagate cluster version bumps across the crdb cluster through direct RPCs, supplanting our existing gossip-based distribution mechanism. This will let the migration manager bump version gates in a more controlled fashion. See #56107 for what that will end up looking like, and see the long-running migrations RFC (#48843) for the motivation. Like we mentioned earlier, we expect this service to pick up more RPCs over time to service specific migrations. Release note: None

---

Ignore the first four commits. They're from #56368 and #56474

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
56480: settings,migration: disconnect cluster version from gossip r=irfansharif a=irfansharif

...in favor of direct RPCs to all nodes in the cluster. This uses the building blocks we've added thus far to replace the use of gossip to disseminate the cluster version. It does so by sending out individual RPCs to each node in the cluster, informing them of a version bump, all the while retaining the same guarantees provided by our (now previously) gossip-backed mechanism. This is another in the series of PRs to introduce long-running migrations (#48843), pulled out of our original prototype in #56107. This diff has the following pieces:
- We disconnect the version setting updates through gossip (by disconnecting the setting type within the updater process).
- We use the `Migration` service to send out RPCs to each node in the cluster, containing the payload that each node would otherwise receive through gossip. We do this by first introducing two primitives in pkg/migrations:
  - `RequiredNodes` retrieves a list of all nodes that are part of the cluster. It's powered by `pkg/../liveness`.
  - `EveryNode` is a shorthand that allows us to send out node-level migration RPCs to every node in the cluster.
  We combine these primitives with the RPCs introduced in #56476 (`ValidateTargetClusterVersion`, `BumpClusterVersion`) to actually carry out the version bumps.
- We expand the `clusterversion.Handle` interface to allow setting the active version directly through it. We then make use of it in the implementation for `BumpClusterVersion`.
- Within `BumpClusterVersion`, we persist the cluster version received from the client node first, within `keys.StoreClusterVersionKey`, before bumping the version gate. This is a required invariant in the system in order for us to not regress our cluster version on restart. It was previously achieved by attaching callbacks on the version handle (`SetBeforeChange`).
- We no longer need the callbacks attached to gossip to persist cluster versions to disk; we're doing it as part of the `BumpClusterVersion` RPC. We remove them entirely.
- We use the active version provided by the join RPC to set the version setting directly (after having persisted it first). This too was previously achieved through gossip + the callback.

Release note: None

---

Only the last commit is of interest. All prior commits should be reviewed across #56476, #56474 and #56368.

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
The commits in this PR were pulled out of our original prototype of the
migrations manager (#56107). These are the "boring bits", and simply introduce
the scaffolding for the manager without materially hooking it up to anything.
It will be fleshed out in future PRs, following the direction set by our
prototype above. For more details, see the RFC (#48843).