New rivermigrate Go API for running migrations from code + target version support #67

Merged

brandur merged 2 commits into master from brandur-river-migrate on Dec 3, 2023

Conversation

brandur (Contributor) commented Nov 26, 2023

Here, as requested by several users, add a Go API that's able to run
migrations as an alternative to migrating via the CLI. We add a new
package rivermigrate to mirror the conventions of riverdriver and
rivertest.

rivermigrate is largely just a refactor of the existing internal
package internal/dbmigrate with a couple important tweaks:

  • We give it the ability to take a driver the same way as we do for
    river.Client and rivertest.

  • To mirror client and rivertest we add a transaction-specific variant
    that takes a TTx (MigrateTx).

  • Because we now have the non-transaction and transaction variants, I
    take away the Down/Up function distinctions, and make up/down a
    parameter to Migrate/MigrateTx instead (so we only need two
    functions instead of four).

  • I move to using an embed.FS instead of specific embedded strings for
    each version and the file system is walked to get all migrations
    within it. This will make adding new migrations in the future easier
    and more foolproof.

I also added a new option for TargetVersion so that it's possible to
do a limited migration to a specific version instead of going all the
way up or down. This is a pretty standard feature in most migration
frameworks.

This still can't support Goose because Goose takes an sql.DB, but I think
we can get there by adding a minimal driver for the sql package that
can run migrations and nothing else. More of this to be explored later.
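For illustration, here's a rough sketch of how the new package is intended to be used with the pgx/v5 driver. It's not taken from the PR diff: the connection string is made up and the exact arguments to New are assumptions, though the Migrate signature matches the one under review below.

package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river/riverdriver/riverpgxv5"
	"github.com/riverqueue/river/rivermigrate"
)

func main() {
	ctx := context.Background()

	// Illustrative connection string only.
	dbPool, err := pgxpool.New(ctx, "postgres://localhost/river_dev")
	if err != nil {
		log.Fatal(err)
	}
	defer dbPool.Close()

	// The migrator takes a driver the same way river.Client and rivertest do.
	// (The second argument is assumed to be an optional config.)
	migrator := rivermigrate.New(riverpgxv5.New(dbPool), nil)

	// Up/down is a parameter rather than separate Up/Down functions; nil opts
	// migrate all the way up.
	if _, err := migrator.Migrate(ctx, rivermigrate.DirectionUp, nil); err != nil {
		log.Fatal(err)
	}
}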

brandur force-pushed the brandur-river-migrate branch 2 times, most recently from 37a5f32 to 6329b59 on November 26, 2023 02:01
brandur requested a review from bgentry on November 26, 2023 02:07
brandur force-pushed the brandur-river-migrate branch 5 times, most recently from 7c0819d to 2f61b67 on November 26, 2023 16:03
brandur force-pushed the brandur-river-migrate branch 2 times, most recently from 71da00f to d1b9490 on November 26, 2023 23:23
Comment on lines 167 to 173
// When migrating down, use with caution. MigrateOpts.MaxSteps should be set to
// 1 to only migrate down one step.
bgentry (Contributor)

This seems like a bit of a footgun. I would have been inclined to default to running a single migration at a time just to avoid this. Do you think it's too risky?

At the very least we might want the examples to show the single-migration usage.

brandur (Contributor, Author)

I could see running one at a time being somewhat defensible, but my fear is that it'll be the wrong thing for what might end up being the most common use of the Go migration API, which is to raise tables for test and development purposes.

In these cases what you most want out of the migrate API is for it to give you a fully functional River schema. Running one migration at a time doesn't do that, so you'd either have to hammer it into place by setting MaxSteps: 1000 or by setting a new TargetVersion every time a new River DB version comes out (not convenient).

I think I'd lean toward documenting this and making sure to have MaxSteps: 1 in most examples as you suggest.
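For illustration, the single-step down form those examples would show might look like this (a sketch reusing the migrator from the earlier example, with the MigrateOpts and Direction names from the code under review):

_, err := migrator.Migrate(ctx, rivermigrate.DirectionDown, &rivermigrate.MigrateOpts{
    // Only step down a single migration at a time.
    MaxSteps: 1,
})
if err != nil {
    // handle error
}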

bgentry (Contributor) commented Nov 27, 2023

I'm not so much concerned with making it easy to go from 0 to N for test + dev, but with that being the default for this API in code. For example, the CLI could easily default to running all migrations, even if the API here required you to pass a special value like -1 to achieve that. You're already using -1 for other purposes, so that would need to be worked around, but that's the general idea of what I'm asking.

> Running one migration at a time doesn't do that, so you'd either have to hammer it into place by setting MaxSteps: 1000 or by setting a new TargetVersion every time a new River DB version comes out (not convenient).

I think this is actually what I'd want for a production application, though. The reason is that the River code version is tied to the migration/schema version. When I'm upgrading between River releases that include new migrations, I'll need to also be sure to run the appropriate schema migrations prior to trying to run that newer version of River's code against the DB. At past jobs I would have added a new migration file to accompany the upgrade and deploy them at the same time. We may even want to go so far as to have the River Client check the schema version at launch and complain or error if it hasn't been migrated far enough yet.

Of course if this is what I want in prod it's also what I'll be practicing in dev, although test envs are potentially different (may want more of a schema:load than a db:migrate for speed).

bgentry (Contributor)

Looking at this again, I still feel like this specific behavior:

// When migrating down, use with caution. MigrateOpts.MaxSteps should be set to
// 1 to only migrate down one step.

is too dangerous and should never be the default if I forget to specify some other arg(s).

// if err != nil {
// // handle error
// }
func (m *Migrator[TTx]) Migrate(ctx context.Context, direction Direction, opts *MigrateOpts) (*MigrateResult, error) {
bgentry (Contributor)

Related to the other comment, I'm not quite sure about the API here. I'm thinking in terms of trying to make this foolproof and as difficult as possible to accidentally mess up your prod database, while also being sure exactly what's being run at a given step.

Rather than stating how many migrations to run, do you think it might be safer to specify which migration you want to run? This would require a bit of a different API, because it'd be something like RunUp(ctx, 1) or Run(ctx, rivermigrate.DirectionUp, 3). That forces the use case of running multiple migrations into a different API like RunN(), but I expect that use case to be more unusual. Also if you went that route, the N arg in RunN() could be an explicit arg and not something with a default zero value.

My thought process here is that if I'm running an app with River and I see that there's a new version out, I'll check the changelog or website for notes about new migrations in that version. If I see one, I'm going to make a new migration file in my goose/golang-migrate/etc migration setup where I have it call out to rivermigrate. I would want to be extra sure that I'm running exactly the migration I expect to be running so that I'm sure I'm prepared and have followed any related instructions if necessary.

brandur (Contributor, Author) commented Nov 27, 2023

So both use cases are supported, with what you describe being the TargetVersion option. I originally had this even closer to what you described, as just Version, which would run a single version as a one-off, but I changed it after thinking about the footguns and ergonomic issues that fall out of being able to target versions multiple steps beyond your current one, which would then have to either error or be applied out of order. TargetVersion applies a target version and any versions up to it that are necessary based on your current database state.

I think this would be our general recommendation for usage when writing upgrade guides for specific versions, so the up migration would look like:

rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
     TargetVersion: 4,
})

And then the down migration something like:

rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionDown, &rivermigrate.MigrateOpts{
     TargetVersion: 3,
})

Or, as an alternative for single-version upgrades (neither of these is strictly right when upgrading multiple versions at once; the user would have to make sure to set their original version):

rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
     MaxSteps: 1,
})

See also the note above, but I didn't want TargetVersion to be the only API because I think there's going to be demand to use this in test suites and one-off small projects (which don't have a full migration line) where you just want to have River raise a usable schema with no hassle and without needing to update code when new River versions come out. Those can just use the really easy:

rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, nil)

bgentry (Contributor) commented Nov 27, 2023

Ah yeah, I missed this on the first pass, my bad! I think you've highlighted that there are indeed two distinct use cases here: get a working DB as fast as possible, vs. give me control and certainty for integrating with my existing migration setup. If folks already have an existing DB, they likely already have an existing migration setup, which for dev and tests could either consist of stepping through migrations one at a time or doing a bulk schema load.

Since this API supports both, that's great! I want to make sure we at least document these conventions clearly and ensure there aren't any major footguns in them (per above thread):

  1. I'm just getting started with River in an existing migration-driven project. What code do I add to a new migration file to go from 0 to latest? It looks like this would be:
    rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
         TargetVersion: 3,
    })
    Because there's no max steps specified, it will run as many steps as necessary to get to the target version. Correct?
  2. I've already adopted River per the above with an older migration version (say, v2). What do I put in a new migration to ensure I go precisely from v2 to v3, no more and no less?
    rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
         TargetVersion: 3,
         MaxSteps: 1,
    })
    Correct?
  3. I've already adopted River per the above with an older migration version (say, v2) and multiple migrations have come out since my last upgrade. What do I put in a new migration to ensure I go precisely from v2 to, e.g., v4, no more and no less?
    rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
         TargetVersion: 4,
         MaxSteps: 2,
    })
    Is that correct? Am I guaranteed to begin with up migration 3 in this scenario, because I've said I want to finish at target version 4 and want a max of 2 steps? Or will it check the current version and attempt to run CURRENT_VERSION+1 and CURRENT_VERSION+2 regardless of whether I've successfully run all prior migrations? Will this API error if I'm requesting v5 with a max of 2 steps, but I'm currently on v2?
  4. For down migrations, I want to be 100% sure that I'm undoing exactly the same steps as I specified above, no more and no less. Does the API let me guarantee that?
  5. I don't want to integrate my River tables into an existing migration system and just want to use a CLI to quickly bring River to the latest version from wherever it is.
  6. I don't want to integrate my River tables into an existing migration system, but I still want the tight control and guarantees from (2) and (3) above.

For (2) and (3) above, would the recommendation be to use both TargetVersion and MaxSteps together? Or how would I guarantee I'm running only the code from migration 3's up migration, no matter what else I've done?

brandur (Contributor, Author)

> For (2) and (3) above, would the recommendation be to use both TargetVersion and MaxSteps together? Or how would I guarantee I'm running only the code from migration 3's up migration, no matter what else I've done?

No, not recommended — by including a MaxSteps with everything, you're making things strictly more dangerous.

Say you were trying to get to version 3, and you thought you were on version 2, so you had MaxSteps: 1. But actually, you were on version 1! So TargetVersion: 3, MaxSteps: 1 actually takes you to version 2 and then River might not start up properly.

You might combine these two options while testing something, but you wouldn't want to do so as part of any upgrade guides or the like. TargetVersion knows what it's trying to do and will do the right thing based on the current state of the local database.

> For down migrations, I want to be 100% sure that I'm undoing exactly the same steps as I specified above, no more and no less. Does the API let me guarantee that?

Nope, and I don't think there's any way to do so without farming the work out to the user themselves. We can tell people that if they know they're incrementing exactly one version, they can use MaxSteps: 1, but otherwise they'll have to look at their current version and include that as TargetVersion: <version>.

One thing I should point out: you generally should never get to the point where you have to run down migrations for any app in production, because if you do, the battle's already lost. River may be an exception here to some degree, but chances are that down migrations are poorly tested and may be unexpectedly destructive/lossy. It's probably less risky to fix forward rather than run them in most cases.
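As a sketch of that TargetVersion approach for downgrades (reusing the migrator from the earlier example): to undo an upgrade to version 4 when you know you started from version 3, target that version explicitly rather than counting steps:

_, err := migrator.Migrate(ctx, rivermigrate.DirectionDown, &rivermigrate.MigrateOpts{
    // Roll back to a known prior version rather than relying on MaxSteps.
    TargetVersion: 3,
})
if err != nil {
    // handle error
}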

bgentry (Contributor)

> No, not recommended — by including a MaxSteps with everything, you're making things strictly more dangerous.
>
> Say you were trying to get to version 3, and you thought you were on version 2, so you had MaxSteps: 1. But actually, you were on version 1! So TargetVersion: 3, MaxSteps: 1 actually takes you to version 2 and then River might not start up properly.
>
> You might combine these two options while testing something, but you wouldn't want to do so as part of any upgrade guides or the like. TargetVersion knows what it's trying to do and will do the right thing based on the current state of the local database.

Alright, this is a fair point. If you're trying to get to v4 (the current version) and you know this, it should be safe to let the migration engine do what it needs to in order to get to that version. On the down migrations, you could get bit by not migrating down far enough (if you were actually further behind than you thought you were and set a MaxSteps) but that could be uncovered through actually testing your up + down migrations together.

The overall design here should be workable as long as we document all these common use cases clearly.

Comment on lines 125 to 128
// apply. e.g. If requesting an up migration with version 3, version 3 not
// already be applied.
bgentry (Contributor)

Suggested change
// apply. e.g. If requesting an up migration with version 3, version 3 not
// already be applied.
// apply. e.g. If requesting an up migration with version 3, version 3 must
// not already be applied.

brandur (Contributor, Author)

Fixed, thanks!

bgentry (Contributor) left a comment

I'm approving the PR broadly, nice work on this. However I do want to reiterate that I think defaulting down migrations to unlimited steps is super dangerous and we should think about adding a failsafe for that scenario. I can't think of many (any?) reasons why someone would really ever want to do that, let alone have it be the default. Thoughts on that remaining bit?

brandur force-pushed the brandur-river-migrate branch 4 times, most recently from a4ed879 to 89489ee on December 3, 2023 00:29
brandur (Contributor, Author) commented Dec 3, 2023

@bgentry Okay I added a new commit here to make some changes that I think are along the lines of what you requested:

  • When migrating down, the migrator now defaults to moving only one step for safety. The asymmetry of up/down still feels a little strange to me, but I acknowledge that the benefits outweigh that downside here. I've added documentation on each migrate function and on MaxSteps to make it clear what the default is supposed to be.
  • Because otherwise there'd be no way of removing River's schema completely with this change, setting TargetVersion to the special value of -1 causes the migrator to migrate all the way down, zeroing out all of River's migrations (see the sketch below).
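A quick sketch of what that full teardown would look like (reusing the migrator value from the earlier sketch; illustrative only):

_, err := migrator.Migrate(ctx, rivermigrate.DirectionDown, &rivermigrate.MigrateOpts{
    // -1 is the explicit opt-in to migrate all the way down and remove River's schema entirely.
    TargetVersion: -1,
})
if err != nil {
    // handle error
}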

Seem okay?

bgentry (Contributor) left a comment

I agree the different behavior on down is asymmetric and feels a bit odd, but if it saves just one person from accidentally nuking their jobs table, it seems worthwhile IMO. Most people shouldn't be migrating down in a non-dev environment to begin with.

Nice work on this :shipit:

…ersion support

brandur force-pushed the brandur-river-migrate branch 2 times, most recently from 29b1f00 to d3d7cc7 on December 3, 2023 03:27
… unlimited

A few tweaks to the originally proposed migration API:

* When migrating down, the migrator now moves a maximum of one step by
  default. This is meant as a safety feature so that it's harder to remove
  the River schema completely by accident.

* When migrating down, `TargetVersion` can be set to `-1` explicitly to
  move an unlimited number of steps. This adds a way to remove the River
  schema completely by migrating to the equivalent of version zero (we
  can't use `0` for this because that's the default value of `int`).
brandur force-pushed the brandur-river-migrate branch from d3d7cc7 to 73e87ae on December 3, 2023 03:29
brandur (Contributor, Author) commented Dec 3, 2023

Awesome! Okay, this will still need more work to function with Goose's sql.DB, but it's a start on getting that working.

brandur merged commit 9871811 into master on Dec 3, 2023
7 checks passed
brandur deleted the brandur-river-migrate branch on December 3, 2023 03:32
brandur (Contributor, Author) commented Dec 3, 2023

Cut release 0.0.12. I'll write up some additional docs for the site too.

brandur added a commit that referenced this pull request Dec 3, 2023
Follow up #67 by adding an example test for the new Go migration API.
Helps provide a more copy/pastable example for River's godoc, and
something we can link to from our other docs.
brandur added a commit that referenced this pull request Dec 7, 2023
Follow up #67 by adding an example test for the new Go migration API.
Helps provide a more copy/pastable example for River's godoc, and
something we can link to from our other docs.