New rivermigrate
Go API for running migrations from code + target version support
#67
Conversation
rivermigrate/river_migrate.go
Outdated
```go
// When migrating down, use with caution. MigrateOpts.MaxSteps should be set to
// 1 to only migrate down one step.
```
This seems like a bit of a footgun. I would have been inclined to default to running a single migration at a time just to avoid this. Do you think it's too risky?
At the very least we might want the examples to show the single-migration usage.
I could see running one at a time to be somewhat defensible, but my fear is that it'll be the wrong thing for what might end up being the most common use of the Go migration API, which is to raise tables for test and development purposes.
In these cases what you most want out of the migrate API is for it to give you a fully functional River schema. Running one migration at a time doesn't do that, so you'd either have to hammer it into place by setting `MaxSteps: 1000` or by setting a new `TargetVersion` every time a new River DB version comes out (not convenient).
I think I'd lean toward documenting this and making sure to have `MaxSteps: 1` in most examples as you suggest.
I'm not so much concerned with making it easy to go from 0 to N for test + dev, but with that being the default for this API in code. For example, the CLI could easily default to running all migrations, even if the API here required you to pass a special value like `-1` to achieve that. You're already using `-1` for other purposes, so that would need to be worked around, but that's the general idea of what I'm asking.
> Running one migration at a time doesn't do that, so you'd either have to hammer it into place by setting `MaxSteps: 1000` or by setting a new `TargetVersion` every time a new River DB version comes out (not convenient).
I think this is actually what I'd want for a production application, though. The reason is that the River code version is tied to the migration/schema version. When I'm upgrading between River releases that include new migrations, I'll need to also be sure to run the appropriate schema migrations prior to trying to run that newer version of River's code against the DB. At past jobs I would have added a new migration file to accompany the upgrade and deploy them at the same time. We may even want to go so far as to have the River Client check the schema version at launch and complain or error if it hasn't been migrated far enough yet.
Of course if this is what I want in prod it's also what I'll be practicing in dev, although test envs are potentially different (may want more of a `schema:load` than a `db:migrate` for speed).
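The launch-time schema check floated above could reduce to a simple comparison like this minimal sketch. The function name and error wording here are invented for illustration, and actually fetching the database's version (e.g. from River's migration-tracking table) is omitted:

```go
package main

import "fmt"

// checkSchemaVersion is a hedged sketch of the launch-time guard idea: error
// out if the database's migration version lags behind what this release of
// the code requires. The name and error text are illustrative only.
func checkSchemaVersion(dbVersion, requiredVersion int) error {
	if dbVersion < requiredVersion {
		return fmt.Errorf(
			"river schema is at version %d but this release requires %d; run migrations before starting the client",
			dbVersion, requiredVersion)
	}
	return nil
}

func main() {
	fmt.Println(checkSchemaVersion(4, 4))        // <nil>
	fmt.Println(checkSchemaVersion(2, 4) != nil) // true
}
```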
Looking at this again, I still feel like this specific behavior:
```go
// When migrating down, use with caution. MigrateOpts.MaxSteps should be set to
// 1 to only migrate down one step.
```
is too dangerous and should never be the default if I forget to specify some other arg(s).
```go
// if err != nil {
//     // handle error
// }
func (m *Migrator[TTx]) Migrate(ctx context.Context, direction Direction, opts *MigrateOpts) (*MigrateResult, error) {
```
Related to the other comment, I'm not quite sure about the API here. I'm thinking in terms of trying to make this foolproof and as difficult as possible to accidentally mess up your prod database, while also being sure exactly what's being run at a given step.
Rather than stating how many migrations to run, do you think it might be safer to specify which migration you want to run? This would require a bit of a different API, because it'd be something like `RunUp(ctx, 1)` or `Run(ctx, rivermigrate.DirectionUp, 3)`. That forces the use case of running multiple migrations into a different API like `RunN()`, but I expect that use case to be more unusual. Also if you went that route, the `N` arg in `RunN()` could be an explicit arg and not something with a default zero value.
My thought process here is that if I'm running an app with River and I see that there's a new version out, I'll check the changelog or website for notes about new migrations in that version. If I see one, I'm going to make a new migration file in my goose/golang-migrate/etc migration setup where I have it call out to `rivermigrate`. I would want to be extra sure that I'm running exactly the migration I expect to be running so that I'm sure I'm prepared and have followed any related instructions if necessary.
So both use cases are supported, with what you describe being the `TargetVersion` option. I originally had this even closer to what you described as just `Version` that'd run a single version one-off, but I changed it after thinking about the various footguns and ergonomic issues that fell out of being able to target versions that are multiple steps beyond your current version, and would therefore have to error or be applied out of order. `TargetVersion` applies a target version and any versions up to it that are necessary based on your current database state.
I think this would be our general recommendation for usage when writing upgrade guides for specific versions, so the up migration would look like:
```go
rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
    TargetVersion: 4,
})
```
And then the down migration something like:
```go
rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionDown, &rivermigrate.MigrateOpts{
    TargetVersion: 3,
})
```

or:

```go
// (neither of these is strictly right when upgrading multiple versions at once —
// the user would have to make sure to set their original version)
rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionDown, &rivermigrate.MigrateOpts{
    MaxSteps: 1,
})
```
See also the note above, but I didn't want `TargetVersion` to be the only API because I think there's going to be demand to use this in test suites and one-off small projects (which don't have a full migration line) where you just want to have River raise a usable schema with no hassle and without needing to update code when new River versions come out. Those can just use the really easy:
```go
rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, nil)
```
Ah yeah, I missed this on the first pass, my bad! I think you've highlighted that there are indeed two distinct use cases here: get a working DB as fast as possible, vs. give me control and certainty for integrating with my existing migration setup. If folks already have an existing DB, they likely already have an existing migration setup, which for dev and tests could either consist of stepping through migrations one at a time or doing a bulk schema load.
Since this API supports both, that's great! I want to make sure we at least document these conventions clearly and ensure there aren't any major footguns in them (per above thread):
1. I'm just getting started with River in an existing migration-driven project. What code do I add to a new migration file to go from 0 to latest? It looks like this would be:

   ```go
   rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
       TargetVersion: 3,
   })
   ```

   Because there's no max steps specified, it will run as many steps as necessary to get to the target version. Correct?

2. I've already adopted River per the above with an older migration version (say, v2). What do I put in a new migration to ensure I go precisely from v2 to v3, no more and no less? Is this correct?

   ```go
   rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
       TargetVersion: 3,
       MaxSteps:      1,
   })
   ```

3. I've already adopted River per the above with an older migration version (say, v2) and multiple migrations have come out since my last upgrade. What do I put in a new migration to ensure I go precisely from v2 to e.g. v4, no more and no less?

   ```go
   rivermigrate.New(...).Migrate(ctx, rivermigrate.DirectionUp, &rivermigrate.MigrateOpts{
       TargetVersion: 4,
       MaxSteps:      2,
   })
   ```

   Is that correct? Am I guaranteed to begin with up migration 3 in this scenario, because I've said I want to finish at target version 4 and want a max of 2 steps? Or will it check the current version and attempt to run `CURRENT_VERSION+1` and `CURRENT_VERSION+2` regardless of whether I've successfully run all prior migrations? Will this API error if I'm requesting v5 with a max of 2 steps, but I'm currently on v2?

4. For down migrations, I want to be 100% sure that I'm undoing exactly the same steps as I specified above, no more and no less. Does the API let me guarantee that?

5. I don't want to integrate my River tables into an existing migration system and just want to use a CLI to quickly bring River to the latest version from wherever it is.

6. I don't want to integrate my River tables into an existing migration system, but I still want the tight control and guarantees from (2) and (3) above.

For (2) and (3) above, would the recommendation be to use both `TargetVersion` and `MaxSteps` together? Or how would I guarantee I'm running only the code from migration 3's up migration, no matter what else I've done?
> For (2) and (3) above, would the recommendation be to use both `TargetVersion` and `MaxSteps` together? Or how would I guarantee I'm running only the code from migration 3's up migration, no matter what else I've done?
No, not recommended — by including a `MaxSteps` with everything, you're making things strictly more dangerous.

Say you were trying to get to version 3, and you thought you were on version 2, so you had `MaxSteps: 1`. But actually, you were on version 1! So `TargetVersion: 3, MaxSteps: 1` actually takes you to version 2 and then River might not start up properly.

You might combine these two options while testing something, but you wouldn't want to do so as part of any upgrade guides or the like. `TargetVersion` knows what it's trying to do and will do the right thing based on the current state of the local database.
> For down migrations, I want to be 100% sure that I'm undoing exactly the same steps as I specified above, no more and no less. Does the API let me guarantee that?

Nope, and I don't think there's any way to do so without farming the work out to the user themselves. We can tell people that if they know they're incrementing exactly one version, they can use `MaxSteps: 1`, but otherwise they'll have to look at their current version and include that as `TargetVersion: <version>`.
One thing I should point out: you generally should never get to the point where you have to run down migrations for any app in production, because if you do, the battle's already lost. River may be an exception here to some degree, but chances are that down migrations are poorly tested and may be unexpectedly destructive/lossy. It's probably less risky to fix forward rather than run them in most cases.
> No, not recommended — by including a `MaxSteps` with everything, you're making things strictly more dangerous.
>
> Say you were trying to get to version 3, and you thought you were on version 2, so you had `MaxSteps: 1`. But actually, you were on version 1! So `TargetVersion: 3, MaxSteps: 1` actually takes you to version 2 and then River might not start up properly.
>
> You might combine these two options while testing something, but you wouldn't want to do so as part of any upgrade guides or the like. `TargetVersion` knows what it's trying to do and will do the right thing based on the current state of the local database.
Alright, this is a fair point. If you're trying to get to v4 (the current version) and you know this, it should be safe to let the migration engine do what it needs to in order to get to that version. On the down migrations, you could get bit by not migrating down far enough (if you were actually further behind than you thought you were and set a `MaxSteps`), but that could be uncovered through actually testing your up + down migrations together.
The overall design here should be workable as long as we document all these common use cases clearly.
rivermigrate/river_migrate.go
Outdated
```go
// apply. e.g. If requesting an up migration with version 3, version 3 not
// already be applied.
```
```diff
-// apply. e.g. If requesting an up migration with version 3, version 3 not
-// already be applied.
+// apply. e.g. If requesting an up migration with version 3, version 3 must
+// not already be applied.
```
Fixed, thanks!
I'm approving the PR broadly, nice work on this. However I do want to reiterate that I think defaulting down migrations to unlimited steps is super dangerous and we should think about adding a failsafe for that scenario. I can't think of many (any?) reasons why someone would really ever want to do that, let alone have it be the default. Thoughts on that remaining bit?
@bgentry Okay, I added a new commit here to make some changes that I think are along the lines of what you requested. Seem okay?
I agree the different behavior on `down` is asymmetric and feels a bit odd, but if it saves just one person from accidentally nuking their jobs table by mistake, it seems worthwhile IMO. Most people shouldn't be migrating down in a non-dev environment to begin with.
Nice work on this
… unlimited

A few tweaks to the originally proposed migration API:

* When migrating down, default to a maximum of one step. This is meant as a safety feature so that it's harder to remove the River schema completely by accident.
* When migrating down, `TargetVersion` can be set to `-1` explicitly to move an unlimited number of steps. This adds a way to remove the River schema completely by migrating to the equivalent of version zero (we can't use `0` for this because that's the default value of `int`).
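The defaulting rule described in this commit can be sketched as pure logic. The names here (`effectiveMaxSteps`, the `unlimited` sentinel) are invented for illustration and don't reflect River's real internals:

```go
package main

import "fmt"

const (
	directionUp   = "up"
	directionDown = "down"
	unlimited     = 1_000_000 // stand-in for "no limit" in this sketch
)

// effectiveMaxSteps models the safety default described above: down
// migrations run a single step unless an explicit MaxSteps or a TargetVersion
// (including -1 for "all the way down") says otherwise.
func effectiveMaxSteps(direction string, maxSteps, targetVersion int) int {
	if maxSteps > 0 {
		return maxSteps // an explicit MaxSteps always wins
	}
	if targetVersion != 0 {
		return unlimited // a target version bounds the run on its own
	}
	if direction == directionDown {
		return 1 // safety default: never more than one step down by accident
	}
	return unlimited // up with no options: migrate fully
}

func main() {
	fmt.Println(effectiveMaxSteps(directionDown, 0, 0) == 1)          // true
	fmt.Println(effectiveMaxSteps(directionDown, 0, -1) == unlimited) // true
	fmt.Println(effectiveMaxSteps(directionUp, 0, 0) == unlimited)    // true
}
```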
Awesome! Okay, this will still need more work to function with Goose's
Cut release 0.0.12. I'll write up some additional docs for the site too.
Follow up #67 by adding an example test for the new Go migration API. Helps provide a more copy/pastable example for River's godoc, and something we can link to from our other docs.
Here, as requested by several users, add a Go API that's able to run migrations as an alternative to migrating via the CLI. We add a new package `rivermigrate` to mirror the conventions of `riverdriver` and `rivertest`.

`rivermigrate` is largely just a refactor of the existing internal package `internal/dbmigrate` with a couple important tweaks:

* We give it the ability to take a driver the same way as we do for `river.Client` and `rivertest`.
* To mirror client and `rivertest`, we add a transaction-specific variant that takes a `TTx` (`MigrateTx`).
* Because we now have the non-transaction and transaction variants, I take away the `Down`/`Up` function distinctions, and make up/down a parameter to `Migrate`/`MigrateTx` instead (so we only need two functions instead of four).
* I move to using an `embed.FS` instead of specific embedded strings for each version, and the file system is walked to get all migrations within it. This will make adding new migrations in the future easier and more foolproof.

I also added a new option for `TargetVersion` so that it's possible to do a limited migration to a specific version instead of going all the way up or down. This is a pretty standard feature in most migration frameworks.

This still can't support Goose because it takes an `sql.DB`, but I think we can get there by adding a minimal driver for the `sql` package that can run migrations and nothing else. More of this to be explored later.