
Migrations are a right royal pain in the arse #482

Closed
ErisDS opened this issue Aug 21, 2013 · 6 comments

@ErisDS
Member

ErisDS commented Aug 21, 2013

The migration system is a convoluted & undocumented little corner of hell. Please help me out by discussing how we can improve it.

Don't get me wrong, it works, and it works really quite well. However it seems that it is tricky to understand what is going on, and on top of that getting a migration, fixtures file, import & export setup just to change a tiny property on the DB is... ridiculous.

I think we need to break this whole problem down into smaller chunks which are solved separately and independently. Then we need to document how this works & put into place tools to make it as easy to do as possible.

The following is a brain dump of all the things I know to be true about the current system, along with some ideas on how to improve it. It's a pretty big topic so discussing various areas / suggestions for improvements may need to be done by creating a new issue for that idea.

Migrations

Migrations are what we call changes to the database schema. However, the schema itself is defined through the magical method of a series of migrations. There is nowhere that the entire schema can be seen and understood, apart from my magical gist: https://gist.github.com/ErisDS/cd0f3d33072410309f37.

I think Migrations are really two problems.

  1. The database schema as it must exist to run the current version of the code.
  2. How to get there from older versions of the database.
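Problem 1 could be solved by declaring the whole schema once, as data, instead of leaving it implied by a chain of migrations. A rough sketch of what that might look like (table and column definitions here are illustrative, not Ghost's actual schema):

```javascript
// Hypothetical sketch: the full schema declared once as plain data, rather
// than implied by replaying a series of migrations from version zero.
const schema = {
  posts: {
    id:    { type: 'increments', nullable: false, primary: true },
    title: { type: 'string', maxlength: 150, nullable: false },
    slug:  { type: 'string', maxlength: 150, nullable: false, unique: true }
  },
  settings: {
    id:    { type: 'increments', nullable: false, primary: true },
    key:   { type: 'string', maxlength: 150, nullable: false, unique: true },
    value: { type: 'text', nullable: true }
  }
};

// A fresh install could walk this object and create each table directly.
function tableNames(s) {
  return Object.keys(s);
}
```

With a definition like this, "what the schema is" and "how to get there from an older version" become genuinely separate problems.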

At the very least, I think that at this stage we are, and need to be, making DB changes much faster than this system allows. So I think it would be a good idea, at some point before the so-called 0.3.0 release, to stabilise the schema, create a version of Ghost which upgrades to the stable schema, and then in the next version flatten everything down again & remove the upgrade path. This might mean, for example, that 0.3.0 is in fact the 'stable schema version', and that anyone (just us & our VIPs atm) on 0.2.0 must upgrade to 0.3.0 before they can upgrade to any 0.4.0 version. 0.4.0 would then become the version we release to KS backers.

KS backers would get a nice clean migration system. VIPs would have to jump through the 0.3.0 hoop or start a fresh 0.4.0 install.

Fixtures

Fixtures are the default data which should exist in a fresh DB at the current version of the code. The data relies on the structure, so it can only be inserted once we have reached the correct structure.

Upgrading fixtures is a strange problem that I think has been misunderstood.
Fixtures should only be for the initial state of the DB; it is expected that the user will overwrite them, and therefore we cannot change them later. If we decide we want the 'Fresh Install State' to be different, then we simply change the fixtures.
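To make the "initial state only" idea concrete, a fixtures file is essentially just the default rows per table, seeded once the schema exists. A minimal illustrative shape (the values are examples, not Ghost's real defaults):

```javascript
// Illustrative fixtures: the rows a fresh database is seeded with, inserted
// only after the schema has been created. Values here are made up.
const fixtures = {
  posts: [
    { title: 'Welcome to Ghost', slug: 'welcome-to-ghost', status: 'published' }
  ],
  settings: [
    { key: 'title', value: 'My Blog' },
    { key: 'postsPerPage', value: '6' }
  ]
};
```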

However, at the moment, we are using 'fixture upgrades' to add new settings which are required to be present, and this is the only thing we are using them for. I think this is indicative of a completely different problem.

Settings, as they stand, cannot really be added through the API. Additionally, apart from the fixtures, we have no way of describing the settings which are required, what sort of data they should contain and how to validate them.

I think a lot of problems could be solved by improving the API for settings: having a definition of the required settings, their default values, their validation rules, whether they are overridable (see import), and then having a way to request a setting which, if it is not present, adds it with its default value.
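A hedged sketch of what that settings API could look like, with an in-memory map standing in for the settings table (the names `defaultSettings`, `getSetting` and `setSetting` are hypothetical, not Ghost's actual API):

```javascript
// Required settings declared once, with defaults and validation rules.
const defaultSettings = {
  title:        { default: 'Ghost', validate: v => typeof v === 'string' && v.length <= 150 },
  postsPerPage: { default: '6',     validate: v => /^\d+$/.test(v) }
};

const store = new Map(); // stand-in for the settings table

// Request a setting; if it is not present, add it with its default value.
function getSetting(key) {
  if (!store.has(key)) {
    const def = defaultSettings[key];
    if (!def) throw new Error('Unknown setting: ' + key);
    store.set(key, def.default);
  }
  return store.get(key);
}

// Update a setting, enforcing its validation rule.
function setSetting(key, value) {
  const def = defaultSettings[key];
  if (def && !def.validate(value)) throw new Error('Invalid value for ' + key);
  store.set(key, value);
}
```

With this in place, "fixture upgrades" for new settings disappear: a new required setting is just a new entry in `defaultSettings`, materialised on first read.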

**Update:** we do also want to move away from using these fixtures as test fixtures and only use them in tests if we are testing the 'Fresh Install State'.

Export

Export is a JSON dump of EVERYTHING in the DB, so that the DB can be recreated later if something goes wrong. At the moment, the only thing that is different between different versions of export is the list of tables which are present.

This information should be accessible by running some raw SQL queries to list the tables, I believe?

I think we can flatten export down into a single, simple tool by querying the database and asking what's there. We will have a data version stamp which we can put in the meta data, and I recommend also adding the Ghost version number, just in case.
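A minimal sketch of that flattened exporter, assuming a plain object as a stand-in for the database (the function names are illustrative; the commented query is SQLite's way of listing tables):

```javascript
// Ask the database what tables exist, instead of hardcoding a list per version.
function listTables(db) {
  // Against a real SQLite database this would be something like:
  //   SELECT name FROM sqlite_master WHERE type = 'table';
  // Here `db` is a plain object standing in for the database.
  return Object.keys(db);
}

// Dump everything, stamping the metadata with both versions.
function exportData(db, dataVersion, ghostVersion) {
  const dump = {
    meta: {
      exported_on: Date.now(),
      version: dataVersion,         // data version stamp
      ghost_version: ghostVersion   // Ghost version number, just in case
    },
    data: {}
  };
  for (const table of listTables(db)) {
    dump.data[table] = db[table];   // a real exporter would SELECT * here
  }
  return dump;
}
```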

Import

Import is a slightly different ball game to export. Bringing data in is a bit harder as we need to know what version of data we have, and what version of database schema we have and how to get from one to the other.

We also have to know a bit more about the structure / type of the data we are importing. At the moment there are a set of rules which are something like:

  • if it is a post, strip the primary key and insert (effectively appending posts to the current ones)
  • if it is a user, overwrite the current user. - This ensures that there is only ever 1 user of id 1. It will break if there is no current user... but then as import is run through the interface you have to have logged in to do it.
  • if it is a setting, update the setting's value if the key is present, otherwise insert it, unless it is the currentVersion setting, in which case do nothing as this setting must not be overridden.
  • for anything else, strip the primary key and insert it
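The rules above can be sketched as a single dispatch function. This is a hedged illustration over plain objects, not the actual importer code; table and field names mirror the discussion:

```javascript
// Apply the per-table import rules to one incoming record.
function importRecord(db, table, record) {
  const { id, ...rest } = record; // strip the primary key by default
  if (table === 'users') {
    // Overwrite the current user, so there is only ever one user with id 1.
    db.users[0] = { ...db.users[0], ...rest };
  } else if (table === 'settings') {
    if (record.key === 'currentVersion') return; // must never be overridden
    const existing = db.settings.find(s => s.key === record.key);
    if (existing) existing.value = record.value;
    else db.settings.push({ key: record.key, value: record.value });
  } else {
    db[table].push(rest); // posts and everything else: append without the PK
  }
}
```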

These rules are reasonably sane, and the way the import is written there is very little reason to have more than one importer. In the case that this changes, we could expand the import system, but for now I believe it can be flattened?

For example, if we finally update the users to work for multi-user, we can change the importer at the same time, and any user importing TO that version will get the correct functionality.

Summary

The user can only ever be on a single version of the codebase, which has a single version of the database schema. Thus all each version needs is to be able to handle getting TO that version from any earlier version.

The code we have achieves this, but I think it's a little bit complex and I have the feeling it's a bit backwards. It seems to be focused around pushing from nothing towards a new version, when in fact we should be pulling users into the right version from wherever they are. It's a subtle difference that I'm not articulating very well.

Thoughts?

@gotdibbs
Contributor

@ErisDS Having been recently in there a fair bit, I should have a good perspective on some of those changes. Just need to compile my thoughts.

Off the bat though, I think part of simplifying the process at least (it wouldn't simplify the amount of code we write) might be having a merged database schema per version, so that each time a new instance is spun up it doesn't have to start at ground zero and run through all the migrations/changes to get to the current version.

@ErisDS
Member Author

ErisDS commented Aug 21, 2013

That's what I'm trying to get at... although not very eloquently. Even though migrations as we have them are a pretty standard model (I think it is used in Rails, and is essentially what Liquibase does), it just seems backwards to start at ground zero and work up, when it's where we want to be that is the known thing, and getting there that is the difficult part.

@ErisDS
Member Author

ErisDS commented Aug 23, 2013

Hmm, so it turns out that right now, Knex has no ability to modify columns :( knex/knex#46

This is going to need to be built ourselves, either by adding the necessary features to Knex or by writing the migrations manually.

@codeincontext
Contributor

OK, so we definitely need to be able to modify columns. The options are (I guess) to extend Knex, or to try thinking about this another way. We also definitely need the schema somewhere, even if that's a generated file using @ErisDS's script (which would be read-only, obviously).

I'm not sure if it's over-complicating things a bit, but one idea about how we could rework migrations:

There's a schema file containing all the table data. Any new Ghost installs build their database from this file. (same as how Rails does new installs)

When making schema changes, you change the schema file. If you then try to run the migration task, you will be presented with an error telling you that the current migrations don't match the schema, prompting you to create a matching migration to get there.

Two benefits of this approach:

  1. It's a different mindset, the schema comes first, and migrations help you get there.
  2. It's a bit like TDD for migrations. You're making sure the migration is doing what you want.

The clever part (not needed right now): the migration system should in theory be able to detect table additions/deletions, as well as column additions/deletions, in the schema. When you try to migrate, it could pre-load a new migration file with what it thinks you're trying to do.
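The detection step could work as a simple diff between the declared schema and what is actually in the database. A rough sketch, assuming both sides are reduced to plain objects mapping table names to column-name arrays (real code would introspect the DB to build the `actual` side):

```javascript
// Diff the declared schema against the database's actual shape and emit the
// list of changes a generated migration would need to make.
function diffSchema(declared, actual) {
  const changes = [];
  for (const table of Object.keys(declared)) {
    if (!actual[table]) {
      changes.push({ op: 'createTable', table });
      continue;
    }
    for (const col of declared[table]) {
      if (!actual[table].includes(col)) {
        changes.push({ op: 'addColumn', table, column: col });
      }
    }
    for (const col of actual[table]) {
      if (!declared[table].includes(col)) {
        changes.push({ op: 'dropColumn', table, column: col });
      }
    }
  }
  for (const table of Object.keys(actual)) {
    if (!declared[table]) changes.push({ op: 'dropTable', table });
  }
  return changes;
}
```

If `diffSchema` returns a non-empty list and no matching migration exists, the migration task could refuse to run, which is exactly the error-and-prompt workflow described above.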

Thoughts?

@ErisDS
Member Author

ErisDS commented Sep 2, 2013

There are 2 clear, very high priority next steps for this:

  1. Strip settings out of fixtures: Settings API: Separate default settings from fixtures #567
  2. Get column modifications working: Migrations: Column modification #601

Anyone up for taking one or t'other?

@ErisDS
Member Author

ErisDS commented Sep 14, 2013

Raised: #731, #732, #737, #734, #735 and #737

@ErisDS ErisDS closed this as completed Sep 14, 2013