Loads database dump on every startup #85

Merged on Jul 19, 2024 (2 commits)

@falquaddoomi (Collaborator) commented Jul 18, 2024

TL;DR: this PR loads dumps on every startup, not just when the database is empty, removing the need to first delete the current database via down -v and then restart the stack to reload a dump when the database changes.

This PR patches the postgis image's init script so that it executes scripts located in /docker-entrypoint-postinit.d/ every time the database container starts. It operates in a similar fashion to how the image executes scripts from /docker-entrypoint-initdb.d/, but unlike those scripts the post-init ones are executed on every container startup.
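
The post-init hook described above can be sketched roughly as follows. This is a minimal illustration, not the patch from this PR: the function name `run_postinit_scripts` and the `POSTINIT_DIR` variable are hypothetical, and only `.sh` files are handled. It is loosely modeled on how the official image treats `.sh` files in /docker-entrypoint-initdb.d/ (executable files are run, non-executable ones are sourced); the difference is that the caller would invoke it on every startup rather than only when the data directory is empty.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run every script in a "post-init" directory on each startup.
# POSTINIT_DIR defaults to the path named in the PR; the rest is illustrative.
POSTINIT_DIR="${POSTINIT_DIR:-/docker-entrypoint-postinit.d}"

run_postinit_scripts() {
  local dir="${1:-$POSTINIT_DIR}"
  local f
  # A missing directory means there is nothing to run; treat it as success.
  [ -d "$dir" ] || return 0
  # Glob expansion is sorted, so 10-foo.sh is processed before 20-bar.sh.
  for f in "$dir"/*.sh; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ -x "$f" ]; then
      echo "postinit: running $f"
      "$f" || return 1
    else
      echo "postinit: sourcing $f"
      . "$f" || return 1
    fi
  done
}
```

A real patch would also need the server to be accepting connections before a dump can be loaded; the sketch leaves that part out.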

This PR also moves load_latest_dump.sh into the post-init scripts folder, so that it loads the latest dump on every startup, not just when there's no database. This has the effect of destroying any changes you might have made to the database while it was running, so be sure to enter the db container and export the database via the /db-exports/make_db_export.sh script if there's anything you want to save for next time.

While this scheme works for now, the behavior should probably be revisited for a few reasons:

  • while the database dump is currently small enough that it can be quickly loaded on each execution, it might become expensive in the future
  • any data that's written to the database is essentially ephemeral, which would become a problem if we start using it for, e.g., user logins or user-submitted data
  • if we upgrade to a new version of the postgres/postgis image and the init script changes, we'll have to reimplement the patch

…it, even when there's an existing database. Moves load_latest_dump.sh into the post-init scripts folder.

netlify bot commented Jul 18, 2024

Deploy Preview for exploring-cancer-in-colorado canceled.

Name Link
🔨 Latest commit f58051b
🔍 Latest deploy log https://app.netlify.com/sites/exploring-cancer-in-colorado/deploys/669ac5b1b552fa00089ac715

@vincerubinetti (Collaborator)

Could you make some note, probably at the top of the patched file, of where the source is from, e.g.:

MODIFIED FROM https://github.com/docker-library/postgres/blob/master/15/alpine3.20/docker-entrypoint.sh

To aid in review, could you also point to which lines you changed in that file?

@vincerubinetti (Collaborator)

Aside from that, it seems to work as expected. Worked with an existing image/volume I had on disk already, seemed to create everything from scratch.

@falquaddoomi falquaddoomi mentioned this pull request Jul 19, 2024

@vincerubinetti (Collaborator)

This is probably a dumb question, but could you also elaborate on this approach vs. just always baking the "down" compose flag into run_stack?

…stgis:15-3.4 since 15-3.3 is no longer current.

@falquaddoomi (Collaborator, Author)

> could you also elaborate on this approach vs. just always baking the "down" compose flag into run_stack?

The problem is that the -v flag deletes all volumes, not just the database one, for all containers in the stack. I'm using more than just the database volumes in production to persist some data between stack invocations, including certificates which are infeasible to obtain on each run of the stack. Purging all volumes every time you run the stack is also just not something you'd expect IMHO, so if people were to extend this it might catch them by surprise.
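
To make the distinction concrete, here is a rough sketch of removing only one volume instead of passing -v. It is not what this PR does, and it rests on assumptions: the helper name `reset_db_volume` is hypothetical, the volume name must be looked up with `docker volume ls`, and the `DOCKER` variable exists only so the commands can be previewed (e.g. `DOCKER=echo`).

```shell
# Hypothetical sketch: reset only the database volume instead of every volume.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the commands.
DOCKER="${DOCKER:-docker}"

reset_db_volume() {
  # $1: full name of the database volume, as listed by `docker volume ls`
  local volume="$1"
  [ -n "$volume" ] || { echo "usage: reset_db_volume VOLUME" >&2; return 2; }
  # Stop and remove containers and networks, but keep all volumes (no -v).
  "$DOCKER" compose down || return 1
  # Remove just the one named volume; the others stay in place.
  "$DOCKER" volume rm "$volume"
}
```

Even in this form the stack has to be brought down first, which the post-init approach avoids.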

I'm kind of surprised that the PostgreSQL image doesn't have this functionality already built-in, and as you can see there are others who share that sentiment: docker-library/postgres#191 (that's just one I found now, but I recall there being other issues with the same idea.)

Also, while this is something I think should be included in the base postgres image, I also don't see loading the database fresh from a dump every time as a viable long-term solution for populating the database, especially if we ever end up having data that's generated during runtime (e.g., statistics, user accounts, submissions, etc.). Instead, I'd like to figure out a way to perform data migrations where you can bring in new data without having to replace the entire database. It's been a while since I've worked on a project that's a mix of fixture and runtime data, so I need to refresh my knowledge on the topic before I start implementing it.

@falquaddoomi (Collaborator, Author) commented Jul 19, 2024

FYI, I added some comments based on your requests, @vincerubinetti. I also found while investigating the source of the entrypoint script that postgis had moved to a new version and apparently had removed 15-3.3 from the repo's main branch, so I bumped the version in this PR to 15-3.4 as well. I tested running it, and it seems the new version is fine with the previous database and dumps, so nothing to change there, thankfully.

@vincerubinetti (Collaborator) left a comment

Thanks for the explanation and changes. LGTM.

@falquaddoomi falquaddoomi merged commit a0b050f into main Jul 19, 2024
@falquaddoomi falquaddoomi deleted the pg-load-dumpfile branch July 19, 2024 20:33