Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for arbirary database connection string #5

Open
ColdenCullen opened this issue Aug 31, 2023 · 6 comments
Open

Add support for arbirary database connection string #5

ColdenCullen opened this issue Aug 31, 2023 · 6 comments

Comments

@ColdenCullen
Copy link

It would be nice to be able to pass DATABASE_URL as an environment variable to the docker container in order to use an external database for enhanced robustness. This would allow us to run this tool in the cloud without requiring any on-disk persistence.

It seems like this should be possible by:

  1. Replace usages of sqlx::SqlitePool with sqlx::AnyPool
  2. Refactor the --database parameter to accept the full connection string, not just the part after sqlite:
  3. Add an ENV entry to the Dockerfile to set a default and allow for overrides

I gave it a shot locally, but I don't know enough about sqlx to make the swap. I ran into issues with the query_scalar macro returning a QueryScalar for the Sqlite driver and not the Any driver (I think this has to do with the files in the .sqlx folder?).

@jorgenpt
Copy link
Owner

jorgenpt commented Sep 16, 2023

@ColdenCullen Interesting idea, I didn't realize it would be so straight forward to support in sqlx. I'll take a peek at this (maybe today or tomorrow if I can). Out of curiosity, how are you running the Docker container?

FWIW, I think you might need to run cargo sqlx prepare -- --lib with the DATABASE_URL set with your database type to update the .sqlx files.

@jorgenpt
Copy link
Owner

Ah, it's a bit of a problem, it looks like -- the query macros don't support the Any connection type. (launchbadge/sqlx#964)

That means that there are a handful of other paths forward:

  • Make the database type a compile-time setting and publish images for two-three different database types (which one would you be looking at supporting? pgsql or mysql?)
  • Stop using the query macros and use the query functions instead
  • Switch (the default) from sqlite to a non-local database

I really like how trivial it is to spin up an instance with sqlite, and performance wise it seems totally reasonable for smaller query volumes (we're ~30-40 developers and it seems relatively low load -- plus we run it in our own Kubernetes cluster, rather than on e.g. AWS Elastic Container Service, so we would actually have to spin up a db for it), so that third option is probably off the table.

@ColdenCullen
Copy link
Author

This all makes sense to me!

We run our container on an EC2 machine (not ECS) deployed by a CDK stack. The problem is that sometimes the CDK can decide that a machine requires replacement if there's been a significant enough OS update. The advantage of also having our CDK stack spin up an RDS instance to connect this to is that the machine running RUGS can be completely ephemeral, and we can apply a backup strategy only to the database, and not care if the machine running RUGS itself goes up, down, or sideways, since it can be trivially re-created.

Let me know if there's anything else I can do to help make this work!

@jorgenpt
Copy link
Owner

jorgenpt commented Oct 7, 2023

Hm, interesting. Does it not persist Docker volumes when it replaces the container? That seems a little scary! I figured the whole point of using Docker-managed volumes (and not the bind mounts) was that the volumes were separate from the container and managed by the Docker tools.

Do you have a sense for which database backend you'd want to use it with?

@ColdenCullen
Copy link
Author

It does persist docker volumes between container, but CDK will sometimes replace the entire EC2 instance. The idea is that the instance piece is just ephemeral compute for data stored elsewhere. That isn’t always a practical model, but it’s the model we’re stuck with if we want easy infrastructure-as-code.

We default to Postgres on RDS unless we have a reason to use something else, so Postgres as the engine would be my first choice.

@jorgenpt
Copy link
Owner

jorgenpt commented Oct 7, 2023

Ok! Looking into this a little more, I think a potential path forward is to switch to SeaORM. It specifically abstracts it away in a manner where it supports dynamically picking the database connection at runtime. Overall I don't think the conversion will be too bad, since we use so few queries in RUGS, but I need to figure out a good way to handle migrations from existing databases -- creating a matching schema, then making sure that SeaORM doesn't think it needs migrating further. I also need to remove the dependency on the sqlite-specific COLLATE NOCASE. I see that PGSQL has citext, but I don't know that SeaORM supports it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants