-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: New Architecture for sqlx::query!()
Macros
#1598
Comments
I'm wondering if there's a way that we can support incremental adoption of Perhaps |
If #1183 could be finished and the workspace situation improved (see discussion in #1223), I think that would solve my own pain points. I'm not convinced a separate daemon would make working with SQLx easier at all. For context: The database part of the service I work on is updated infrequently enough that an offline-first workflow works well ( |
@jplatte hmm, then as a compromise, maybe the macros could use the cached data first if it exists and the migrations haven't changed. Then only the modified macro invocation will actually have to talk to the database. It's certainly worth trying as a first step at least. |
I have thought a lot about how to handle these compile-time query checking features (not in rust, but also in other languages). Needing access to a database of the same kind as the production base during builds is almost always hard and leads to complex build system. This is why I want to suggest an alternate approach:
So in the end, you generate a
Is there any use case where getting the error directly at application startup (or by running the checker function directly via cli after build or even as an automatic post-build step) is significantly worse than getting them during build? The main motivation for compile-time query checking from my experience is that you want to avoid situations where you first have to trigger a certain code path to see the error, which leads to long cycle times and is inconvenient. But if you can check all queries directly at application startup or with a CLI command, that solves the same issue without the disadvantages of compile-time network access. |
This is quite simply not feasible. SQLx checks the queries directly against the database (or against cached metadata previously obtained from the database) because the alternative is parsing / interpreting the query string, which could be using many, many different syntax constructs and features depending on the DB and even how it is set up. As a little thought experiment, consider this simple query: |
Not to derail the conversation topic to badly but i see a use case in the mechanics that @bennofs proposed with Edit: |
@pythoneer It sounds like you are describing |
Hmm, couldn't you specify the return type explicitly in the code? Are there many other things apart from return types and argument types that you'd need to know about the query? Ideally you'd be able to record the (parametrized) query, return and argument types and then check against the database that those types are convertible to/from the the actual query parameters/return value. |
In theory yes, but that would be a lot of unnecessary extra boilerplate.
Yes, a bunch of things. And you would have to be able to fully parse the query to get that information, which may be reasonable for a simple DB, but SQLx supports multiple very complex databases. Even "just" a self-made parser for PostgreSQL's SQL flavor would be a huge project. |
I definitely agree that parsing queries should be avoided. As far as I can see, with Anyway, this is not core of sqlx so I may just try to implement that myself as a separate crate, and see how well it works :) |
The processing is still driven by the metadata returned from the database. Proc macros can't introspect any code that isn't in their input, so all It will happily emit fields that aren't in the struct or omit ones that are if they aren't in the query, and it's up to the compiler to sanity check that. |
At least considering Postgres, couldn't this tool use an exported schema using |
@loafofpiecrust have a look at this FAQ answer: https://github.com/launchbadge/sqlx/blob/master/FAQ.md#why-cant-sqlx-just-look-at-my-database-schemamigrations-and-parse-the-sql-itself It's a bit snarky but the suggestion comes up a lot and so it gets a bit annoying to respond to. |
Yup @abonander sorry for the noise. |
Also related, I find using
I stopped using it due to that, but then I added a crate dependency that uses it, and it's even worse - it is misusing features and provides offline support only for PostgreSQL. I think you're on the right track that macro opening any kind of network connections is pretty wild design. Is there any way to do something like:
I'd rather not daemon:
Sorry for the wall of text. |
This is an interesting idea, and is a much bigger idea than my thought of having a sqlx describe command to inspect the results of a query. I do see significant potential advantages to separating the query analysis from the rust compilation:
The potential disadvantages I see are:
Relying on something resembling 'offline' mode, with an external tool to do the sql query analysis, seems like a better default than the current default. For how I use sqlx, I would still prefer if cargo run/check was capable of handling everything, even if it was opt-in. But even if that was functionality was lost, sqlx would STILL be better than only finding out about a typo after the application tries to run the query. |
Status Quo
Currently, the default mode of operation for the
sqlx::query!()
family of macros is to connect to a running database server (or open that database in-process for SQLite) as specified by theDATABASE_URL
environment variable, and then use database-specific commands to parse and analyze the passed query, to get information on bind parameters and output columns and generate code based on that information.As an option, the macros can also operate in an "offline" mode where instead of connecting to a database server, they read a
sqlx-data.json
file created bycargo sqlx prepare
which contains the same data and use that to generate code.The implementation is relatively straightforward and functions well enough, but has some drawbacks:
Compiling a project with a lot of
query!()
macro invocations is painfully slow.Since each macro invocation involves opening and working with a TCP connection to a database server from within an asynchronous runtime, the query macros execute a lot more code than your typical procedural macro. This is compounded by the fact that procedural macros are exclusively compiled in debug mode which means no optimizations (which async code relies pretty heavily on to be fast).
I have tried playing around with setting
rustflags
in aconfig.toml
to forcesqlx-macros
to compile in release mode but it didn't really help and seemed to actually worsen compile times if it did anything at all; the most noticeable thing that it did was clobber all the cached compiler artifacts for the whole dependency tree, requiring a full recompile.Macro invocations are currently executed serially which further compounds the problem.
Currently as a mitigation we only start a single runtime and then cache it in a
lazy_static!()
; since proc macros are executed in a separate process that is kept running for the duration of the compiler, this saves some overhead per invocation. I've been meaning to see if caching aPool
of connections saves any overhead as well. I imagine it would, at least somewhat. We could cache analysis results as well, but that would only help for queries that are copy-pasted a lot.Many people find the idea of procedural macros opening arbitrary TCP connections to be highly unpalatable.
Oftentimes when someone comes across SQLx and starts reading about the query macros, their first reaction is something along the lines of, "wow, the compiler actually lets you do that? That's disgusting!"
Fair enough, the current procedural macro environment is extremely laissez-faire and there have been calls to lock down what procedural macros (and build scripts) can actually do, thanks to sensationalized proof-of-concept projects showing how a proc macro executed by a transitive dependency could steal sensitive information like SSH keys and the like.
Personally, I think the threat is kind of overblown, or at least rather moot, considering a malicious crate in your dependency tree could accomplish all kinds of nastiness already, just at runtime instead of compile time. I fear that a big push to sandbox procedural macros and build scripts could lure people into a false sense of security and make them feel like they don't need to audit their dependencies. But, I digress.
The
query!()
macros don't play that well with IDEs.This situation has improved drastically compared to when we first introduced SQLx, to the point where IntelliJ-Rust and Rust Analyzer can both provide code completion for
sqlx::query!()
. However, it can still take some time to get results, in part thanks to point 1 here, and the macros aren't "pure" in the sense that their output could change from run to run based on schema changes, without the compiler or IDEs being able to pick up on it. They should be working from cached output where possible, and be able to invalidate that cache in a way that IDE plugins can learn about.Offline mode is finicky to use and it's easy to forget to run
cargo sqlx prepare
before committing.Offline mode isn't easily discoverable (in part thanks to just how dense the documentation on
sqlx::query!()
has gotten... we really need to move that stuff to an MDBook or something). It's also easy to forget to runcargo sqlx prepare
before committing, which often leads to really confusing CI errors.In the past, what I've done is add
cargo sqlx prepare && git add sqlx-data.json
to.git/hooks/precommit
to make sure I don't forget, but this makes committing take forever thanks to point 1.The macros don't have a way to ensure that the database is up-to-date with migrations.
This is somewhat on purpose as we don't want the macros automatically executing migrations which may end up breaking the user's local development database and/or clobbering manually generated test data. Still, it does get annoying.
Proposal
Instead of having the macros connect directly to the database server at compile time, they could communicate with a daemon process that the user starts and leaves running while working on their project.
I'm imagining this daemon process as another subcommand of
sqlx-cli
, perhapscargo sqlx watch
or something like that. It would watchtarget/sqlx
which the macros would write their query strings to as files. I considered using a crate likeinterprocess
to let the macros and daemon talk directly, but I think communicating with the filesystem is better because that's a medium that the compiler natively understands as I'll elaborate on in a second.Since a
cargo install
ed binary is built in release mode by default, the latency of the actual request to parse and analyze the query should improve quite a bit.The macros could submit their request to the daemon and then return some dummy code (or even a temporary error, like
Waiting for query analysis. Is `cargo sqlx watch` running?
), telling the compiler to watchtarget/sqlx
for the daemon to write the result of processing the query. Then the compiler could move on to the next macro invocation instead of blocking on the completion of the first one. This would let the macros essentially run in parallel, which should significantly reduce overall compile times.Assuming #570 is implemented as part of this rearchitecting, the daemon could also automatically update entries in
.sqlx/
for successfully parsed queries so thatcargo sqlx prepare
doesn't have to be manually run before committing.This would also be more amenable to sandboxing as we would only require write access to
target/
and read access to the project, which wouldn't be doing anything more than what build scripts are already expected to do, and any sandboxing scheme that is implemented would be required to support. Support for Windows named pipes or Unix domain sockets would probably be right out in this case unless we specifically asked for it, and that would be an uphill battle.The daemon could also periodically re-run the analysis of queries to pick up changes to the database schema.
@mehcode suggested that the daemon could automatically migrate the database if necessary, so no other manual setup would be needed. It could even spin up a temporary database so migrations could be automatically tested and synchronized with the query macros without breaking a local development database.
The text was updated successfully, but these errors were encountered: