PostgreSQL SASL – run SHA256 in a blocking executor #4006
The majority of the time is likely being spent in this loop: sqlx/sqlx-postgres/src/connection/sasl.rs Lines 214 to 218 in 753d327
(It looks like there's an off-by-one error, but the first iteration is outside the loop.)

The iteration count is stored with the hashed password, but it's set for new passwords by the

As the Tokio docs point out,

So if your server is using a pool with a

I've worked around this in the past using a shared semaphore to limit the number of blocking tasks spawned at once, but I'm not sure that's appropriate here.

However, we could instead just yield to the executor here every so many iterations. There's

There's also no equivalent for

Assuming you haven't changed the default for

And it just so happens that we already have
To avoid spending too much time on synchronous work on the event loop, we yield every few iterations of the HMAC-SHA256 loop.
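For illustration, here is a minimal std-only sketch of the idea: an iterated loop that hands control back to the executor every `yield_every` rounds. It is not the sqlx code. `toy_round` is a hypothetical, non-cryptographic placeholder for one HMAC-SHA256 round, `YieldNow` is a hand-rolled stand-in for a runtime's yield primitive, and `block_on_counting` is a toy executor that only exists so the number of yields can be observed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-rolled stand-in for a runtime's yield primitive: returns `Pending`
/// once, asks to be polled again, then completes on the next poll.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical placeholder for one HMAC-SHA256 round (NOT cryptographic).
fn toy_round(state: u64) -> u64 {
    state.rotate_left(13) ^ state.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Runs `iterations` rounds, yielding to the executor every `yield_every` rounds.
async fn salted_rounds(seed: u64, iterations: u32, yield_every: u32) -> u64 {
    assert!(yield_every > 0, "yield_every must be non-zero");
    // As in the discussion above, the first iteration happens outside the loop.
    let mut state = toy_round(seed);
    let mut acc = state;
    for i in 1..iterations {
        state = toy_round(state);
        acc ^= state;
        if i % yield_every == 0 {
            YieldNow(false).await;
        }
    }
    acc
}

/// Waker that does nothing; enough for an executor that simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy busy-polling executor: returns the output and how many times the
/// future returned `Pending` (that is, how often it yielded).
fn block_on_counting<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut yields = 0;
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return (out, yields),
            Poll::Pending => yields += 1,
        }
    }
}
```

Calling `block_on_counting(salted_rounds(42, 4096, 5))` drives the loop to completion and reports how often it yielded. Under a real runtime, each of those yield points is a chance for other tasks on the same worker thread to make progress.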
753d327 to ef2e809
Hey! Sorry, I missed this message. I was checking the issue and not the PR... Thank you for the thorough response!
Ah yes, I see! From the tokio docs:
This sounds like a potential solution, albeit a bit more involved. And I guess we'd like to avoid bringing in rayon as a dependency?
Yep, looks like it (which makes sense, it's where we do the 4096 iterations):
Let's give this a go, at least to start. On my machine, when yielding every 10 iterations, the gap between yields is usually about 100us as expected, but sometimes much longer. Even when yielding every 5 iterations, I occasionally saw 250us between yields. It's never going to be perfect though, so I've implemented it to yield every 5 iterations, which usually gives us ~50us between yields. Let me know if you'd like something different!

Is the benchmarking stuff still used?
Yeah, for a few reasons:
Yeah, that's going to depend on the exact timing of various things. That could just be the thread itself getting pre-empted at some point, which would occur occasionally on that time scale. If we really wanted to be more precise here, we could measure the time spent in the loop and ensure we yield at least every 50-100us, but that's probably overkill. The solution you've arrived at is pretty much exactly what I was thinking of.
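The time-based variant mentioned above (the one described as probably overkill) could look roughly like this. It is only a sketch under the same assumptions as before: `toy_round` is a hypothetical, non-cryptographic placeholder for one HMAC-SHA256 round, and `YieldNow` and `block_on_counting` are hand-rolled std-only stand-ins for a real runtime. The `budget` parameter is also an illustrative name, not something from the PR.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Hand-rolled stand-in for a runtime's yield primitive.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical placeholder for one HMAC-SHA256 round (NOT cryptographic).
fn toy_round(state: u64) -> u64 {
    state.rotate_left(13) ^ state.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

/// Runs `iterations` rounds and yields whenever at least `budget` has
/// elapsed since the last yield (or since the start of the loop).
async fn rounds_with_budget(seed: u64, iterations: u32, budget: Duration) -> u64 {
    let mut state = toy_round(seed);
    let mut acc = state;
    let mut slice_start = Instant::now();
    for _ in 1..iterations {
        state = toy_round(state);
        acc ^= state;
        if slice_start.elapsed() >= budget {
            YieldNow(false).await;
            // Start timing the next slice only after we have been resumed.
            slice_start = Instant::now();
        }
    }
    acc
}

/// Waker that does nothing; enough for an executor that simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy busy-polling executor that also counts how often the future yielded.
fn block_on_counting<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut yields = 0;
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return (out, yields),
            Poll::Pending => yields += 1,
        }
    }
}
```

Reading the clock on every iteration has a cost of its own, so a real implementation would probably check the elapsed time only every few rounds. That extra bookkeeping is part of why the fixed every-N-iterations approach is the simpler trade-off.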
You mean this? sqlx/sqlx-postgres/src/connection/sasl.rs Lines 208 to 225 in 69bb595
I didn't realize that existed, honestly. I didn't write this code myself. We don't run benchmarks in our CI suite.

That appears to be using

We've since switched to using

So it's an open question of what to do with this benchmark function. You could keep it as-is, but instead of trying to make it an

It's certainly possible to adapt it to

Or you could just delete it (or leave it for posterity but comment it out). I'm not sure of the right answer, here. I'm leaning towards just deleting it, though.
Done. I guess it'll still be in the commit history for posterity if we ever want it.

Any idea when the next release might be published? I'd be keen to test out a pre-release with a real (ish) workload to see what effect it has.
There are so many PRs with breaking changes that still need to be reviewed and merged, so I'm not sure. But I'm working on it.
If you don't plan to publish your crates on crates.io or are bound by any policy against it, you can use a |
* Yield while salting password for PG SASL
  To avoid spending too much time on synchronous work on the event loop, we yield every few iterations of the HMAC-SHA256 loop.
* Remove unused bench
Does your PR solve an issue?
Fixes #4005
Is this a breaking change?
No
Discussion
I've put together a PoC of offloading the SHA256 HMAC generation into a blocking executor, taking it off the main event loop.
There's also a little unit test demonstrating how long this code takes. On my machine this takes ~50ms.
I can remove the test and/or put something more useful in there before merging.
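For readers unfamiliar with the offloading approach described above, here is a rough std-only sketch of the general shape: the CPU-bound work runs on a separate thread and the async caller awaits its result. This is not the PoC itself. `offload`, `Offloaded`, `toy_salt`, and `block_on` are hypothetical names written for this example, and `toy_salt` is a non-cryptographic placeholder for the real HMAC-SHA256 loop. In Tokio, the analogous building block would be `tokio::task::spawn_blocking`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// State shared between the worker thread and the awaiting future.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

/// Future that resolves once the worker thread has stored its result.
struct Offloaded<T>(Arc<Mutex<Shared<T>>>);

impl<T> Future for Offloaded<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut shared = self.0.lock().unwrap();
        match shared.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not finished yet: remember who to wake when the worker is done.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Runs `work` on a freshly spawned thread and returns a future for its result.
fn offload<T, F>(work: F) -> Offloaded<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let worker_side = Arc::clone(&shared);
    thread::spawn(move || {
        let value = work();
        // Store the result, then wake the waiting task outside the lock.
        let waker = {
            let mut guard = worker_side.lock().unwrap();
            guard.result = Some(value);
            guard.waker.take()
        };
        if let Some(waker) = waker {
            waker.wake();
        }
    });
    Offloaded(shared)
}

/// Hypothetical placeholder for the salted-password loop (NOT cryptographic).
fn toy_salt(seed: u64, iterations: u32) -> u64 {
    let round = |s: u64| s.rotate_left(13) ^ s.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    let mut state = round(seed);
    let mut acc = state;
    for _ in 1..iterations {
        state = round(state);
        acc ^= state;
    }
    acc
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy single-future executor: parks the current thread until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

With these helpers, `block_on(offload(|| toy_salt(42, 4096)))` computes the same value as calling `toy_salt` directly, but the loop itself runs off the awaiting thread. Spawning a thread per connection attempt is the simplest possible version; a real blocking pool reuses threads and caps how many run at once.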