-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Ruby] Add a Pipelining API? #189
Comments
Ref: trilogy-libraries#189 This is the most basic API possible and uses a hack, but it demonstrate that it's possible to do pipelining with MySQL API.
I opened #197 to demonstrate how it's possible to pipeline queries. I went with the simplest possible API, but might be worth offering something more flexible perhaps. But as it stands, it would already be useful as a better alternative to multi-statements. |
This would be hugely beneficial for certain types of GraphQL batching. GraphQL fields are commonly resolved in bulk across a set of records, and sometimes require a unique query per field that goes beyond the capabilities of simple key batching (ie: As mentioned in the issue, the ideal solution would be concurrency. However, that assumes infra can deliver warmed connections quickly and can scale database connections by order of web workers. Using what we reliably have today (one connection per request), being able to pipeline batches of queries through that one connection to save hundreds of round trips would provide considerable advantage. |
Context
Since MySQL server is generally hosted on a different machine, an important performance factor is the number of roundtrips you have to perform.
e.g. if you have a
1ms
round trip time, and 3 queries to perform in a transaction, you'll have to do 5 full roundtrips for a total of 5ms. Sometimes there is no way around this because you need the result of the previous query to perform the next one, but sometimes you don't.The classic technique for minimizing the cost of roundtrips, is to send multiple queries in a row without immediately waiting for the response, AKA pipelining
Existing Solution with Trilogy
Right now some people are essentially doing poor man's pipelining by enabling
MULTI_STATEMENTS
and concatenating their queries together. It works but is wonky and a bit unsafe, asMULTI_STATEMENTS
is off by default for a good reason, it prevent various injection vulnerabilities.Existing Solution with other DB clients
Mysql2
The
mysql2
gem has some limited support for pipelining, which they refer to as async: https://github.com/brianmario/mysql2?tab=readme-ov-file#asyncIt's a bit wonky because they only offer to queue the query and then block on
async_result
, but I suppose it works.PG
The
pg
gem has some support for a pipeline API since PostgreSQL 14 (2021), I haven't tried it, and it seems a bit wonky, but it's the same idea. You start a pipeline with#enter_pipeline_mode
, queue some queries in that pipeline, and can flush the pipelined queries whenever you want so they are sent all at once.Here's the spec for it in the gem: https://github.com/ged/ruby-pg/blob/d072b21852865ecb84e6345df11d68eed50702bb/spec/pg/connection_spec.rb#L2009-L2122
Other databases
While pipelining isn't very commonly used with RDBMS, it has been used extensively for years with
redis-rb
andredis-client
.redis-client
only offer a simple block based API, and return all the results at once in an Array:redis-rb
offer some more convenience by having pipelined queries return aFuture
object as well.Some older versions of
redis-rb
also had a "queue" API I deprecated:Use case
As mentioned in introduction, the use case for such API would be to optimized some specific cases, particularly around transactions. If your queries aren't in a transaction, you may simply want to perform the queries concurrently each in their own connection, but as soon as transactions are involved you can't do this anymore.
One specific place where I'd like to use this is in Active Record's transaction code. Right now Active Record's
BEGIN
queries are lazy, they are only send when the first query of the transaction is triggered.If Trilogy had such pipelining API, we could send the
BEGIN
and the first query of the transaction in a pipeline, and save one roundtrip per transaction.FYI @kirs
@matthewd @jhawthorn @composerinteralia @adrianna-chang-shopify @eileencodes any opinions / concerns about introducing such capability? I'm not set on any specific API design, but I'm happy to explore, etc.
The text was updated successfully, but these errors were encountered: