
[Ruby] Add a Pipelining API? #189

Open
casperisfine opened this issue Jun 5, 2024 · 2 comments

@casperisfine Contributor

Context

Since the MySQL server is generally hosted on a different machine, an important performance factor is the number of roundtrips you have to perform.

e.g. if you have a 1ms round trip time and 3 queries to perform in a transaction, you'll have to do 5 full roundtrips (BEGIN, the 3 queries, COMMIT) for a total of 5ms. Sometimes there is no way around this because you need the result of the previous query to perform the next one, but sometimes you don't.

The classic technique for minimizing the cost of roundtrips is to send multiple queries in a row without immediately waiting for their responses, AKA pipelining.

[image: pipelining illustration]

Existing Solution with Trilogy

Right now some people are essentially doing poor man's pipelining by enabling MULTI_STATEMENTS and concatenating their queries together. It works, but it's wonky and a bit unsafe: MULTI_STATEMENTS is off by default for a good reason, as keeping it off prevents various injection vulnerabilities.
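
For illustration, that workaround looks roughly like this (assuming Trilogy's multi_statement connection option and its more_results_exist? / next_result methods; treat the exact names as approximate):

client = Trilogy.new(host: "127.0.0.1", port: 3306, username: "root", multi_statement: true)

# Both statements go out in a single write; results are read back in order.
first  = client.query("SELECT 1; SELECT 2")
second = client.next_result if client.more_results_exist?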

Existing Solution with other DB clients

Mysql2

The mysql2 gem has some limited support for pipelining, which they refer to as async: https://github.com/brianmario/mysql2?tab=readme-ov-file#async

It's a bit wonky because they only let you send the query and then block on async_result, but I suppose it works.
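
For reference, the pattern from that README looks roughly like this:

client = Mysql2::Client.new(host: "127.0.0.1", username: "root")

client.query("SELECT SLEEP(1)", async: true) # send the query without waiting for the result
# ... do other work here ...
result = client.async_result                 # block until the result is ready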

PG

The pg gem has had support for a pipeline API since PostgreSQL 14 (2021). I haven't tried it, and it seems a bit wonky, but it's the same idea: you start a pipeline with #enter_pipeline_mode, queue some queries in that pipeline, and can flush the pipelined queries whenever you want so they are sent all at once.

Here's the spec for it in the gem: https://github.com/ged/ruby-pg/blob/d072b21852865ecb84e6345df11d68eed50702bb/spec/pg/connection_spec.rb#L2009-L2122
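
From skimming that spec, usage looks roughly like this (method names are ruby-pg's libpq wrappers, so take the details with a grain of salt):

conn = PG.connect(dbname: "test")

conn.enter_pipeline_mode
conn.send_query_params("SELECT $1::int", [1]) # queued, not yet flushed
conn.send_query_params("SELECT $1::int", [2])
conn.pipeline_sync                            # flush everything in one write

res_1 = conn.get_result; conn.get_result      # each query's result is followed by a nil
res_2 = conn.get_result; conn.get_result
conn.get_result                               # the PGRES_PIPELINE_SYNC marker
conn.exit_pipeline_mode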

Other databases

While pipelining isn't very commonly used with relational databases, it has been used extensively for years with redis-rb and redis-client.

redis-client only offers a simple block-based API, and returns all the results at once in an Array:

result_1, result_2 = client.pipelined do |pipeline|
  pipeline.call("COMMAND 1")
  pipeline.call("COMMAND 2")
end

redis-rb offers some more convenience by having pipelined queries return Future objects as well:

future_1 = future_2 = nil # declared outside the block so they remain visible after it
result_1, result_2 = client.pipelined do |pipeline|
  future_1 = pipeline.command1
  future_2 = pipeline.command2
end
future_1.value == result_1
future_2.value == result_2

Some older versions of redis-rb also had a "queue" API I deprecated:

client.queue(:command_1)
client.queue(:command_2)
result_1, result_2 = client.commit

Use case

As mentioned in the introduction, the use case for such an API would be to optimize some specific cases, particularly around transactions. If your queries aren't in a transaction, you may simply want to perform them concurrently, each on its own connection, but as soon as transactions are involved you can't do that anymore.

One specific place where I'd like to use this is in Active Record's transaction code. Right now Active Record's BEGIN queries are lazy: they are only sent when the first query of the transaction is triggered.

If Trilogy had such a pipelining API, we could send the BEGIN and the first query of the transaction in a pipeline, and save one roundtrip per transaction.
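
Purely as a sketch of the shape this could take (nothing below exists in Trilogy today; the pipelined method and its block API are hypothetical):

# Hypothetical API: both statements are written in one packet, and both
# responses are read back after a single roundtrip.
begin_result, insert_result = client.pipelined do |pipeline|
  pipeline.query("BEGIN")
  pipeline.query("INSERT INTO posts (title) VALUES ('hello')")
end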

FYI @kirs @matthewd @jhawthorn @composerinteralia @adrianna-chang-shopify @eileencodes: any opinions / concerns about introducing such a capability? I'm not set on any specific API design, but I'm happy to explore, etc.

casperisfine pushed a commit to casperisfine/trilogy that referenced this issue Aug 7, 2024
Ref: trilogy-libraries#189

This is the most basic API possible and uses a hack, but it demonstrates that it's possible to do pipelining with the MySQL API.
@casperisfine
Contributor Author

I opened #197 to demonstrate how it's possible to pipeline queries. I went with the simplest possible API, but it might be worth offering something more flexible. As it stands though, it would already be useful as a better alternative to multi-statements.

@gmac

gmac commented Aug 22, 2024

This would be hugely beneficial for certain types of GraphQL batching. GraphQL fields are commonly resolved in bulk across a set of records, and sometimes require a unique query per field that goes beyond the capabilities of simple key batching (i.e. select * from my_things where id IN (1, 2, 3)).

As mentioned in the issue, the ideal solution would be concurrency. However, that assumes the infrastructure can deliver warmed connections quickly and can scale database connections in proportion to the number of web workers. Using what we reliably have today (one connection per request), being able to pipeline batches of queries through that one connection to save hundreds of roundtrips would provide a considerable advantage.
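
To make that concrete, a hypothetical pipelining API (the same imaginary pipelined block as sketched above) would let a resolver do something like:

# Hypothetical sketch: N per-record queries go out over the one connection in
# a single roundtrip instead of N sequential roundtrips.
results = client.pipelined do |pipeline|
  records.each do |record|
    pipeline.query("SELECT * FROM comments WHERE post_id = #{record.id.to_i} ORDER BY created_at DESC LIMIT 5")
  end
end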
