
[Ruby] Add a Pipelining API? #189

Open
casperisfine opened this issue Jun 5, 2024 · 2 comments

@casperisfine Contributor

Context

Since the MySQL server is generally hosted on a different machine, an important performance factor is the number of roundtrips you have to perform.

e.g. if you have a 1ms round trip time and 3 queries to perform in a transaction, you'll have to do 5 full roundtrips (BEGIN, the 3 queries, COMMIT) for a total of 5ms. Sometimes there is no way around this because you need the result of the previous query to perform the next one, but sometimes you don't.

The classic technique for minimizing the cost of roundtrips is to send multiple queries in a row without immediately waiting for their responses, AKA pipelining.

[image: pipelining illustration]

Existing Solution with Trilogy

Right now some people are essentially doing poor man's pipelining by enabling MULTI_STATEMENTS and concatenating their queries together. It works, but it's wonky and a bit unsafe: MULTI_STATEMENTS is off by default for a good reason, as keeping it off prevents various injection vulnerabilities.
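
For illustration, that workaround looks roughly like this (assuming Trilogy's multi_statement connection option and its more_results_exist? / next_result methods; treat the exact names as approximate):

client = Trilogy.new(host: "127.0.0.1", port: 3306, username: "root", multi_statement: true)

# Both statements go out in a single write; results are read back in order.
first  = client.query("SELECT 1; SELECT 2")
second = client.next_result if client.more_results_exist?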

Existing Solution with other DB clients

Mysql2

The mysql2 gem has some limited support for pipelining, which they refer to as async: https://github.com/brianmario/mysql2?tab=readme-ov-file#async

It's a bit wonky because they only let you send the query and then block on async_result, but I suppose it works.
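
For reference, the pattern from that README looks roughly like this:

client = Mysql2::Client.new(host: "127.0.0.1", username: "root")

client.query("SELECT SLEEP(1)", async: true) # send the query without waiting for the result
# ... do other work here ...
result = client.async_result                 # block until the result is ready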

PG

The pg gem has had support for a pipeline API since PostgreSQL 14 (2021). I haven't tried it, and it seems a bit wonky, but it's the same idea: you start a pipeline with #enter_pipeline_mode, queue some queries in that pipeline, and can flush the pipelined queries whenever you want so they are sent all at once.

Here's the spec for it in the gem: https://github.com/ged/ruby-pg/blob/d072b21852865ecb84e6345df11d68eed50702bb/spec/pg/connection_spec.rb#L2009-L2122
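
From skimming that spec, usage looks roughly like this (method names are ruby-pg's libpq wrappers, so take the details with a grain of salt):

conn = PG.connect(dbname: "test")

conn.enter_pipeline_mode
conn.send_query_params("SELECT $1::int", [1]) # queued, not yet flushed
conn.send_query_params("SELECT $1::int", [2])
conn.pipeline_sync                            # flush everything in one write

res_1 = conn.get_result; conn.get_result      # each query's result is followed by a nil
res_2 = conn.get_result; conn.get_result
conn.get_result                               # the PGRES_PIPELINE_SYNC marker
conn.exit_pipeline_mode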

Other databases

While pipelining isn't very commonly used with relational databases, it has been used extensively for years with redis-rb and redis-client.

redis-client only offers a simple block-based API, and returns all the results at once in an Array:

result_1, result_2 = client.pipelined do |pipeline|
  pipeline.call("COMMAND 1")
  pipeline.call("COMMAND 2")
end

redis-rb offers some more convenience by having pipelined queries return Future objects as well:

future_1 = future_2 = nil # declared outside the block so they remain visible after it
result_1, result_2 = client.pipelined do |pipeline|
  future_1 = pipeline.command1
  future_2 = pipeline.command2
end
future_1.value == result_1
future_2.value == result_2

Some older versions of redis-rb also had a "queue" API I deprecated:

client.queue(:command_1)
client.queue(:command_2)
result_1, result_2 = client.commit

Use case

As mentioned in the introduction, the use case for such an API would be to optimize some specific cases, particularly around transactions. If your queries aren't in a transaction, you may simply want to perform them concurrently, each on its own connection, but as soon as transactions are involved you can't do that anymore.

One specific place where I'd like to use this is in Active Record's transaction code. Right now Active Record's BEGIN queries are lazy: they are only sent when the first query of the transaction is triggered.

If Trilogy had such a pipelining API, we could send the BEGIN and the first query of the transaction in a pipeline, and save one roundtrip per transaction.
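
Purely as a sketch of the shape this could take (nothing below exists in Trilogy today; the pipelined method and its block API are hypothetical):

# Hypothetical API: both statements are written in one packet, and both
# responses are read back after a single roundtrip.
begin_result, insert_result = client.pipelined do |pipeline|
  pipeline.query("BEGIN")
  pipeline.query("INSERT INTO posts (title) VALUES ('hello')")
end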

FYI @kirs @matthewd @jhawthorn @composerinteralia @adrianna-chang-shopify @eileencodes: any opinions / concerns about introducing such a capability? I'm not set on any specific API design, but I'm happy to explore, etc.

casperisfine pushed a commit to casperisfine/trilogy that referenced this issue Aug 7, 2024
Ref: trilogy-libraries#189

This is the most basic API possible and uses a hack, but it demonstrates that it's possible to do pipelining with the MySQL API.
@casperisfine
Contributor Author

I opened #197 to demonstrate how it's possible to pipeline queries. I went with the simplest possible API, but it might be worth offering something more flexible. As it stands though, it would already be useful as a better alternative to multi-statements.

@gmac

gmac commented Aug 22, 2024

This would be hugely beneficial for certain types of GraphQL batching. GraphQL fields are commonly resolved in bulk across a set of records, and sometimes require a unique query per field that goes beyond the capabilities of simple key batching (i.e. select * from my_things where id IN (1, 2, 3)).

As mentioned in the issue, the ideal solution would be concurrency. However, that assumes the infrastructure can deliver warmed connections quickly and can scale database connections in proportion to the number of web workers. Using what we reliably have today (one connection per request), being able to pipeline batches of queries through that one connection to save hundreds of roundtrips would provide a considerable advantage.
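
To make that concrete, a hypothetical pipelining API (the same imaginary pipelined block as sketched above) would let a resolver do something like:

# Hypothetical sketch: N per-record queries go out over the one connection in
# a single roundtrip instead of N sequential roundtrips.
results = client.pipelined do |pipeline|
  records.each do |record|
    pipeline.query("SELECT * FROM comments WHERE post_id = #{record.id.to_i} ORDER BY created_at DESC LIMIT 5")
  end
end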
