[RFC] Combine args of `executemany()` in batches #289

Conversation
Yes, although I wouldn't do that exhaustively. Instead, I think it makes sense to batch queries. Say one wants to execute 990 queries; we can do that in 10 steps with the batch size set to 100. This preserves backwards compatibility and allows executing thousands of queries via a single await call efficiently. I'm -1 on the current solution.
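The count-based batching described above can be sketched as a small helper that splits the argument list into fixed-size chunks. Names here (`chunked`, `batch_size`) are illustrative, not asyncpg API:

```python
from itertools import islice

def chunked(args, batch_size):
    """Yield successive fixed-size batches from an iterable of argument tuples."""
    it = iter(args)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# 990 argument tuples with a batch size of 100 -> 10 batches instead of 990 calls
batches = list(chunked([(i,) for i in range(990)], 100))
print(len(batches))      # 10
print(len(batches[0]))   # 100
print(len(batches[-1]))  # 90 (the last, partial batch)
```

Each batch would then be sent as one protocol-level write, preserving the `executemany()` signature while bounding the size of any single send.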
I'd say that asyncpg is a low-level interface and shouldn't provide any transactional guarantees. Wrapping an
I think adding
Got it, thanks! I'm with you. Will update the PR.
Wait ;) I'll post an update on this in 30 minutes ;)
No hurry 😃
Quoting a comment from the PR:
IIRC combining Postgres messages in one write doesn't guarantee atomicity. Only using explicit
A single network packet isn't that big usually. The PR combines all data in a single buffer, which can result in many packets sent with arbitrary delays between them by the OS.

A few more comments:

Batching by number of queries is actually suboptimal. The packed arguments of a single query can require an arbitrarily big buffer, so if we try to guess an optimal number of queries per batch we can end up in a situation where our write buffers are too small or too big. Giving users a configurable option also doesn't make a lot of sense.

Currently we simply send queries one by one. This PR changes that to accumulate all of them in a single big write. This is wrong, as it effectively disables flow control when a large number of queries is batched. Instead, I propose to batch writes in 32KB blocks. A single block can fit many queries or just one. The block size won't be configurable (it will be a constant in the protocol code).

We can later optimize this by creating up to four 32KB blocks at a time and calling `writelines()`.

Now, adding batching isn't, strictly speaking, a backwards compatible change. Ideally, we should add a new keyword-only argument.

@elprans your thoughts?
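The size-based batching proposed above (as opposed to count-based) can be modeled roughly like this. This is a toy sketch: the real logic lives in asyncpg's protocol code and operates on encoded `Bind`/`Execute` messages, and `pack_blocks` is a hypothetical name:

```python
BLOCK_SIZE = 32 * 1024  # 32KB, a constant rather than a user-facing option

def pack_blocks(messages):
    """Group pre-encoded wire messages into blocks of at most BLOCK_SIZE bytes.

    A block may hold many small messages or a single oversized one, so the
    write size stays predictable regardless of per-query argument size.
    """
    blocks, current, current_len = [], [], 0
    for msg in messages:
        if current and current_len + len(msg) > BLOCK_SIZE:
            blocks.append(b''.join(current))
            current, current_len = [], 0
        current.append(msg)
        current_len += len(msg)
    if current:
        blocks.append(b''.join(current))
    return blocks

# 1000 messages of 100 bytes each: 327 messages fit in a 32KB block
blocks = pack_blocks([b'x' * 100] * 1000)
print(len(blocks))                    # 4
print(max(len(b) for b in blocks))    # 32700
```

Writing one block at a time (awaiting the transport's drain between writes) keeps flow control working, unlike accumulating everything in one big buffer.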
It must be pretty late in the night at your place. Thanks for the reply!
Combining doesn't; using a single `Sync` does.
And the
Oh yes, thanks for the correction!
That would be awesome! I'll look into the uvloop code for this.
Interesting, thanks for looking into this! I'll also take a look at the Postgres source tomorrow. In any case, I now think we should always batch queries in `executemany()`.
So batching might change how many queries are committed before the failed one. In practice, though, it shouldn't really matter. Usually a user won't be able to pinpoint the exact point of failure anyway. Either they do handle errors in

Therefore we can probably always batch, and we should better document the behavior.

Lastly, it would be interesting to implement the basic batching and run benchmarks. I expect to see a nice perf improvement, but if we don't see any then we don't want to complicate the code.
I think we should either send one `Sync` per batch or a single `Sync` at the end. I'm leaning toward making `executemany()` atomic.
Alright, let's try to send one `Sync`. Here's a concrete plan:
Strictly speaking this is a backwards incompatible change:
In practice, I doubt that there are valid use cases where a user wants a partial commit, so we can just go forward with the proposal and call it a bugfix and a perf improvement.
Hi guys, I want to share with you my recent observation which may be relevant to your discussion here. I have an
I thought

For me the only way to avoid deadlocks now is to rewrite the one-line call into a loop, which will also execute more slowly.
@sergeyspatar Currently,
If doing consistent updates is not possible, then retrying on a `DeadlockDetectedError` is the alternative.
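The retry suggestion can be sketched generically. Here `DeadlockError` and `retry_on_deadlock` are stand-in names for illustration, not asyncpg API (asyncpg maps Postgres error 40P01 to its own exception class):

```python
import asyncio

class DeadlockError(Exception):
    """Stand-in for a driver's deadlock-detected exception."""

async def retry_on_deadlock(tx_func, attempts=3, backoff=0.01):
    """Re-run a transactional coroutine when it aborts due to a deadlock."""
    for attempt in range(attempts):
        try:
            return await tx_func()
        except DeadlockError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            # simple linear backoff before retrying; tune for your workload
            await asyncio.sleep(backoff * (attempt + 1))

# demo: fail twice with a deadlock, then succeed on the third attempt
calls = {'n': 0}

async def flaky_tx():
    calls['n'] += 1
    if calls['n'] < 3:
        raise DeadlockError
    return 'committed'

print(asyncio.run(retry_on_deadlock(flaky_tx)))  # committed
```

In real code, `tx_func` would open a transaction block and re-execute the whole unit of work, since Postgres aborts the entire transaction on a deadlock.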
Sorry I'm a bit confused - a transaction with a single

```python
import asyncio
import asyncpg

dsn = 'postgresql://localhost/postgres'

async def tx():
    conn = await asyncpg.connect(dsn)
    while True:
        async with conn.transaction():
            print('in transaction')
            # update id=2 first to cause a deadlock
            await conn.execute(
                'UPDATE test_deadlock SET value = 20 WHERE id = 2')
            print('after update 2')
            await conn.execute(
                'UPDATE test_deadlock SET value = 10 WHERE id = 1')
            print('after update 1')

async def main():
    # initialize table and data for the test
    conn = await asyncpg.connect(dsn)
    try:
        await conn.execute(
            'CREATE TABLE test_deadlock (id INT PRIMARY KEY, value INT)')
        await conn.execute('INSERT INTO test_deadlock VALUES (1, null)')
        await conn.execute('INSERT INTO test_deadlock VALUES (2, null)')
    except asyncpg.DuplicateTableError:
        pass
    # start concurrent coroutine and set exit condition
    running = [True]
    asyncio.ensure_future(tx()).add_done_callback(lambda fut: running.clear())
    while running:
        print('before executemany')
        await conn.executemany(
            'UPDATE test_deadlock SET value = $1 WHERE id = $2', [
                (1, 1),
                (2, 2),
            ])
        print('after executemany')

asyncio.get_event_loop().run_until_complete(main())
```

However it fails randomly in milliseconds with
Also tried with 128 rows in the table; same result. @sergeyspatar, could you please share your asyncpg version? Are there any database triggers on the table? In any case, the suggestion from @elprans is definitely good practice to follow.
@1st1 I tweaked pgbench a bit to test asyncpg 0.15.0
This PR just for testing:
Seems to be quite an improvement! I did a rough fix according to the "concrete plan" above, and the result is about the same (each row is 133 bytes, 5 batches per query):
If your statement causes functions or triggers to fire, or anything that increases the

Also,
Ah, that explains it :) thanks
Great! A green light from me to try to implement #289 (comment).
You are right. A trigger on this table was updating another table... I changed the logic and no deadlocks so far. Sorry for the false alarm.
Closing in favor of #295
Now `Bind` and `Execute` pairs are batched into 4 x 32KB buffers to take advantage of `writelines()`. A single `Sync` is sent at the end, so that all args live in the same transaction. Closes: MagicStack#289
Now `Bind` and `Execute` pairs are batched into 4 x 32KB buffers to take advantage of `writelines()`. A single `Sync` is sent at the end, so that all args live in the same transaction.

pgbench results of inserting 1000 rows per query with `executemany()` on Python 3.6, 2.2GHz 2015 MacBook Air (best of 5 runs):

asyncpg 0.18.2: 710 queries in 30.31 seconds
Latency: min 341.88ms; max 636.29ms; mean 425.022ms; std: 39.782ms (9.36%)
Latency distribution: 25% under 401.67ms; 50% under 414.26ms; 75% under 435.37ms; 90% under 478.39ms; 99% under 576.638ms; 99.99% under 636.299ms
Queries/sec: 23.42
Rows/sec: 23424.32

This patch: 4125 queries in 30.02 seconds
Latency: min 23.14ms; max 734.91ms; mean 72.723ms; std: 49.226ms (67.69%)
Latency distribution: 25% under 59.958ms; 50% under 65.414ms; 75% under 71.538ms; 90% under 80.95ms; 99% under 175.375ms; 99.99% under 734.912ms
Queries/sec: 137.39
Rows/sec: 137389.64

This is a backwards incompatible change. Here `executemany()` becomes atomic, whereas previously any error in the middle of argument iteration would retain the results of the preceding sets of arguments unless an explicit transaction block was used.

Closes: #289
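The buffering strategy in the commit message can be modeled as: split the encoded message stream into 32KB chunks, then hand up to four chunks at a time to the transport's `writelines()`. This is a sketch with plain bytes rather than real protocol messages, and `batched_writes` is a hypothetical name, not asyncpg's actual code:

```python
BLOCK, FANOUT = 32 * 1024, 4  # four 32KB buffers per writelines() call

def batched_writes(payload):
    """Split an encoded message stream into writelines()-sized groups of chunks."""
    chunks = [payload[i:i + BLOCK] for i in range(0, len(payload), BLOCK)]
    return [chunks[i:i + FANOUT] for i in range(0, len(chunks), FANOUT)]

data = b'm' * (9 * 32 * 1024 + 1)  # a little over nine blocks of data
groups = batched_writes(data)
# transport.writelines(group) would be called once per group, draining between calls:
print([len(g) for g in groups])  # [4, 4, 2]
```

Grouping writes this way keeps each syscall-sized burst bounded, so flow control (pause/resume writing) still has a chance to kick in between groups.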
Is it a good idea to combine all the args in `executemany()` into a ~~single network packet~~ single buffer to send (thx Yury) with a list of `Bind` and `Execute` command pairs, and a single `Sync` command? That means `executemany()` will be atomic in an implicit transaction, if not called from an existing transaction. And, is it a good idea to make this method atomic? Please kindly give me some advice when you have time. Many thanks!

References: `executemany` to prepared statement #36