Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch executemany #295

Merged
merged 3 commits into from
Nov 26, 2020
Merged

Batch executemany #295

merged 3 commits into from
Nov 26, 2020

Conversation

fantix
Copy link
Member

@fantix fantix commented May 29, 2018

Refs #289, this is a rewrite of bind_execute_many():

  • Batch bind args into 4 x 32KB buffers, sent with writelines() once flow control turns green.
  • Send one SYNC at the very last - making use of the implicit transaction for the whole call to executemany().
  • Abort immediately on either client error (with explicit rollback) or server error.
  • Support async iterable.
  • Add executemany() to prepared statement.
  • test_server_failure_during_writes is failing on Windows
  • Fix unstable test_timeout (test_execute.TestExecuteMany)

This is a breaking change due to the implicit transaction.

executemany

The 3-level infinite loop keeps filling buffers and send them under flow control, the only exit is raising StopIteration with 3 possible values:

  • True means there was data sent in the last loop
  • False means the last loop sent nothing
  • exception means the data source raised an exception

Please note, during the loop, if server-side error is detected before exhausting the data source, it is also considered as StopIteration(True), ending the process early with a SYNC_MESSAGE.

(I wrote a post about the details of implicit transaction.)

UPDATED - pgbench results of inserting 1000 rows per query with executemany() on Python 3.6 of 2.2GHz 2015 MacBook Air (best out of 5 runs):

asyncpg 0.18.2

710 queries in 30.31 seconds
Latency: min 341.88ms; max 636.29ms; mean 425.022ms; std: 39.782ms (9.36%)
Latency distribution: 25% under 401.67ms; 50% under 414.26ms; 75% under 435.37ms; 90% under 478.39ms; 99% under 576.638ms; 99.99% under 636.299ms
Queries/sec: 23.42
Rows/sec: 23424.32

This PR:

4125 queries in 30.02 seconds
Latency: min 23.14ms; max 734.91ms; mean 72.723ms; std: 49.226ms (67.69%)
Latency distribution: 25% under 59.958ms; 50% under 65.414ms; 75% under 71.538ms; 90% under 80.95ms; 99% under 175.375ms; 99.99% under 734.912ms
Queries/sec: 137.39
Rows/sec: 137389.64

Most branches are covered by tests, overall coverage 85% (1352 miss) -> 85% (1356 miss).

UPDATED - Python 3.8 of 2.3GHz 2020 MacBook Pro:

截屏2020-11-24 17 21 26

@fantix fantix changed the title [WIP] Batch executemany Batch executemany May 29, 2018
fantix added a commit to fantix/asyncpg that referenced this pull request May 30, 2018
Copy link
Member

@elprans elprans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the transaction status check, LGTM! Thanks!

asyncpg/protocol/protocol.pyx Outdated Show resolved Hide resolved
asyncpg/protocol/protocol.pyx Outdated Show resolved Hide resolved
asyncpg/protocol/protocol.pyx Outdated Show resolved Hide resolved
@fantix
Copy link
Member Author

fantix commented Jun 23, 2018

Update - I've met a racing condition issue during fixing tests. Trying to find time to fix that.

@elprans
Copy link
Member

elprans commented Aug 28, 2018

Hey @fantix! Are you still working on this?

@fantix
Copy link
Member Author

fantix commented Aug 28, 2018

Hi! Yes I’m trying to find some time to fix the racing issue, sorry for the delay!

@elprans
Copy link
Member

elprans commented Sep 18, 2018

@fantix, I just merged #361 and #362, which may help improve the situation with the race condition you were seeing.

@fantix
Copy link
Member Author

fantix commented Oct 25, 2018

Working on this now - rebased and I'll review the changes, and deal with the racing issue.

@fantix fantix force-pushed the t289_batch_executemany branch 5 times, most recently from f62a347 to 06b412b Compare November 26, 2018 12:24
@fantix
Copy link
Member Author

fantix commented Nov 26, 2018

Sorry for the delay! This PR is updated now. Benchmark shows the improvement is consistent. I've also added a diagram to explain the details.

@fantix fantix force-pushed the t289_batch_executemany branch 7 times, most recently from 46ecc07 to 6c206e9 Compare November 28, 2018 10:05
@elprans elprans mentioned this pull request Mar 22, 2019
@elprans
Copy link
Member

elprans commented May 8, 2019

@fantix The test_timeout test is flaky because of pipelining. All commands, including the final Sync can actually go in under 500ms, so the changes are persisted despite the TimeoutError. You might get better luck by passing a generator with time.sleep() in it to executemany().

@robd003
Copy link

robd003 commented Sep 29, 2020

@elprans any chance this can get merged in after the conflict in asyncpg/protocol/protocol.pyx is fixed?

@gtlanigan315
Copy link

Checking in to see if this PR is still planning on being merged

@elprans
Copy link
Member

elprans commented Oct 29, 2020

@fantix, are you interested in getting this in shape to be merged?

@cmflynn
Copy link

cmflynn commented Oct 31, 2020

bumping this PR. it would be awesome to see it finally merged!! lets goooo ^^

@fantix
Copy link
Member Author

fantix commented Oct 31, 2020

Let me try this weekend!

@fantix fantix force-pushed the t289_batch_executemany branch from 4be5198 to a3bfcff Compare November 2, 2020 05:04
@fantix
Copy link
Member Author

fantix commented Nov 2, 2020

Only rebased - I do want to revisit this to make sure everything is still alright.

@fantix fantix force-pushed the t289_batch_executemany branch from a3bfcff to 557f515 Compare November 24, 2020 06:41
Now `Bind` and `Execute` pairs are batched into 4 x 32KB buffers to take
advantage of `writelines()`. A single `Sync` is sent at last, so that
all args live in the same transaction.

Closes: MagicStack#289
@fantix fantix force-pushed the t289_batch_executemany branch from 557f515 to b137184 Compare November 24, 2020 18:34
@fantix fantix force-pushed the t289_batch_executemany branch from b137184 to 39dfbb4 Compare November 24, 2020 22:51
@fantix
Copy link
Member Author

fantix commented Nov 24, 2020

I've rewritten the core part to be more readable, and included @elprans 's commit to fix the timeout, and added one more test. If the Travis CI passes, I think it'll be good for another review.

@robd003
Copy link

robd003 commented Nov 25, 2020

Looks like all tests passed! Can we get this merged in?

@elprans elprans merged commit 690048d into MagicStack:master Nov 26, 2020
@elprans
Copy link
Member

elprans commented Nov 26, 2020

LGTM and merged. @fantix, thank you very much for working on this!

@fantix fantix deleted the t289_batch_executemany branch November 27, 2020 00:12
elprans added a commit that referenced this pull request Dec 21, 2020
A new asyncpg release is here, just in time for Christmas.  Notable
additions include Python 3.9 support, support for recently added
PostgreSQL types like `jsonpath`, and last but not least, vastly
improved `executemany()` performance.  Importantly, `executemany()` is
also now _atomic_, which means that either all iterations succeed, or
none at all, whereas previously partial results would have remained in
place, unless `executemany()` was called in a transaction.

There is also the usual assortment of improvements and bugfixes, see the
details below.

This is the last release of asyncpg that supports Python 3.5, which is
has reached EOL in September.

Improvements
------------

* Vastly speedup executemany by batching protocol messages (#295)
  (by @fantix in 690048d for #295)

* Allow using custom Record class
  (by @elprans in db4f1a6 for #577)

* Add Python 3.9 support (#610)
  (by @elprans in c05d726 for #610)

* Prefer SSL connections by default (#660)
  (by @elprans in 16183aa for #660)

* Add codecs for a bunch of new builtin types (#665)
  (by @elprans in b53f038 for #665)

* Expose Pool as asyncpg.Pool (#669)
  (by @rugleb in 0e0eb8d for #669)

Fixes
-----

* Add a workaround for bpo-37658
  (by @elprans in 2bac166 for #21894)

* Fix wrong default transaction isolation level (#622)
  (by @fantix in 4a627d5 for #622)

* Fix set_type_codec() to accept standard SQL type names (#619)
  (by @elprans in 68b40cb for #619)

* Ignore custom data codec for internal introspection (#618)
  (by @fantix in e064f59 for #618)

* Fix null/NULL quoting in array text encoder (#627)
  (by @fantix in 92aa806 for #627)

* Fix link in connect docstring (#653)
  (by @samuelcolvin in 8b313bd for #653)

* Make asyncpg work with pyinstaller (#651)
  (by @Atem18 in 5ddabb1 for #651)

* Fix possible AttributeError exception in `ConnectionSettings` (#632)
  (by @petriborg in 0d23182 for #632)

* Prohibit custom codecs on domains
  (by @elprans in 50f964f for #457)

* Raise proper error on anonymous composite input (tuple arguments) (#664)
  (by @elprans in 7252dbe for #664)

* Fix incorrect application of custom codecs in some cases (#662)
  (by @elprans in 50f65fb for #662)
elprans added a commit that referenced this pull request Dec 21, 2020
A new asyncpg release is here, just in time for Christmas.  Notable
additions include Python 3.9 support, support for recently added
PostgreSQL types like `jsonpath`, and last but not least, vastly
improved `executemany()` performance.  Importantly, `executemany()` is
also now _atomic_, which means that either all iterations succeed, or
none at all, whereas previously partial results would have remained in
place, unless `executemany()` was called in a transaction.

There is also the usual assortment of improvements and bugfixes, see the
details below.

This is the last release of asyncpg that supports Python 3.5, which is
has reached EOL in September.

Improvements
------------

* Vastly speedup executemany by batching protocol messages (#295)
  (by @fantix in 690048d for #295)

* Allow using custom `Record` class
  (by @elprans in db4f1a6 for #577)

* Add Python 3.9 support (#610)
  (by @elprans in c05d726 for #610)

* Prefer SSL connections by default (#660)
  (by @elprans in 16183aa for #660)

* Add codecs for a bunch of new builtin types (#665)
  (by @elprans in b53f038 for #665)

* Expose Pool as `asyncpg.Pool` (#669)
  (by @rugleb in 0e0eb8d for #669)

Fixes
-----

* Add a workaround for bpo-37658
  (by @elprans in 2bac166 for #21894)

* Fix wrong default transaction isolation level (#622)
  (by @fantix in 4a627d5 for #622)

* Fix `set_type_codec()` to accept standard SQL type names (#619)
  (by @elprans in 68b40cb for #619)

* Ignore custom data codec for internal introspection (#618)
  (by @fantix in e064f59 for #618)

* Fix null/NULL quoting in array text encoder (#627)
  (by @fantix in 92aa806 for #627)

* Fix link in connect docstring (#653)
  (by @samuelcolvin in 8b313bd for #653)

* Make asyncpg work with pyinstaller (#651)
  (by @Atem18 in 5ddabb1 for #651)

* Fix possible `AttributeError` exception in `ConnectionSettings` (#632)
  (by @petriborg in 0d23182 for #632)

* Prohibit custom codecs on domains
  (by @elprans in 50f964f for #457)

* Raise proper error on anonymous composite input (tuple arguments) (#664)
  (by @elprans in 7252dbe for #664)

* Fix incorrect application of custom codecs in some cases (#662)
  (by @elprans in 50f65fb for #662)
elprans added a commit that referenced this pull request Dec 21, 2020
A new asyncpg release is here, just in time for Christmas.  Notable
additions include Python 3.9 support, support for recently added
PostgreSQL types like `jsonpath`, and last but not least, vastly
improved `executemany()` performance.  Importantly, `executemany()` is
also now _atomic_, which means that either all iterations succeed, or
none at all, whereas previously partial results would have remained in
place, unless `executemany()` was called in a transaction.

There is also the usual assortment of improvements and bugfixes, see the
details below.

This is the last release of asyncpg that supports Python 3.5, which has
reached EOL in September.

Improvements
------------

* Vastly speedup executemany by batching protocol messages (#295)
  (by @fantix in 690048d for #295)

* Allow using custom `Record` class
  (by @elprans in db4f1a6 for #577)

* Add Python 3.9 support (#610)
  (by @elprans in c05d726 for #610)

* Prefer SSL connections by default (#660)
  (by @elprans in 16183aa for #660)

* Add codecs for a bunch of new builtin types (#665)
  (by @elprans in b53f038 for #665)

* Expose Pool as `asyncpg.Pool` (#669)
  (by @rugleb in 0e0eb8d for #669)

Fixes
-----

* Add a workaround for bpo-37658
  (by @elprans in 2bac166 for #21894)

* Fix wrong default transaction isolation level (#622)
  (by @fantix in 4a627d5 for #622)

* Fix `set_type_codec()` to accept standard SQL type names (#619)
  (by @elprans in 68b40cb for #619)

* Ignore custom data codec for internal introspection (#618)
  (by @fantix in e064f59 for #618)

* Fix null/NULL quoting in array text encoder (#627)
  (by @fantix in 92aa806 for #627)

* Fix link in connect docstring (#653)
  (by @samuelcolvin in 8b313bd for #653)

* Make asyncpg work with pyinstaller (#651)
  (by @Atem18 in 5ddabb1 for #651)

* Fix possible `AttributeError` exception in `ConnectionSettings` (#632)
  (by @petriborg in 0d23182 for #632)

* Prohibit custom codecs on domains
  (by @elprans in 50f964f for #457)

* Raise proper error on anonymous composite input (tuple arguments) (#664)
  (by @elprans in 7252dbe for #664)

* Fix incorrect application of custom codecs in some cases (#662)
  (by @elprans in 50f65fb for #662)
@elprans elprans mentioned this pull request Dec 21, 2020
elprans added a commit that referenced this pull request Dec 21, 2020
A new asyncpg release is here, just in time for Christmas.  Notable
additions include Python 3.9 support, support for recently added
PostgreSQL types like `jsonpath`, and last but not least, vastly
improved `executemany()` performance.  Importantly, `executemany()` is
also now _atomic_, which means that either all iterations succeed, or
none at all, whereas previously partial results would have remained in
place, unless `executemany()` was called in a transaction.

There is also the usual assortment of improvements and bugfixes, see the
details below.

This is the last release of asyncpg that supports Python 3.5, which has
reached EOL in September.

Improvements
------------

* Vastly speedup executemany by batching protocol messages (#295)
  (by @fantix in 690048d for #295)

* Allow using custom `Record` class
  (by @elprans in db4f1a6 for #577)

* Add Python 3.9 support (#610)
  (by @elprans in c05d726 for #610)

* Prefer SSL connections by default (#660)
  (by @elprans in 16183aa for #660)

* Add codecs for a bunch of new builtin types (#665)
  (by @elprans in b53f038 for #665)

* Expose Pool as `asyncpg.Pool` (#669)
  (by @rugleb in 0e0eb8d for #669)

Fixes
-----

* Add a workaround for bpo-37658
  (by @elprans in 2bac166 for #21894)

* Fix wrong default transaction isolation level (#622)
  (by @fantix in 4a627d5 for #622)

* Fix `set_type_codec()` to accept standard SQL type names (#619)
  (by @elprans in 68b40cb for #619)

* Ignore custom data codec for internal introspection (#618)
  (by @fantix in e064f59 for #618)

* Fix null/NULL quoting in array text encoder (#627)
  (by @fantix in 92aa806 for #627)

* Fix link in connect docstring (#653)
  (by @samuelcolvin in 8b313bd for #653)

* Make asyncpg work with pyinstaller (#651)
  (by @Atem18 in 5ddabb1 for #651)

* Fix possible `AttributeError` exception in `ConnectionSettings` (#632)
  (by @petriborg in 0d23182 for #632)

* Prohibit custom codecs on domains
  (by @elprans in 50f964f for #457)

* Raise proper error on anonymous composite input (tuple arguments) (#664)
  (by @elprans in 7252dbe for #664)

* Fix incorrect application of custom codecs in some cases (#662)
  (by @elprans in 50f65fb for #662)
elprans added a commit that referenced this pull request Feb 10, 2021
A new asyncpg release is here.

Notable additions include Python 3.9 support, support for recently added
PostgreSQL types like `jsonpath`, and last but not least, vastly
improved `executemany()` performance.  Importantly, `executemany()` is
also now _atomic_, which means that either all iterations succeed, or
none at all, whereas previously partial results would have remained in
place, unless `executemany()` was called in a transaction.

There is also the usual assortment of improvements and bugfixes, see the
details below.

This is the last release of asyncpg that supports Python 3.5, which has
reached EOL last September.

Improvements
------------

* Vastly speedup executemany by batching protocol messages (#295)
  (by @fantix in 690048d for #295)

* Allow using custom `Record` class
  (by @elprans in db4f1a6 for #577)

* Add Python 3.9 support (#610)
  (by @elprans in c05d726 for #610)

* Prefer SSL connections by default (#660)
  (by @elprans in 16183aa for #660)

* Add codecs for a bunch of new builtin types (#665)
  (by @elprans in b53f038 for #665)

* Expose Pool as `asyncpg.Pool` (#669)
  (by @rugleb in 0e0eb8d for #669)

* Avoid unnecessary overhead during connection reset (#648)
  (by @kitogo in ff5da5f for #648)

Fixes
-----

* Add a workaround for bpo-37658
  (by @elprans in 2bac166 for #21894)

* Fix wrong default transaction isolation level (#622)
  (by @fantix in 4a627d5 for #622)

* Fix `set_type_codec()` to accept standard SQL type names (#619)
  (by @elprans in 68b40cb for #619)

* Ignore custom data codec for internal introspection (#618)
  (by @fantix in e064f59 for #618)

* Fix null/NULL quoting in array text encoder (#627)
  (by @fantix in 92aa806 for #627)

* Fix link in connect docstring (#653)
  (by @samuelcolvin in 8b313bd for #653)

* Make asyncpg work with pyinstaller (#651)
  (by @Atem18 in 5ddabb1 for #651)

* Fix possible `AttributeError` exception in `ConnectionSettings` (#632)
  (by @petriborg in 0d23182 for #632)

* Prohibit custom codecs on domains
  (by @elprans in 50f964f for #457)

* Raise proper error on anonymous composite input (tuple arguments) (#664)
  (by @elprans in 7252dbe for #664)

* Fix incorrect application of custom codecs in some cases (#662)
  (by @elprans in 50f65fb for #662)
dmig pushed a commit to dmig/asyncpg that referenced this pull request Feb 22, 2021
A new asyncpg release is here.

Notable additions include Python 3.9 support, support for recently added
PostgreSQL types like `jsonpath`, and last but not least, vastly
improved `executemany()` performance.  Importantly, `executemany()` is
also now _atomic_, which means that either all iterations succeed, or
none at all, whereas previously partial results would have remained in
place, unless `executemany()` was called in a transaction.

There is also the usual assortment of improvements and bugfixes, see the
details below.

This is the last release of asyncpg that supports Python 3.5, which has
reached EOL last September.

Improvements
------------

* Vastly speedup executemany by batching protocol messages (MagicStack#295)
  (by @fantix in 690048d for MagicStack#295)

* Allow using custom `Record` class
  (by @elprans in db4f1a6 for MagicStack#577)

* Add Python 3.9 support (MagicStack#610)
  (by @elprans in c05d726 for MagicStack#610)

* Prefer SSL connections by default (MagicStack#660)
  (by @elprans in 16183aa for MagicStack#660)

* Add codecs for a bunch of new builtin types (MagicStack#665)
  (by @elprans in b53f038 for MagicStack#665)

* Expose Pool as `asyncpg.Pool` (MagicStack#669)
  (by @rugleb in 0e0eb8d for MagicStack#669)

* Avoid unnecessary overhead during connection reset (MagicStack#648)
  (by @kitogo in ff5da5f for MagicStack#648)

Fixes
-----

* Add a workaround for bpo-37658
  (by @elprans in 2bac166 for #21894)

* Fix wrong default transaction isolation level (MagicStack#622)
  (by @fantix in 4a627d5 for MagicStack#622)

* Fix `set_type_codec()` to accept standard SQL type names (MagicStack#619)
  (by @elprans in 68b40cb for MagicStack#619)

* Ignore custom data codec for internal introspection (MagicStack#618)
  (by @fantix in e064f59 for MagicStack#618)

* Fix null/NULL quoting in array text encoder (MagicStack#627)
  (by @fantix in 92aa806 for MagicStack#627)

* Fix link in connect docstring (MagicStack#653)
  (by @samuelcolvin in 8b313bd for MagicStack#653)

* Make asyncpg work with pyinstaller (MagicStack#651)
  (by @Atem18 in 5ddabb1 for MagicStack#651)

* Fix possible `AttributeError` exception in `ConnectionSettings` (MagicStack#632)
  (by @petriborg in 0d23182 for MagicStack#632)

* Prohibit custom codecs on domains
  (by @elprans in 50f964f for MagicStack#457)

* Raise proper error on anonymous composite input (tuple arguments) (MagicStack#664)
  (by @elprans in 7252dbe for MagicStack#664)

* Fix incorrect application of custom codecs in some cases (MagicStack#662)
  (by @elprans in 50f65fb for MagicStack#662)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants