AsyncIO support #86

Closed

techdragon opened this issue Apr 26, 2015 · 2 comments

Comments

@techdragon

Could you clarify how to use this with AsyncIO, or, if no special effort is required, confirm that it works out of the box?

Given how fast LMDB is, I'm sure a number of people will look at it with an eye towards non-blocking asyncio usage.

@dw
Collaborator

dw commented Apr 26, 2015

Hi there,

py-lmdb is not yet well suited to single-threaded, non-blocking use, for a variety of reasons; see e.g. issue #65: there is no way to defer reads or writes to a worker pool without causing the GIL to be held for the duration of the I/O.

I'll think about updating the docs somehow, but even setting the aforementioned GIL issue aside, there are too many moving parts here to document a decisive "yes" or "no" answer. For example, read-only txns over a smaller-than-RAM database could only be guaranteed not to block if the sysadmin carefully audited, and where necessary disabled, other programs making use of the host OS page cache. Even after that is done, ensuring a DB remains "smaller than RAM" depends less on dataset size than on the number and duration of read txns combined with write txns (current LMDB cannot reuse existing pages while any reader exists).

Another option, at least for the read side, may be to mlock(2) LMDB's working set using the information returned by Environment.info(), but that requires yet more knowledge of how the host OS works, and, at least on Linux, probably also requires the database file to be fully allocated ahead of time.

As for async writes, unless you're writing to tmpfs or have sync completely disabled, that's probably not a good idea. I imagine you'd always want write txns to run from a blocking thread.
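
For illustration, a minimal sketch of that last suggestion (write txns run from a worker thread, driven from asyncio via run_in_executor), assuming a database at ./db; the put_async helper is hypothetical and not part of py-lmdb:

import asyncio
from concurrent.futures import ThreadPoolExecutor

import lmdb

env = lmdb.Environment('./db', max_dbs=1)

# LMDB permits only one write transaction at a time, so one worker suffices.
_write_pool = ThreadPoolExecutor(max_workers=1)

def _put(key, value):
    # Runs in the worker thread: the commit (and its fsync) blocks here,
    # not the event loop.
    with env.begin(write=True) as txn:
        txn.put(key, value)

async def put_async(key, value):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(_write_pool, _put, key, value)

asyncio.run(put_async(b'greeting', b'hello'))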

@stephane-martin

Hello,

It's not very complicated to write a thin wrapper around lmdb to use it in async frameworks: you just have to defer the real database operations to background threads so they don't block the main loop (hence my other issue about thread safety).

What I'm really missing is some kind of method to wait/block until a value is actually available for a given key. Without such a method I'm doomed to do things like:

while db.begin().get(key) is None:
    time.sleep(1)

I hate that, it's too ugly ;)
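
For illustration, an async variant of that polling loop, assuming the same kind of environment handle; it still polls, since LMDB offers no key-change notification to wait on, but asyncio.sleep yields to the event loop instead of blocking it:

import asyncio

import lmdb

env = lmdb.Environment('./db')

async def wait_for_key(key, interval=1.0):
    while True:
        # A fresh read transaction on each attempt sees newly committed data.
        with env.begin() as txn:
            value = txn.get(key)
        if value is not None:
            return value
        await asyncio.sleep(interval)  # yields to the event loop between polls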

dw added a commit that referenced this issue Jun 6, 2015
This seems to be the simplest, lowest overhead and most portable
solution to the GIL page fault problem: simply loop over the MDB_val,
touching one byte every 4kb. Regardless of whether the value lives in a
leaf page or an overflow page, is 100 bytes or 64kb, this should ensure
on all but the most heavily overloaded machines that the pages are in
RAM before PyString_FromStringAndSize() attempts to copy them.
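
The actual fix lives in py-lmdb's C extension, but a rough Python analogue of the prefault idea (touching one byte per 4 KiB page so the kernel faults the range into RAM before it is copied) might look like:

PAGE_SIZE = 4096

def prefault(buf, offset, length):
    # Touch one byte on every page the range spans, forcing the kernel to
    # fault those pages into RAM before the real copy happens.
    for pos in range(offset, offset + length, PAGE_SIZE):
        buf[pos]
    if length:
        buf[offset + length - 1]  # also touch the final, possibly partial, page
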
dw closed this as completed in 76c1fbb Jun 14, 2015