Lack of transaction parameter in Database's get and put methods #1
Hi, the API is slightly broken around that at present. I've looked at fixing this, but it requires some design that deviates slightly from the underlying C API, which is why I've resisted (my initial use case also did not benefit from multiple transactions within a single Python process). We could, however, discuss that here. My idea would be to move transaction creation onto the Environment object; the remainder of the Database get()/put()/etc. methods would then be moved to the Transaction object, and an optional db= parameter would be introduced to each of them. This is basically the inverse of your suggestion, but it more closely matches the underlying API.
I still hate having to pass the optional db= argument in; something doesn't feel right about it.
I'm not sure what the right answer is, so I thought I might see what other Python database packages do. The below is just my quick survey.

First, there happens to be a suggested standard Python API for databases (the DB-API PEP): I don't think this applies to mdb at all, since it seems to govern SQL-type databases.

BerkeleyDB is very similar to mdb in its C API; its Python bindings are PyBSDDB.

ZODB seems to have a single global Transaction that you can continually commit to (without starting again). You can also have subtransactions, but I don't see a way to have parallel independent transactions. There's also a separate transaction package that implements the context manager protocol. I'll keep looking.

I do like, and think is a common use case, an auto-commit mode that commits every operation.
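(To illustrate the context-manager style mentioned above, here is a generic sketch; it is not any particular package's API, just the common commit-on-success, abort-on-exception pattern.)

```python
import contextlib

@contextlib.contextmanager
def transaction(db):
    # Generic pattern: every operation inside the 'with' block belongs
    # to one transaction; commit on normal exit, abort on any exception.
    txn = db.begin()  # 'db.begin()' is a stand-in, not a real package API
    try:
        yield txn
    except Exception:
        txn.abort()
        raise
    else:
        txn.commit()

# Usage:
#   with transaction(db) as txn:
#       txn.put(b'key', b'value')
```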
Hi again & thanks for the survey! It's really helpful. Your mention of PyBSDDB is particularly interesting: as you say, MDB is designed to clone Berkeley's interface, so there's no reason we shouldn't be doing the same in Python. I'm going on holiday for a few days, but will look more closely at PyBSDDB's interface when I return. The DBAPI PEP is definitely a no-go; it doesn't make sense at all for BDB/MDB etc. (unless you want to implement an SQL engine on top ;).
Hi there, I have finally gotten around to updating the library. The old Cython binding has been replaced with a cffi binding, enabling compatibility with PyPy. I settled on a compromise for the interface, described below.
This means it's possible to work with the main database simply with:
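(The snippet that originally followed isn't preserved in this copy of the thread. The sketch below is based on the interface py-lmdb eventually shipped, Environment.begin() plus Transaction.get()/put(), so treat the details as assumptions rather than the exact code posted here.)

```python
import lmdb

# Open an environment; with no db= given, operations target the main database.
env = lmdb.open('/tmp/example-db')  # path is illustrative

with env.begin(write=True) as txn:  # write transaction, committed on exit
    txn.put(b'hello', b'world')

with env.begin() as txn:  # read-only by default
    print(txn.get(b'hello'))
```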
Working with sub-databases can be accomplished by using the db= parameter. I believe this is a fair compromise, since most users will only want a single keyspace.
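(Again a sketch based on the eventual py-lmdb API, with illustrative names: named sub-databases come from Environment.open_db(), and the handle is passed as db= either to begin() or to individual operations.)

```python
import lmdb

env = lmdb.open('/tmp/example-db', max_dbs=2)  # must reserve slots for named DBs

configs = env.open_db(b'configs')  # handle for a named sub-database

with env.begin(write=True) as txn:
    txn.put(b'retries', b'3', db=configs)  # per-operation db= ...

with env.begin(db=configs) as txn:         # ... or a per-transaction default
    print(txn.get(b'retries'))
```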
Finally, I forgot to mention: making txn= an optional parameter is pointless, since unlike BDB, MDB does not support 'transactionless' operations, and emulating them would encourage bad behavior. Users would be nudged into creating many transactions during a bulk insert without knowing the transactions exist, rather than being forced to think explicitly about how their transaction is formed.
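(To make the bulk-insert point concrete, here is a sketch in the eventual py-lmdb style: one explicit write transaction covers the whole batch, so the commit cost is paid once rather than hidden inside every put.)

```python
import lmdb

env = lmdb.open('/tmp/bulk-db', map_size=1 << 30)  # illustrative sizing

# One explicit write transaction for the entire bulk insert: a single
# commit at the end, instead of an implicit commit per operation.
with env.begin(write=True) as txn:
    for i in range(100000):
        txn.put(b'key-%d' % i, b'value-%d' % i)
```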
Is your goal to provide a mostly 1-to-1 translation of the C API, or a Pythonic wrapper? Fwiw, in Plyvel (Python bindings to LevelDB; see https://plyvel.readthedocs.org/) I made the WriteBatch (similar to a transaction, it seems) specific to a database, even though technically it is not linked to a database until the batch is written/applied. The end result is a more natural API (in my opinion), at the expense of slightly less flexibility.
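(For comparison, Plyvel's database-bound batch looks roughly like this; names follow the Plyvel documentation linked above.)

```python
import plyvel

db = plyvel.DB('/tmp/testdb', create_if_missing=True)

# The batch is created from the database, so its operations need no
# db argument; it is applied atomically when the 'with' block exits.
with db.write_batch() as wb:
    wb.put(b'key-1', b'value-1')
    wb.put(b'key-2', b'value-2')
```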
Hi Wouter,

In LMDB it is impossible to iterate without starting a transaction, and the intent of the iteration dictates what kind of transaction should be created: in particular, a read-write transaction will block all other writers. Since one of the major benefits of LMDB over LevelDB is its support for interprocess concurrency, implicit write transactions would never make sense, and implicit read transactions require synchronization that becomes expensive: to the tune of 19.8 microseconds with my current wrapper, or about 50k requests/sec.

I had thought about providing some kind of 'DatabaseTransaction' object that bound both objects together, but I could not think of a good use case where this would be beneficial in any meaningful way.

With regard to transactions, opening a database potentially requires a write transaction, which means any 'Pythonic' (I detest that term; it is meaningless) interface that tries to blur these lines must deal with upgrading the user's read-only transaction as necessary, and suchlike. There is little room for compromise here that does not result in reduced concurrency, efficiency, or user surprise. The extra 6 characters is more than worth it, IMHO.
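(In py-lmdb terms the explicit choice looks like this sketch, assuming the eventual Environment.begin() API: the caller decides up front whether the transaction is read-only, so a long iteration never blocks writers unless it asked for write access.)

```python
import lmdb

env = lmdb.open('/tmp/example-db')

# Read-only transaction: readers in many processes can run concurrently
# and never block writers.
with env.begin() as txn:
    for key, value in txn.cursor():  # iterate all records
        pass  # process (key, value)

# Write transaction: exclusive among writers for its whole lifetime.
with env.begin(write=True) as txn:
    txn.put(b'k', b'v')
```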
I forgot to add a further point: LMDB's Cursor object inherently binds to a specific database and supports a put operation, although it is not implemented yet. This roughly serves the case of having a 'bound writeable database within a transaction'.
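(Cursor.put did later appear in py-lmdb; used that way, the cursor plays exactly this role of a database-bound write handle inside a transaction. A sketch, with illustrative names:)

```python
import lmdb

env = lmdb.open('/tmp/example-db', max_dbs=1)
names = env.open_db(b'names')

with env.begin(write=True) as txn:
    cur = txn.cursor(db=names)  # bound to one database for this transaction
    cur.put(b'alice', b'1')     # repeated writes need no db= argument
    cur.put(b'bob', b'2')
```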
Looks fine to me. I didn't have any trouble getting your previous Cython binding built for PyPy, at least enough for a smoke test. In a single-process lmdb benchmark I wrote, PyPy ran 24% slower than regular python -O. However, a more serious multi-process benchmark I wrote gets into what looks like a deadlock when run under PyPy, locking the database for good (even after restarting and accessing it with regular Python). I saw no such behavior with regular Python. In other words, I welcome the cffi binding. Nic
Hi Nic, Just a small ping to note that the CPython binding has been completely rewritten in C, and now has much better performance for basically every operation (simple tests reach 600k random reads/sec). Right now CPython is faster than PyPy/cffi, although I'd like to make cffi competitive again somehow.
I notice that Database.get and Database.put do not take a Transaction parameter. It sure would be nice if there were an optional parameter to allow a separate Transaction.
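(For concreteness, the requested shape might look like the following. This is a hypothetical sketch of the reporter's suggestion, not the actual py-lmdb API; the txn= name and the fallback behavior are assumptions.)

```python
# Hypothetical sketch of the requested interface, not the real py-lmdb API.
class Database:
    def get(self, key, txn=None):
        # Use the caller-supplied transaction if given; otherwise fall
        # back to an internal per-call transaction.
        ...

    def put(self, key, value, txn=None):
        # Same idea for writes.
        ...
```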
In order to use multiple transactions, do you intend the client of py-lmdb to have to close and re-open the Database object? That seems to me the only way to provide a new transaction handle to the underlying mdb_get and mdb_put functions. Or perhaps I don't understand something about mdb (for example, I'm not sure why a transaction is passed into mdb_open: so that the database creation itself can be rolled back?)
BTW, thanks for your work so far! I'm pleasantly impressed that I've seen mdb have performance comparable to regular Python dictionaries for my usage.