## Description

### Specification
The `DBTransaction` currently does not have any locking integrated; it only provides a read-committed isolation level. On top of this, users have to know to use iterators (and potentially multiple iterators) to access leveldb's snapshot guarantee. This is essential when iterating over one sublevel while needing to read properties of another sublevel in a consistent way.
Right now users of this transaction are expected to use their own locks in order to prevent these additional phenomena:
- Non-repeatable reads
- Phantom reads
- Lost updates
Most of this comes down to locking a particular key that is being used, thus blocking other "threads" from starting transactions on those keys.
Key locking is required in these circumstances:
- Atomically reading multiple keys for a consistent "composite" read.
- Atomically writing multiple keys for a consistent "composite" write.
- Reading from a key, and then writing to a key a value that is derived from the read (like the counter problem)
- Atomically creating a new key-value, such that all operations also creating the same new key-value coalesce and accept that it has been created (this requires all creators to lock on the same "new" key)
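To make the third case (the counter problem) concrete, here is a minimal sketch in plain TypeScript. It uses an in-memory `Map` as a stand-in for the DB and a trivial promise-chain mutex as a stand-in for a key lock; none of these names come from the real `@matrixai/db` API. Without a lock around the read-modify-write, concurrent increments lose updates:

```ts
// Minimal in-memory "DB" to illustrate the counter problem.
const db = new Map<string, number>();

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Read-modify-write without any locking: both "transactions" read the
// same value, so one increment overwrites the other (lost update).
async function incrementUnlocked(key: string): Promise<void> {
  const value = db.get(key) ?? 0; // read
  await sleep(10);                // another transaction interleaves here
  db.set(key, value + 1);         // write derived from the stale read
}

// A trivial mutex: increments are queued so the read-modify-write
// of one "transaction" completes before the next one starts.
let queue: Promise<void> = Promise.resolve();
async function incrementLocked(key: string): Promise<void> {
  queue = queue.then(async () => {
    const value = db.get(key) ?? 0;
    await sleep(10);
    db.set(key, value + 1);
  });
  return queue;
}

async function main() {
  db.set('counter', 0);
  await Promise.all([
    incrementUnlocked('counter'),
    incrementUnlocked('counter'),
  ]);
  console.log('unlocked:', db.get('counter')); // 1 — one update was lost

  db.set('counter', 0);
  await Promise.all([
    incrementLocked('counter'),
    incrementLocked('counter'),
  ]);
  console.log('locked:', db.get('counter')); // 2 — both updates applied
}

main();
```

The real fix is the same shape: lock the key for the duration of the read-then-write, so the derived write can never be based on a stale read.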
Users are therefore doing something like this:

```ts
class SomeDomain {
  async someMethod(tran?: DBTransaction) {
    if (tran == null) {
      return withF([
        lockBox.lock(key1, key2),
        db.transaction(),
      ], async ([, tran]) => {
        return this.someMethod(tran);
      });
    }
    /* ... */
  }
}
```
Notice how if the transaction is passed in to `SomeDomain.someMethod`, then it doesn't bother creating its own transaction, but it also doesn't bother locking `key1` and `key2`.
The problem with this pattern is that within a complex call graph, each higher-level call has to remember, or know, which locks need to be held before calling a transactional operation like `SomeDomain.someMethod`. As the hierarchy of the call graph expands, this requirement to remember the locking context grows exponentially, and will make our programs too difficult and complex to debug.
There are 2 solutions to this:
- Pessimistic Concurrency Control (PCC)
- uses locks
- requires a deadlock detector (otherwise you may introduce deadlocks)
- locks should be locked in the same-order, horizontally within a call, and vertically across a callgraph
- transactions can be retried automatically when a deadlock is detected
- Optimistic Concurrency Control (OCC)
- does not use locks
- requires snapshot guarantees
- also referred to as "snapshot isolation" or "software transactional memory"
- transactions may be retried when guarantee is not consistent, but this depends on the caller's discretion
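The "same order" rule for PCC can be illustrated with a sketch. The `KeyLocks` class below is a toy per-key mutex table, not the real `LockBox` API: by sorting keys before acquisition, any two transactions acquire their overlapping keys in the same global order, which rules out the circular wait that produces a deadlock.

```ts
// Toy per-key mutex table; lockAll sorts keys so that any two
// transactions acquire overlapping keys in the same global order,
// which prevents circular waits (deadlocks).
class KeyLocks {
  private tails = new Map<string, Promise<void>>();

  async lockAll(keys: Array<string>): Promise<() => void> {
    const sorted = [...keys].sort(); // the deadlock-avoidance step
    const releases: Array<() => void> = [];
    for (const key of sorted) {
      const prev = this.tails.get(key) ?? Promise.resolve();
      let release!: () => void;
      const next = new Promise<void>((r) => { release = r; });
      this.tails.set(key, prev.then(() => next));
      await prev;            // wait for the previous holder of this key
      releases.push(release);
    }
    return () => releases.forEach((r) => r());
  }
}

const order: Array<string> = [];

async function main() {
  const locks = new KeyLocks();
  // Two "transactions" request the same keys in opposite orders;
  // sorting inside lockAll means they still lock in the same order.
  const t1 = (async () => {
    const release = await locks.lockAll(['key1', 'key2']);
    order.push('t1');
    release();
  })();
  const t2 = (async () => {
    const release = await locks.lockAll(['key2', 'key1']);
    order.push('t2');
    release();
  })();
  await Promise.all([t1, t2]);
  console.log(order); // both complete; no deadlock
}

main();
```

If the keys were locked in the order requested instead of sorted order, `t1` holding `key1` while waiting for `key2`, and `t2` holding `key2` while waiting for `key1`, would deadlock.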
The tradeoffs between the 2 approaches are summarised here: https://agirlamonggeeks.com/2017/02/23/optimistic-concurrency-vs-pessimistic-concurrency-short-comparison/
Large database systems often combine these ideas in their transaction system, and allow the user to configure their transactions for their application needs.
A quick and dirty solution for ourselves will follow more along how RocksDB implemented their transactions: https://www.sobyte.net/post/2022-01/rocksdb-tx/. And details here: MatrixAI/Polykey#294 (comment).
### Pessimistic Concurrency Control
I'm most familiar with pessimistic concurrency control, and we've currently designed many of our systems in PK to follow along. I'm curious whether OCC might be easier to apply to our PK programs, but we would need to have both transaction systems to test.
In terms of implementing PCC, we would need these things:
- Integrate the `LockBox` into `DBTransaction`
- The `LockBox` would need to be augmented to detect deadlocks and manage re-entrant locks
  - Re-entrant locking means that multiple calls to lock `key1` within the same transaction will all succeed. This doesn't mean that `key1` is a semaphore, just that if it's already locked in the transaction, then it is fine to proceed.
  - Deadlock detection works by ensuring that all locking calls always have a timeout; when a call times out, it must then check a lock metadata table for the transactions that were holding the locks this transaction needed, and throw an exception reporting this.
- Lock upgrades between read and write locks should also be considered. This means that if an earlier call read-locked `key1`, a subsequent call can write-lock `key1` (but must take precedence over other blocked readers & writers), and subsequent calls to write-lock `key1` will also succeed. Lock downgrades will not be allowed.
- After receiving a deadlock exception, this should bubble up to the transaction creator (or the unit of atomic operation designated as the request handler of the application) to automatically retry
- All deadlocks detected are a programmer bug, but retrying should enable users to continue work. Therefore we may not do automatic retries, and instead expect users to report the deadlock bug and retry at their own discretion
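The two `LockBox` augmentations described above can be sketched with a toy lock table. This is not the `@matrixai/async-locks` API; transaction ids stand in for `DBTransaction` instances, and the "lock metadata table" is just the `holders` map consulted on timeout:

```ts
// Toy lock table sketching re-entrancy and timeout-based deadlock
// detection; transaction ids stand in for DBTransaction instances.
class TxLocks {
  private holders = new Map<string, string>();            // key -> txId
  private waiters = new Map<string, Array<() => void>>(); // key -> wakeups

  async lock(txId: string, key: string, timeout: number): Promise<void> {
    const holder = this.holders.get(key);
    if (holder === txId) return; // re-entrant: already held by this tx
    if (holder === undefined) {
      this.holders.set(key, txId);
      return;
    }
    // Key is held by another transaction: wait with a timeout.
    await new Promise<void>((resolve, reject) => {
      const timer = setTimeout(() => {
        // Timed out: report the holding transaction as a deadlock suspect.
        reject(new Error(`deadlock suspected: ${key} held by ${holder}`));
      }, timeout);
      const queue = this.waiters.get(key) ?? [];
      queue.push(() => {
        clearTimeout(timer);
        this.holders.set(key, txId);
        resolve();
      });
      this.waiters.set(key, queue);
    });
  }

  unlock(txId: string, key: string): void {
    if (this.holders.get(key) !== txId) return;
    this.holders.delete(key);
    const next = this.waiters.get(key)?.shift();
    if (next != null) next();
  }
}

async function main() {
  const locks = new TxLocks();
  await locks.lock('tx1', 'key1', 50);
  await locks.lock('tx1', 'key1', 50); // re-entrant, succeeds immediately
  try {
    await locks.lock('tx2', 'key1', 50); // blocks, then times out
  } catch (e) {
    console.log((e as Error).message); // deadlock suspected: key1 held by tx1
  }
}

main();
```

A real implementation would additionally track read vs write lock modes for upgrades, and distinguish "slow holder" timeouts from genuine wait cycles before throwing.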
As for optimistic transactions, we would do something possibly a lot simpler: MatrixAI/Polykey#294 (comment)
Now there is already existing code that relies on how the DB transactions work, namely the `EncryptedFS`.
Any updates to the `DBTransaction` should be backwards compatible, so that `EncryptedFS` can continue functioning as normal using its own locking system.
Therefore both pessimistic and optimistic locking must be opt-in.
For pessimistic, this may just mean adding some additional methods to the `DBTransaction` that end up locking certain keys.
### Optimistic Concurrency Control
For optimistic, this can just be an additional option parameter, `db.transaction({ optimistic: true })`, that makes it an optimistic transaction.
Because OCC transactions are meant to rely on the snapshot, every `get` call must read from the iterator. Because this can range over the entire DB, the `get` call must be done on the root of the DB.
But right now each `iterator` also creates its own snapshot. It will be necessary that every iterator call iterates from the same snapshot that was created at the beginning of the transaction.
Right now this means users must start their iterators at the beginning of their transaction if they were to do that.
This might mean we need to change our "virtual iterator" in `DBTransaction` to seek on the snapshot iterator and acquire the relevant value there. We would need to maintain separate cursors for each iterator, and ensure mutual exclusion on the snapshot iterator.
When using optimistic transactions, every transaction creates a snapshot. During low-concurrency states this is not that bad, and I believe leveldb does some sort of COW, so it's not a full copy. During high-concurrency states this means increased storage/memory usage for all the concurrent snapshots. It is very likely that transactional contexts are only created at the GRPC handler level, and quite likely we would have a low-concurrency state for the majority of the time on each Polykey node.
Based on these ideas, it seems OCC should be less work to implement than PCC.
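The missing piece for OCC beyond snapshot reads is commit-time conflict detection. A simple version-counter sketch (this is an illustration, not the RocksDB algorithm or any `@matrixai/db` API): the transaction records the version of every key it read, and the commit fails if any of those versions has since changed, at which point the caller may retry.

```ts
// Toy OCC sketch: each key carries a version; a transaction records the
// versions it read, and commit fails if any of them changed since.
type Versioned = { value: string; version: number };

class OccStore {
  private data = new Map<string, Versioned>();

  set(key: string, value: string): void {
    const prev = this.data.get(key);
    this.data.set(key, { value, version: (prev?.version ?? 0) + 1 });
  }

  read(key: string): Versioned | undefined {
    return this.data.get(key);
  }

  // Commit succeeds only if every recorded read version is still current.
  commit(reads: Map<string, number>, writes: Map<string, string>): boolean {
    for (const [key, version] of reads) {
      if ((this.data.get(key)?.version ?? 0) !== version) return false;
    }
    for (const [key, value] of writes) this.set(key, value);
    return true;
  }
}

const occ = new OccStore();
occ.set('counter', '0'); // version 1

// A transaction reads counter (version 1), intending a derived write.
const reads = new Map([['counter', occ.read('counter')!.version]]);
occ.set('counter', '5'); // a concurrent writer bumps the version
const ok = occ.commit(reads, new Map([['counter', '1']]));
console.log(ok); // false — the read set is stale; caller may retry
```

This is the sense in which OCC trades locks for retries: no key is ever blocked, but a transaction whose read set went stale must be re-run.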
## Additional context
- Fine Grained Concurrency Control Standardisation Polykey#294 - this is the overall issue tackling how concurrency works in PK. Note that even if OCC was supported by `DBTransaction`, the usage of locks and `LockBox` will still apply in other areas that may only be interacting with in-memory state
- Upgrading @matrixai/async-init, @matrixai/async-locks, @matrixai/db, @matrixai/errors, @matrixai/workers and integrating @matrixai/resources js-encryptedfs#63 - the recent massive integration of the new "read committed" `DBTransaction` into EFS, and its usage of `LockBox`
- https://www.sobyte.net/post/2022-01/rocksdb-tx/ - RocksDB implementation of pessimistic and optimistic concurrency
- https://www.mianshigee.com/tutorial/rocksdb-en/b603e47dd8805bbf.md - further details on rocksdb implementation, their default transaction is pessimistic and auto-locks every key that gets written to
- Implement True Snapshot Isolation for LevelDB #4 - original issue discussing how snapshot isolation should be brought into DB, but we didn't fully understand its requirements or implications
- https://ongardie.net/blog/node-leveldb-transactions/ - original attempt at implementing snapshot-isolation (optimistic) transactions in leveldb
- http://ithare.com/databases-101-acid-mvcc-vs-locks-transaction-isolation-levels-and-concurrency/
- http://www.tiernok.com/posts/adding-index-for-a-key-value-store/
## Tasks
- [ ] Implement Snapshot Isolation and integrate it into `DBTransaction` - this should be enough to enable PK integration
- [ ] Enable advisory locking via a `DBTransaction.lock()` call
- [ ] Ensure that `DBTransaction.lock` calls take care of sorting, timeouts and re-entrancy
- [ ] Implement deadlock detection