## Description

### Specification
The `DBTransaction` currently does not have any locking integrated; it only provides a read-committed isolation level. On top of this, users have to know to use iterators (and potentially multiple iterators) to access leveldb's snapshot guarantee. This is essential when iterating over one sublevel while needing to read properties of another sublevel in a consistent way.
Right now users of this transaction are expected to use their own locks in order to prevent these additional phenomena:
- Non-repeatable reads
- Phantom reads
- Lost updates
Most of this comes down to locking a particular key that is being used, thus blocking other "threads" from starting transactions on those keys.
Key locking is required in these circumstances:
- Atomically reading multiple keys for a consistent "composite" read.
- Atomically writing multiple keys for a consistent "composite" write.
- Reading from a key, and then writing to a key a value that is derived from the read (like the counter problem)
- Atomically creating a new key-value, such that all operations also creating the same new key-value coalesce and accept that it has been created (this requires all creators to lock on the same "new" key)
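To make the third case (the counter problem) concrete, here is a minimal sketch in plain TypeScript. It uses an in-memory `Map` as a stand-in for the DB and a trivial promise-chain mutex as a stand-in for a key lock; none of these names come from the real `@matrixai/db` API. Without a lock around the read-modify-write, concurrent increments lose updates:

```ts
// Minimal in-memory "DB" to illustrate the counter problem.
const db = new Map<string, number>();

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Read-modify-write without any locking: both "transactions" read the
// same value, so one increment overwrites the other (lost update).
async function incrementUnlocked(key: string): Promise<void> {
  const value = db.get(key) ?? 0; // read
  await sleep(10);                // another transaction interleaves here
  db.set(key, value + 1);         // write derived from the stale read
}

// A trivial mutex: increments are queued so the read-modify-write
// of one "transaction" completes before the next one starts.
let queue: Promise<void> = Promise.resolve();
async function incrementLocked(key: string): Promise<void> {
  queue = queue.then(async () => {
    const value = db.get(key) ?? 0;
    await sleep(10);
    db.set(key, value + 1);
  });
  return queue;
}

async function main() {
  db.set('counter', 0);
  await Promise.all([
    incrementUnlocked('counter'),
    incrementUnlocked('counter'),
  ]);
  console.log('unlocked:', db.get('counter')); // 1 — one update was lost

  db.set('counter', 0);
  await Promise.all([
    incrementLocked('counter'),
    incrementLocked('counter'),
  ]);
  console.log('locked:', db.get('counter')); // 2 — both updates applied
}

main();
```

The real fix is the same shape: lock the key for the duration of the read-then-write, so the derived write can never be based on a stale read.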
Users are therefore doing something like this:

```ts
class SomeDomain {
  async someMethod(tran?: DBTransaction) {
    if (tran == null) {
      return withF([
        lockBox.lock(key1, key2),
        db.transaction(),
      ], async ([, tran]) => {
        return this.someMethod(tran);
      });
    }
    /* ... */
  }
}
```
Notice how if the transaction is passed in to `SomeDomain.someMethod`, then it doesn't bother creating its own transaction, but it also doesn't bother locking `key1` and `key2`.
The problem with this pattern is that within a complex call graph, each higher-level call has to remember, or know, which locks need to be held before calling a transactional operation like `SomeDomain.someMethod`. As the hierarchy of the call graph expands, this requirement to remember the locking context grows exponentially, and will make our programs too difficult and complex to debug.
There are 2 solutions to this:
- Pessimistic Concurrency Control (PCC)
- uses locks
- requires a deadlock detector (otherwise you may introduce deadlocks)
- locks should be locked in the same-order, horizontally within a call, and vertically across a callgraph
- transactions can be retried automatically when a deadlock is detected
- Optimistic Concurrency Control (OCC)
- does not use locks
- requires snapshot guarantees
- also referred to as "snapshot isolation" or "software transactional memory"
- transactions may be retried when guarantee is not consistent, but this depends on the caller's discretion
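The "same order" rule for PCC can be illustrated with a sketch. The `KeyLocks` class below is a toy per-key mutex table, not the real `LockBox` API: by sorting keys before acquisition, any two transactions acquire their overlapping keys in the same global order, which rules out the circular wait that produces a deadlock.

```ts
// Toy per-key mutex table; lockAll sorts keys so that any two
// transactions acquire overlapping keys in the same global order,
// which prevents circular waits (deadlocks).
class KeyLocks {
  private tails = new Map<string, Promise<void>>();

  async lockAll(keys: Array<string>): Promise<() => void> {
    const sorted = [...keys].sort(); // the deadlock-avoidance step
    const releases: Array<() => void> = [];
    for (const key of sorted) {
      const prev = this.tails.get(key) ?? Promise.resolve();
      let release!: () => void;
      const next = new Promise<void>((r) => { release = r; });
      this.tails.set(key, prev.then(() => next));
      await prev;            // wait for the previous holder of this key
      releases.push(release);
    }
    return () => releases.forEach((r) => r());
  }
}

const order: Array<string> = [];

async function main() {
  const locks = new KeyLocks();
  // Two "transactions" request the same keys in opposite orders;
  // sorting inside lockAll means they still lock in the same order.
  const t1 = (async () => {
    const release = await locks.lockAll(['key1', 'key2']);
    order.push('t1');
    release();
  })();
  const t2 = (async () => {
    const release = await locks.lockAll(['key2', 'key1']);
    order.push('t2');
    release();
  })();
  await Promise.all([t1, t2]);
  console.log(order); // both complete; no deadlock
}

main();
```

If the keys were locked in the order requested instead of sorted order, `t1` holding `key1` while waiting for `key2`, and `t2` holding `key2` while waiting for `key1`, would deadlock.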
The tradeoffs between the 2 approaches are summarised here: https://agirlamonggeeks.com/2017/02/23/optimistic-concurrency-vs-pessimistic-concurrency-short-comparison/
Large database systems often combine these ideas in their transaction system, and allow the user to configure their transactions for their application needs.
A quick and dirty solution for ourselves will follow more along how RocksDB implemented their transactions: https://www.sobyte.net/post/2022-01/rocksdb-tx/. And details here: MatrixAI/Polykey#294 (comment).
### Pessimistic Concurrency Control
I'm most familiar with pessimistic concurrency control, and we've currently designed many of our systems in PK to follow along. I'm curious whether OCC might be easier to apply to our PK programs, but we would need to have both transaction systems to test.
In terms of implementing PCC, we would need these things:
- Integrate the `LockBox` into `DBTransaction`
- The `LockBox` would need to be augmented to detect deadlocks and manage re-entrant locks
  - Re-entrant locking means that multiple calls to lock `key1` within the same transaction will all succeed. This doesn't mean that `key1` is a semaphore, just that if it's already locked in the transaction, then it is fine to proceed.
  - Deadlock detection works by ensuring that all locking calls always have a timeout; when a call times out, it must then check a lock metadata table for the transactions that were holding the locks this transaction needed, and throw an exception reporting this.
- Lock upgrades between read and write locks should also be considered. This means that if an earlier call read-locked `key1`, a subsequent call can write-lock `key1` (but must take precedence over other blocked readers & writers), and subsequent calls to write-lock `key1` will also succeed. Lock downgrades will not be allowed.
- After receiving a deadlock exception, this should bubble up to the transaction creator (or the unit of atomic operation designated as the request handler of the application) to automatically retry
- All deadlocks detected are a programmer bug, but retrying should enable users to continue work. Therefore we may not do automatic retries, and instead expect users to report the deadlock bug and retry at their own discretion
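The two `LockBox` augmentations described above can be sketched with a toy lock table. This is not the `@matrixai/async-locks` API; transaction ids stand in for `DBTransaction` instances, and the "lock metadata table" is just the `holders` map consulted on timeout:

```ts
// Toy lock table sketching re-entrancy and timeout-based deadlock
// detection; transaction ids stand in for DBTransaction instances.
class TxLocks {
  private holders = new Map<string, string>();            // key -> txId
  private waiters = new Map<string, Array<() => void>>(); // key -> wakeups

  async lock(txId: string, key: string, timeout: number): Promise<void> {
    const holder = this.holders.get(key);
    if (holder === txId) return; // re-entrant: already held by this tx
    if (holder === undefined) {
      this.holders.set(key, txId);
      return;
    }
    // Key is held by another transaction: wait with a timeout.
    await new Promise<void>((resolve, reject) => {
      const timer = setTimeout(() => {
        // Timed out: report the holding transaction as a deadlock suspect.
        reject(new Error(`deadlock suspected: ${key} held by ${holder}`));
      }, timeout);
      const queue = this.waiters.get(key) ?? [];
      queue.push(() => {
        clearTimeout(timer);
        this.holders.set(key, txId);
        resolve();
      });
      this.waiters.set(key, queue);
    });
  }

  unlock(txId: string, key: string): void {
    if (this.holders.get(key) !== txId) return;
    this.holders.delete(key);
    const next = this.waiters.get(key)?.shift();
    if (next != null) next();
  }
}

async function main() {
  const locks = new TxLocks();
  await locks.lock('tx1', 'key1', 50);
  await locks.lock('tx1', 'key1', 50); // re-entrant, succeeds immediately
  try {
    await locks.lock('tx2', 'key1', 50); // blocks, then times out
  } catch (e) {
    console.log((e as Error).message); // deadlock suspected: key1 held by tx1
  }
}

main();
```

A real implementation would additionally track read vs write lock modes for upgrades, and distinguish "slow holder" timeouts from genuine wait cycles before throwing.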
As for optimistic transactions, we would do something possibly a lot simpler: MatrixAI/Polykey#294 (comment)
Now there is already existing code that relies on how the DB transactions work, namely the `EncryptedFS`.
Any updates to the `DBTransaction` should be backwards compatible, so that `EncryptedFS` can continue functioning as normal using its own locking system.
Therefore both pessimistic and optimistic locking must be opt-in.
For pessimistic, this may just mean adding some additional methods to the `DBTransaction` that end up locking certain keys.
### Optimistic Concurrency Control
For optimistic, this can just be an additional option parameter, `db.transaction({ optimistic: true })`, that makes it an optimistic transaction.
Because OCC transactions are meant to rely on the snapshot, every `get` call must read from the iterator. Because this can range over the entire DB, the `get` call must be done on the root of the DB.
But right now each `iterator` also creates its own snapshot. It will be necessary that every iterator call iterates from the same snapshot that was created at the beginning of the transaction.
Right now this means users must start their iterators at the beginning of their transaction if they were to do that.
This might mean we need to change our "virtual iterator" in `DBTransaction` to seek on the snapshot iterator and acquire the relevant value there. We would need to maintain separate cursors for each iterator, and ensure mutual exclusion on the snapshot iterator.
When using optimistic transactions, every transaction creates a snapshot. During low-concurrency states this is not that bad, and I believe leveldb does some sort of COW, so it's not a full copy. During high-concurrency states this means increased storage/memory usage for all the concurrent snapshots. It is very likely that transactional contexts are only created at the GRPC handler level, and quite likely we would have a low-concurrency state for the majority of the time on each Polykey node.
Based on these ideas, it seems OCC should be less work to implement than PCC.
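The missing piece for OCC beyond snapshot reads is commit-time conflict detection. A simple version-counter sketch (this is an illustration, not the RocksDB algorithm or any `@matrixai/db` API): the transaction records the version of every key it read, and the commit fails if any of those versions has since changed, at which point the caller may retry.

```ts
// Toy OCC sketch: each key carries a version; a transaction records the
// versions it read, and commit fails if any of them changed since.
type Versioned = { value: string; version: number };

class OccStore {
  private data = new Map<string, Versioned>();

  set(key: string, value: string): void {
    const prev = this.data.get(key);
    this.data.set(key, { value, version: (prev?.version ?? 0) + 1 });
  }

  read(key: string): Versioned | undefined {
    return this.data.get(key);
  }

  // Commit succeeds only if every recorded read version is still current.
  commit(reads: Map<string, number>, writes: Map<string, string>): boolean {
    for (const [key, version] of reads) {
      if ((this.data.get(key)?.version ?? 0) !== version) return false;
    }
    for (const [key, value] of writes) this.set(key, value);
    return true;
  }
}

const occ = new OccStore();
occ.set('counter', '0'); // version 1

// A transaction reads counter (version 1), intending a derived write.
const reads = new Map([['counter', occ.read('counter')!.version]]);
occ.set('counter', '5'); // a concurrent writer bumps the version
const ok = occ.commit(reads, new Map([['counter', '1']]));
console.log(ok); // false — the read set is stale; caller may retry
```

This is the sense in which OCC trades locks for retries: no key is ever blocked, but a transaction whose read set went stale must be re-run.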
## Additional context
- Fine Grained Concurrency Control Standardisation Polykey#294 - this is the overall issue tackling how concurrency works in PK. Note that even if OCC was supported by `DBTransaction`, the usage of locks and `LockBox` will still apply in other areas that may only be interacting with in-memory state
- Upgrading @matrixai/async-init, @matrixai/async-locks, @matrixai/db, @matrixai/errors, @matrixai/workers and integrating @matrixai/resources js-encryptedfs#63 - the recent massive integration of the new "read committed" `DBTransaction` into EFS, and its usage of `LockBox`
- https://www.sobyte.net/post/2022-01/rocksdb-tx/ - RocksDB implementation of pessimistic and optimistic concurrency
- https://www.mianshigee.com/tutorial/rocksdb-en/b603e47dd8805bbf.md - further details on rocksdb implementation, their default transaction is pessimistic and auto-locks every key that gets written to
- Implement True Snapshot Isolation for LevelDB #4 - original issue discussing how snapshot isolation should be brought into DB, but we didn't fully understand its requirements or implications
- https://ongardie.net/blog/node-leveldb-transactions/ - original attempt at implementing snapshot-isolation (optimistic) transactions in leveldb
- http://ithare.com/databases-101-acid-mvcc-vs-locks-transaction-isolation-levels-and-concurrency/
- http://www.tiernok.com/posts/adding-index-for-a-key-value-store/
## Tasks
- [ ] Implement Snapshot Isolation and integrate it into `DBTransaction` - this should be enough to enable PK integration
- [ ] Enable advisory locking via a `DBTransaction.lock()` call
- [ ] Ensure that `DBTransaction.lock` calls take care of sorting, timeouts and re-entrancy
- [ ] Implement deadlock detection