[BUG] Transaction can't guarantee atomicity #487

ShooterIT · 2022-02-16T12:44:24Z

Describe the bug
The exclusive guard lock ensures you run the transaction in isolation, but it cannot guarantee the atomicity of the EXEC stage.
You would like to either have all commands applied to the db or none of them applied.
However, if some commands are executed and persisted to rocksdb and the process fails before the tx completes the db sees an intermediate state which is not desired.

Expected behavior
Guarantee the atomicity and consistency.

Eshcar · 2022-02-16T14:38:31Z

Thanks for opening this issue @ShooterIT
As of late 2021 RocksDB supports transactions , such that no other threads can see the changes in a transaction until it has been committed.
I didnt check the code yet but I assume they already take care of consistency also in case of a failure during the commit phase.

I suggest to use the API they are providing.
This would also allow you to remove the global lock you are using to protect the EXEC stage.

ShooterIT · 2022-02-17T03:11:19Z

Thanks @Eshcar Yes, i know RocksDB supports, but i worry transaction db would break kvrocks db usage, I will rethink it.
BTW, initially we just want to implement transaction like Redis

Eshcar · 2022-02-17T12:47:19Z

@ShooterIT - can you demonstrate a problematic scenario that will break other kvrocks commands when using rocksdb txs?
I would be happy to assist in understanding these cases.

ShooterIT · 2022-03-01T02:33:51Z

@Eshcar Sorry for the slow reply.

Currently, kvrocks directly access db when reading data from db, and write a writebatch when writing data. It doesn't care about rocksdb txs, if we want to use rocksdb txs, there are too many places that is needed to change. And in some places, we directly access DB for checkpoint, WAL..., So i am not sure we could use transaction db easily. If you have ideas, welcome to discuss.

Eshcar · 2022-03-02T20:16:11Z

Indeed, supporting tx atomicity will create changes to many kvrocks files, if not all. But it has the potential of improving the quality of the code.

Here is my suggestion. At the core of it is the principle of encapsulation:

db_ and storage_ fields of Database class should be private, and not be visible/accessible to all inheriting classes.
Database should include an abstract DbHandler that is responsible for calling db_ and storage_ methods.
SimpleDBHandler and TxDBHandler inherit from DbHandler and implement its API.
When a redis command object is created its dbHandler is set (after calling the c-tor), according to the context.
When running simple commands – set simpleDBHandler; when running a tx, all queued commands in the multi phase are set w the same txDBhandler.

This is the list of methods that reference db_ or storage_ and need to be covered by the handler:
db_:

LatestSnapShot ss(db_)
db_->Get(...)
db_->NewIterator(…)

storage_:

LockGuard guard(…)
batch.Put(…)
storage_->Write()
storage_->GetLockManager()
storage_->Delete()
storage_->IsSlotIdEncoded()
storage_->DeleteRange()

Maybe there are some more methods that I missed.

An example usage of the dbHandler would be:
[Examples of how to replace the code in inheriting commands, and how to implement the methods:]

Replace db_->Get() with a dbHandler->get()

SimpleDbHandler implements this by db_->Get()
TxDbHandler implement this by txn_db->Get(), where txn_db is opened by TransactionDB::Open when the transaction begins.

Replace batch.put() with dbHandler->batchPut()

SimpleDbHandler implements this by batch.put(), where the batch is only used by this command
TxDbHandler implement this by txn_db->Put(), where the same batch is used by all queued command

Replace storage_->Write() with dbHandler->write()

SimpleDbHandler implements this by storage_->Write()
TxDbHandler implement this by txn->Commit(), where txn is created by txn_db->BeginTransaction(), when the tx begins.

So the code and flow for commands is similar to the current code, the only difference is how specific calls are handled at the db_ level based on the context (tx or simple).

Do you see any inherent limitation in this suggestion? (except for the fact that it requires major refactoring of the code).

ShooterIT · 2022-03-22T14:39:15Z

Impressive idea, truly thanks @Eshcar

Recently, I always thought that, it will be really cool to support ACID transaction for KV NoSql. Even I also want to support interactive transaction, and we can support more features transaction, such as optional sync, isolation level.

MULTI sync yes/no  isolation  ReadCommitted/RepeatableRead
xxx
xxx
EXEC

sync: we call fsync after transactions, i.e. truly guarantee data persistent in disk
isolation: we can easily implement different isolation using rocksdb capacity, of course, we don't support this in the first version.

For code refactor of storage, i ever had a big idea, i think it is not good to directly access db_(pointer of RocksDB) in different data type implementation(redis_zset.cc, redis_hash.cc...), we should not expose it. I think we should provide some simple interfaces in storage layer. such as Put(support multi keys), Get(support multi keys),Del(support multi keys), RangeRead, RangeDelete. Data type implementation layer just call these functions, and we could easily change these function implementation even we can change different storage engine. Of course, we need to change too many places.

Your suggested method is much accessible i think, though it also breaks many places. Recently, i may don't have enough spare time to do this job, i think you can have a try if you want.

Eshcar · 2022-03-27T11:33:54Z

Thanks @ShooterIT
I do not have the capacity to implement/test/benchmark this solution.
I would be happy to work with/guide anyone from the community that is willing to implement this change.
Let me update soon if I can allocate alternative resources for this effort.

git-hulk · 2023-02-26T15:40:52Z

Hi, @Eshcar @ShooterIT I have an idea to workaround this bug. We can add a new function to create a shared WriteBatch for the Multi-Exec command like the below:

Status Storage::Begin() {
    is_txn_mode = true;
    txn_write_batch = make_shared<*rocksdb::WriteBatch>();
}

rocksdb::WriteBatch *Storage::GetWriteBatch() {
   if (is_txn_mode) {
      return txn_write_batch.get();
   }
   return make_shared<*rocksdb::WriteBatch>();
}

Status Storage::Commit() {
    is_txn_mode = false;
    // write txn WriteBatch to RocksDB
    txn_write_batch.reset();
}

For the Command Exec, we need to explicitly call the Begin() and Commit() to enter and leave the transaction mode. So that Kvrocks will create a new WriteBatch for each writes operation if it's NOT in transaction mode, and use the shared WriteBatch to collect all write operations if it's in the transaction mode.

ShooterIT · 2023-02-27T00:49:27Z

it is limited, in transaction, there also are some reading operations, we also hope we can read committed writing operations, maybe you can implement searching WAL(rocksdb supports it) to support this feature, but we can't resolve conflicts when we commit transaction. so i prefer to just use rocksdb transaction to implement real transaction.

git-hulk · 2023-02-27T02:00:10Z

Yes, RocksDB also supports the WriteBatchWithIndex to implement the Read-Your-Own-Write feature, but it needs to involve all read operations, so I didn't plan to do it in the current stage. Does it make sense to make write operations atomic first, then involve read operations?

Refer: https://rocksdb.org/blog/2015/02/27/write-batch-with-index.html

ShooterIT · 2023-02-27T02:17:20Z

but if you don't commit write(pending in txn_write_batch), you can't Read-Your-Own-Write in normal mode.

git-hulk · 2023-02-27T02:30:21Z

@ShooterIT Do you mean WriteBatchWithIndex can only be used in transaction mode? I see the MongoDB will search the txn_write_batch first if exists then from the DB, so it should work if it can be used in the normal model.

ShooterIT · 2023-02-27T03:07:59Z

RocksDB also supports the WriteBatchWithIndex to implement the Read-Your-Own-Write feature, but it needs to involve all read operations, so I didn't plan to do it in the current stage

You said you don't want to use WriteBatchWithIndex currently, so you can't Read-Your-Own-Write. if you use it , of course, it can.

git-hulk · 2023-02-27T03:22:10Z

oh...I got your point now. What I mean is to implement the atomic write operations first, then involves read operations at the next stage.

ShooterIT · 2023-02-28T02:06:20Z

you shouldn't ask developers not have Read-Your-Own-Write operations in transactions,

set a b
multi
set a c
get a  // if you don't commit write batch, you will get `b` instead of `c`.
set xxx yyy
...
exec

git-hulk · 2023-02-28T02:42:59Z

@ShooterIT Yes, we cannot require users to avoid this, what I mean is to separate them into two PRs to make the code review simple.

git-hulk added this to Kvrocks 2024 Roadmap Feb 27, 2023

git-hulk added the bug type bug label Feb 27, 2023

git-hulk self-assigned this Feb 27, 2023

git-hulk moved this to To Do in Kvrocks 2024 Roadmap Feb 27, 2023

git-hulk mentioned this issue Feb 27, 2023

Atomic commit write batches in the command EXEC #1281

Closed

2 tasks

git-hulk mentioned this issue Feb 28, 2023

Use the RocksDB WriteBatchWithIndex to implement the read-your-own-writes in transaction #1287

Merged

git-hulk closed this as completed in #1287 Mar 4, 2023

github-project-automation bot moved this from To Do to Done in Kvrocks 2024 Roadmap Mar 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Transaction can't guarantee atomicity #487

[BUG] Transaction can't guarantee atomicity #487

ShooterIT commented Feb 16, 2022

Eshcar commented Feb 16, 2022

ShooterIT commented Feb 17, 2022

Eshcar commented Feb 17, 2022

ShooterIT commented Mar 1, 2022

Eshcar commented Mar 2, 2022 •

edited

Loading

ShooterIT commented Mar 22, 2022

Eshcar commented Mar 27, 2022

git-hulk commented Feb 26, 2023 •

edited

Loading

ShooterIT commented Feb 27, 2023

git-hulk commented Feb 27, 2023 •

edited

Loading

ShooterIT commented Feb 27, 2023

git-hulk commented Feb 27, 2023 •

edited

Loading

ShooterIT commented Feb 27, 2023

git-hulk commented Feb 27, 2023

ShooterIT commented Feb 28, 2023

git-hulk commented Feb 28, 2023

[BUG] Transaction can't guarantee atomicity #487

[BUG] Transaction can't guarantee atomicity #487

Comments

ShooterIT commented Feb 16, 2022

Eshcar commented Feb 16, 2022

ShooterIT commented Feb 17, 2022

Eshcar commented Feb 17, 2022

ShooterIT commented Mar 1, 2022

Eshcar commented Mar 2, 2022 • edited Loading

ShooterIT commented Mar 22, 2022

Eshcar commented Mar 27, 2022

git-hulk commented Feb 26, 2023 • edited Loading

ShooterIT commented Feb 27, 2023

git-hulk commented Feb 27, 2023 • edited Loading

ShooterIT commented Feb 27, 2023

git-hulk commented Feb 27, 2023 • edited Loading

ShooterIT commented Feb 27, 2023

git-hulk commented Feb 27, 2023

ShooterIT commented Feb 28, 2023

git-hulk commented Feb 28, 2023

Eshcar commented Mar 2, 2022 •

edited

Loading

git-hulk commented Feb 26, 2023 •

edited

Loading

git-hulk commented Feb 27, 2023 •

edited

Loading

git-hulk commented Feb 27, 2023 •

edited

Loading