DeleteRange causes put then iterate to fail on WBWI #5260

Open
adamretter opened this issue Apr 28, 2019 · 7 comments

Comments

@adamretter
Collaborator

adamretter commented Apr 28, 2019

Putting the following 3 key/value pairs into a WBWI (WriteBatchWithIndex):

batch.Put(&cf1, "a1", "some value");
batch.Put(&cf1, "a2", "some value");
batch.Put(&cf1, "a3", "some value");

Iterating with batch.NewIteratorWithBase returns just those 3 key/value pairs correctly. So far so good!


However, if I call batch.DeleteRange(&cf1, "a", "b") before putting the values into the WBWI, then iterating the WBWI incorrectly returns 4 key/value pairs instead of 3. That is, the following code appears to cause false data to appear:

batch.DeleteRange(&cf1, "a", "b");

batch.Put(&cf1, "a1", "some value");
batch.Put(&cf1, "a2", "some value");
batch.Put(&cf1, "a3", "some value");

The first key/value pair returned by NewIteratorWithBase appears to be an error: its key has the value a, yet we never stored any such key!

I have created a reproducible test case showing the problem, which fits into utilities/write_batch_with_index/write_batch_with_index_test.cc:

TEST_F(WriteBatchWithIndexTest, TestIteraratorWithBaseDeleteRange) {
  ColumnFamilyHandleImplDummy cf1(6, BytewiseComparator());
  WriteBatchWithIndex batch(BytewiseComparator(), 0, true);

  std::string key1("a1");
  std::string key2("a2");
  std::string key3("a3");

  const std::string fixed_value("some value");

  // delete a range (NOTE: the line below erroneously causes the test to fail!!!)
  batch.DeleteRange(&cf1, "a", "b");

  // put keys in the range we just deleted
  batch.Put(&cf1, key1, fixed_value);
  batch.Put(&cf1, key2, fixed_value);
  batch.Put(&cf1, key3, fixed_value);

  // retrieve keys using iterator
  KVMap empty_map;
  std::unique_ptr<Iterator> iter(
      batch.NewIteratorWithBase(&cf1, new KVIter(&empty_map)));

  iter->Seek("a");
      
  // check key1
  ASSERT_OK(iter->status());
  ASSERT_TRUE(iter->Valid());
  ASSERT_EQ(key1, iter->key().ToString());
  ASSERT_EQ(fixed_value, iter->value().ToString());

  iter->Next();

  // check key2
  ASSERT_OK(iter->status());
  ASSERT_TRUE(iter->Valid());
  ASSERT_EQ(key2, iter->key().ToString());
  ASSERT_EQ(fixed_value, iter->value().ToString());

  iter->Next();

  // check key3
  ASSERT_OK(iter->status());
  ASSERT_TRUE(iter->Valid());
  ASSERT_EQ(key3, iter->key().ToString());
  ASSERT_EQ(fixed_value, iter->value().ToString());

  // should have reached the end of iterator
  iter->Next();
  ASSERT_OK(iter->status());
  ASSERT_FALSE(iter->Valid());
}

Commenting out the batch.DeleteRange(&cf1, "a", "b"); above causes the test to pass, but I don't think that call should have any impact on the iterator produced by WriteBatchWithIndex::NewIteratorWithBase.

Can someone please confirm that this is indeed an issue... and that I am not doing something blindly stupid ;-)

@ajkr
Contributor

ajkr commented Apr 28, 2019

What you're doing looks right. We should have returned NotSupported when DeleteRange is called on a WriteBatchWithIndex, since the range tombstones aren't properly interpreted for readers of the indexed batch. Is DeleteRange still useful to you without that support, or is using it with WriteBatchWithIndex a requirement?

@adamretter
Collaborator Author

adamretter commented Apr 29, 2019

@ajkr Thank you for your quick response :-)

We would need DeleteRange to work with WBWI (WriteBatchWithIndex).

At the moment each of our transactions has a WBWI and a Snapshot; we perform all read operations using either batch.GetFromBatchAndDB or batch.NewIteratorWithBase, and all write operations using batch.Put. This gives us Snapshot Isolation.
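For illustration, our read/write path looks roughly like this (a sketch only, with hypothetical wiring; db and cf stand in for our open database and column family handle):

#include <memory>
#include <string>
#include "rocksdb/comparator.h"
#include "rocksdb/db.h"
#include "rocksdb/utilities/write_batch_with_index.h"

void TxnSketch(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf) {
  // One WBWI and one Snapshot per transaction.
  rocksdb::WriteBatchWithIndex batch(rocksdb::BytewiseComparator(), 0, true);
  const rocksdb::Snapshot* snapshot = db->GetSnapshot();

  rocksdb::ReadOptions read_opts;
  read_opts.snapshot = snapshot;  // all DB reads see a fixed view

  // Writes only ever go into the batch.
  batch.Put(cf, "a1", "some value");

  // Point read: the batch is consulted first, then the snapshot of the DB.
  std::string value;
  rocksdb::Status s = batch.GetFromBatchAndDB(db, read_opts, cf, "a1", &value);
  if (s.ok()) {
    // value holds the batch's copy, which shadows any DB value
  }

  // Range read: batch entries layered over a snapshot iterator of the DB.
  std::unique_ptr<rocksdb::Iterator> iter(
      batch.NewIteratorWithBase(cf, db->NewIterator(read_opts, cf)));
  for (iter->Seek("a"); iter->Valid(); iter->Next()) {
    // ...
  }

  db->ReleaseSnapshot(snapshot);
}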

We developed our transactions before RocksDB provided any built-in transaction support. We could migrate to RocksDB's PessimisticTransaction; however, looking at the RocksDB transaction code, it appears to work in a very similar way, also utilising a WBWI and a Snapshot, so I think it would have the same problem with DeleteRange.

I wonder if it would be possible to implement DeleteRange for iterating on a WBWI. I am speculating that we could check for delete ranges during iteration: if one is encountered, we temporarily memorise the start and end keys of the range, and then check each subsequent key found by iterating before returning it.
If a key was retrieved from the underlying base iterator, we must check that it is not within the memorised ranges. However, if it came from a value that was put to the batch after the delete range, it can be returned directly.

How does something like that sound?
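
A rough sketch of the check I have in mind (hypothetical helper, not the real BaseDeltaIterator internals), covering forward iteration only:

#include <string>
#include <utility>
#include <vector>
#include "rocksdb/comparator.h"
#include "rocksdb/slice.h"

struct MemorisedRanges {
  const rocksdb::Comparator* cmp = rocksdb::BytewiseComparator();
  std::vector<std::pair<std::string, std::string>> ranges;  // [begin, end)

  // Called whenever iteration encounters a DeleteRange entry in the batch.
  void Add(const rocksdb::Slice& begin, const rocksdb::Slice& end) {
    ranges.emplace_back(begin.ToString(), end.ToString());
  }

  // True if a key coming from the *base* iterator falls inside a memorised
  // range and must be skipped; keys put to the batch after the DeleteRange
  // would bypass this check and be returned directly.
  bool Covers(const rocksdb::Slice& key) const {
    for (const auto& r : ranges) {
      if (cmp->Compare(key, r.first) >= 0 && cmp->Compare(key, r.second) < 0) {
        return true;
      }
    }
    return false;
  }
};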

@ajkr
Contributor

ajkr commented Apr 29, 2019

@adamretter, right, the problem looks similar for pessimistic transactions, where this also isn't supported. It's perhaps harder there due to conflict prevention/checking.

One question I had about your proposal: can it handle reverse iteration? In that case we wouldn't see the range tombstone until we've already passed over keys potentially deleted by it.

I wouldn't have any problem with indexing the range tombstones in a completely separate data structure (it could be just a vector to start with, or a map sorted on begin key) and checking all of them during a get or scan. The case of having many range tombstones in a single WriteBatchWithIndex is probably rare, so doing it a slow way might be OK.
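
Something along these lines, for example (a sketch only; it assumes a single column family and non-overlapping ranges). Because every tombstone is indexed up front, the same check works for gets, forward scans, and reverse scans:

#include <map>
#include <string>
#include "rocksdb/comparator.h"
#include "rocksdb/slice.h"

class RangeTombstoneIndex {
 public:
  // Record a DeleteRange over [begin, end).
  void Add(const rocksdb::Slice& begin, const rocksdb::Slice& end) {
    tombstones_[begin.ToString()] = end.ToString();
  }

  // Find the last range starting at or before `key` and see whether it
  // extends past it. O(log n) per key, independent of scan direction.
  bool Covers(const rocksdb::Slice& key) const {
    auto it = tombstones_.upper_bound(key.ToString());
    if (it == tombstones_.begin()) {
      return false;
    }
    --it;
    return cmp_->Compare(key, it->second) < 0;
  }

 private:
  const rocksdb::Comparator* cmp_ = rocksdb::BytewiseComparator();
  std::map<std::string, std::string> tombstones_;  // begin -> end
};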

@adamretter
Collaborator Author

adamretter commented Apr 29, 2019

@ajkr Interesting. I hadn't considered reverse iteration... probably because I don't personally use it ;-).

Okay... is this something you are planning to work on, or should I try and implement something?

@petermattis
Contributor

> I wonder if it would be possible to implement DeleteRange for iterating on a WBWI. I am speculating that we could check for delete ranges during iteration: if one is encountered, we temporarily memorise the start and end keys of the range, and then check each subsequent key found by iterating before returning it.
> If a key was retrieved from the underlying base iterator, we must check that it is not within the memorised ranges. However, if it came from a value that was put to the batch after the delete range, it can be returned directly.
>
> How does something like that sound?

My 2 cents: the current BaseDeltaIterator is a problematic design, as it duplicates logic that is part of normal iteration. A better design would be to treat the contents of the WriteBatch as an additional level in the LSM. In order to do this, all of the entries in the WriteBatch would have to be given real sequence numbers. One way to accomplish this is to reserve the high bit of sequence numbers, effectively limiting them to 55 bits. While this is a more significant change in some regards, WBWI would automatically gain support for all of the goodness that is in normal iterators: range deletions, merges (I believe there is an issue about this), reverse iteration, etc. The end result should be less code, yet more functionality.
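
As a back-of-the-envelope sketch of the encoding (hypothetical constants; RocksDB's internal keys pack a 56-bit sequence number next to the value type):

#include <cstdint>

// Reserve the top bit of the 56-bit sequence space to mark "entry lives in
// the write batch", leaving 55 bits for ordinary DB sequence numbers.
constexpr uint64_t kSeqBits = 56;
constexpr uint64_t kBatchFlag = 1ULL << (kSeqBits - 1);  // bit 55
constexpr uint64_t kMaxDbSequence = kBatchFlag - 1;       // 55-bit maximum

// Batch entries sort newer than any DB entry for the same user key because
// their sequence numbers are larger, so a merged iterator (and the existing
// range-tombstone and merge handling) sees them first.
inline uint64_t BatchSequence(uint64_t offset_in_batch) {
  return kBatchFlag | offset_in_batch;
}

inline bool IsBatchEntry(uint64_t seq) {
  return (seq & kBatchFlag) != 0;
}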

@ajkr
Contributor

ajkr commented Apr 29, 2019

@adamretter I'm not available to do it right now but will keep thinking about it. Feel free to take it up if you're interested! I'm also curious to hear @siying's thoughts on this and the potential redesign.

In the meantime though we should probably make WriteBatchWithIndex::DeleteRange return NotSupported. That's at least better than potentially giving readers wrong results later.
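
Roughly what I have in mind (a sketch, not the exact patch):

// Reject range deletions on the indexed batch instead of indexing a
// tombstone that readers then misinterpret (e.g. the stray "a" key above).
Status WriteBatchWithIndex::DeleteRange(ColumnFamilyHandle* /*column_family*/,
                                        const Slice& /*begin_key*/,
                                        const Slice& /*end_key*/) {
  return Status::NotSupported("DeleteRange unsupported in WriteBatchWithIndex");
}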

@ajkr
Contributor

ajkr commented May 31, 2019

@siying What do you think of the above mentioned idea to use a regular iterator that treats the write batch as the newest level in the LSM, rather than BaseDeltaIterator? Do you think it's feasible?

ajkr added a commit to ajkr/rocksdb that referenced this issue Feb 14, 2020
facebook-github-bot pushed a commit that referenced this issue Feb 18, 2020
Summary:
As discovered in #5260 and #5392, reads on the indexed batch do not account for range tombstones. So, return `Status::NotSupported` from `WriteBatchWithIndex::DeleteRange` until we properly support it.
Pull Request resolved: #5393

Test Plan: added unit test

Differential Revision: D19912360

Pulled By: ajkr

fbshipit-source-id: 0bbfc978ea015d64516ca708fce2429abba524cb
JunHe77 pushed a commit to JunHe77/rocksdb that referenced this issue Feb 20, 2020
yiwu-arbug pushed a commit to tikv/rocksdb that referenced this issue Aug 11, 2020