Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete by prefix operator in kvdb #360

Merged
merged 29 commits into from
Mar 27, 2020

Conversation

cheme
Copy link
Collaborator

@cheme cheme commented Mar 13, 2020

This PR add a delete by prefix operator in kvdb.

This is something that I want to use in a PR that bulk delete a substrate child trie (substrate child trie got all their keys prefixed by a unique id, so a deletion by prefix at rocksdb level can be use).

Since it is currently only related to this future child trie deletion PR (in progress), it should not be merge until really needed (unless it seems good to have either way).
I am still creating the PR for review, feedback and visibility.

Writing this PR description, I realize that we could also use an operator that delete a range (start inclusive, end exclusive), which could allow more use cases but is not really needed.

@ordian
Copy link
Member

ordian commented Mar 26, 2020

@cheme could you rebase on master? #313 is merged

@@ -129,3 +149,20 @@ pub trait KeyValueDB: Sync + Send + parity_util_mem::MallocSizeOf {
IoStats::empty()
}
}

/// Return for a start inclusive prefix, the non inclusive end.
pub fn end_prefix(prefix: &[u8]) -> Vec<u8> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function can be simplified

fn end_prefix(mut prefix: Vec<u8>) -> Vec<u8> {
    while let Some(0xff) = prefix.last() {
        prefix.pop();
    }
    if let Some(byte) = prefix.last_mut() {
        *byte += 1;
    }
    prefix
}

I'm not sure we want to expose it kvdb, I see it's used as a utility function in kvdb-rocksdb and kvdb-memorydb, but for example, we don't expose other_io_error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if the prefix is empty or 0xfffffff...ff, will it delete the whole db?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we could reuse this function for iter_from_prefix by setting https://docs.rs/rocksdb/0.13.0/rocksdb/struct.ReadOptions.html#method.set_iterate_upper_bound

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for empty prefix it does indeed delete every values (not in a very efficient way).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about 0xff prefix? I think we should add some tests and document the difference between 0xff and 0xff00 as it's not intuitive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to document that, I will add a test it may make thing more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oups, I actually meant 0xff and 0x00ff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to document that,

Let's add a few examples and include interesting cases such as 0xff and 0xff00 among the examples?

kvdb-rocksdb/src/lib.rs Outdated Show resolved Hide resolved
kvdb-memorydb/src/lib.rs Outdated Show resolved Hide resolved
} else {
// delete the whole column
let end_range = &[255u8];
batch.delete_range_cf(cf, &prefix[..], &end_range[..]).map_err(other_io_err)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to delete a range, if we're deleting the whole column later?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not delete the whole column (at first I tried to but it requires mutable access and some changes overall). So I end up thinking that if we use the delete range to delete the column we are doing something wrong, so using delete range for it felt more consistent.
(I will change the comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
batch.delete_range_cf(cf, &prefix[..], &end_range[..]).map_err(other_io_err)?;
batch.delete_range_cf(cf, &[], &end_range[..]).map_err(other_io_err)?;

kvdb/src/lib.rs Outdated Show resolved Hide resolved
kvdb/src/lib.rs Outdated Show resolved Hide resolved
kvdb/src/lib.rs Outdated Show resolved Hide resolved
kvdb/src/lib.rs Outdated Show resolved Hide resolved
@ordian
Copy link
Member

ordian commented Mar 26, 2020

(the CI can be fixed with cargo fmt btw)

cheme and others added 4 commits March 26, 2020 17:04
Co-Authored-By: Andronik Ordian <write@reusable.software>
Co-Authored-By: Andronik Ordian <write@reusable.software>
Co-Authored-By: Andronik Ordian <write@reusable.software>
kvdb-rocksdb/src/lib.rs Outdated Show resolved Hide resolved
kvdb-rocksdb/src/lib.rs Outdated Show resolved Hide resolved
let end_range = Bound::Excluded(kvdb::end_prefix(&prefix[..]));
let keys: Vec<_> = col.range((start_range, end_range)).map(|(k, _)| k.clone()).collect();
for key in keys.into_iter() {
col.remove(&key[..]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be optimized by using split_off and append, but currently append is slow: rust-lang/rust#34666 and I think memorydb is only used for tests and kvdb-web, so it's fine for now

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall (modulo a couple of unresolved comments).

Copy link
Contributor

@dvdplm dvdplm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits.

let end_range = kvdb::end_prefix(&prefix[..]);
batch.delete_range_cf(cf, &prefix[..], &end_range[..]).map_err(other_io_err)?;
} else {
// delete every values in the column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// delete every values in the column
// deletes all values in the column

@@ -80,6 +83,11 @@ impl DBTransaction {
pub fn delete(&mut self, col: u32, key: &[u8]) {
self.ops.push(DBOp::Delete { col, key: DBKey::from_slice(key) });
}

/// Delete all values with the given key prefix.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document what happens if called with an empty prefix (deletes all values)?

kvdb/src/lib.rs Outdated Show resolved Hide resolved
kvdb/src/lib.rs Outdated Show resolved Hide resolved
@@ -129,3 +149,20 @@ pub trait KeyValueDB: Sync + Send + parity_util_mem::MallocSizeOf {
IoStats::empty()
}
}

/// Return for a start inclusive prefix, the non inclusive end.
pub fn end_prefix(prefix: &[u8]) -> Vec<u8> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how to document that,

Let's add a few examples and include interesting cases such as 0xff and 0xff00 among the examples?

kvdb/src/lib.rs Outdated
fn end_prefix_test() {
assert_eq!(end_prefix(&[5, 6, 7]), vec![5, 6, 8]);
assert_eq!(end_prefix(&[5, 6, 255]), vec![5, 7]);
// this is incorrect as the result is before start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what sense is it incorrect? Is it "misusage"? Or is the end_prefix() handling this incorrectly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought this comment was a bit confusing, but the test below uses assert not equal, which maybe redundant in the presence of assert_eq below that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is redundant, it is mostly documentation (at some point I was not remembering why we can recurse on popping the 0xff).

@ordian
Copy link
Member

ordian commented Mar 27, 2020

Writing this PR description, I realize that we could also use an operator that delete a range (start inclusive, end exclusive), which could allow more use cases but is not really needed.

Which use-cases of generic range deletion can you think of?
Unless really needed, I'd like to keep it simple for now.

cheme and others added 3 commits March 27, 2020 09:10
Co-Authored-By: David <dvdplm@gmail.com>
Co-Authored-By: David <dvdplm@gmail.com>
@cheme
Copy link
Collaborator Author

cheme commented Mar 27, 2020

Which use-cases of generic range deletion can you think of?

I do not have any, I was just considering that having range in operation could make things more extendable (it is doable with all three implementations, and probably most db key value impl will get more efficient implementation for range (not radix tree)).

So not 'needed'.

@ordian ordian merged commit dd89c9a into paritytech:master Mar 27, 2020
ordian added a commit that referenced this pull request Apr 22, 2020
* master:
  kvdb-rocksdb: optimize and rename iter_from_prefix  (#365)
  bump parity-util-mem (#376)
  parity-util-mem: fix for windows (#375)
  keccak-hash: fix bench and add one for range (#372)
  [parity-crypto] Release 0.6.1 (#373)
  keccak-hash: bump version to 0.5.1 (#371)
  keccak-hash: add keccak256_range and keccak512_range functions (#370)
  Allow pubkey recovery for all-zero messages (#369)
  Delete by prefix operator in kvdb (#360)
  kvdb: no overlay (#313)
  Ban duplicates of parity-uil-mem from being linked into the same program (#363)
  Use correct license ID (#362)
  Memtest example for Rocksdb (#349)
  Prep for release (#361)
  parity-util-mem: prepare release for 0.5.2 (#359)
  travis: test parity-util-mem on android (#358)
  parity-util-mem: update mimalloc feature (#352)
  kvdb: remove parity-bytes dependency (#351)
  parity-util-mem: use malloc for usable_size on android (#355)
  CI: troubleshoot macOS build (#356)
ordian added a commit that referenced this pull request May 5, 2020
* master: (56 commits)
  primitive-types: add no_std support for serde feature (#385)
  Add Rocksdb Secondary Instance Api (#384)
  kvdb-rocksdb: update rocksdb to 0.14 (#379)
  prepare releases for a few crates (#382)
  uint: fix UB in uint::from_big_endian (#381)
  Fix limit prefix delete case (#368)
  Add arbitrary trait implementation (#378)
  kvdb-rocksdb: optimize and rename iter_from_prefix  (#365)
  bump parity-util-mem (#376)
  parity-util-mem: fix for windows (#375)
  keccak-hash: fix bench and add one for range (#372)
  [parity-crypto] Release 0.6.1 (#373)
  keccak-hash: bump version to 0.5.1 (#371)
  keccak-hash: add keccak256_range and keccak512_range functions (#370)
  Allow pubkey recovery for all-zero messages (#369)
  Delete by prefix operator in kvdb (#360)
  kvdb: no overlay (#313)
  Ban duplicates of parity-uil-mem from being linked into the same program (#363)
  Use correct license ID (#362)
  Memtest example for Rocksdb (#349)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants