Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

[WIP] Add get_with method to KeyValueDB and HashDB #5942

Closed
wants to merge 10 commits into from

Conversation

eira-fransham
Copy link
Contributor

This adds a get_with method to both KeyValueDB and HashDB, which takes a reference-taking closure and passes the borrowed value to the function (to avoid cloning). I did some quick benchmarking and this actually seems slower than the previous code. I imagine this is because I reimplemented get in terms of get_with and the extra dynamic dispatch causes this to be slower. There are some places in the code where the cloned value is only ever used as a borrow where the get_with method would be an instant win, so it might be better to have both get and get_with implemented and use the latter only where necessary.

@parity-cla-bot
Copy link

It looks like @Vurich hasn'signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement at the following URL: https://cla.parity.io

Once you've signed, plesae reply to this thread with [clabot:check] to prove it.

Many thanks,

Parity Technologies CLA Bot

@rphmeier rphmeier added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Jun 27, 2017
let mut output = None;

{
let mut wrapper = |key: &[u8]| { output = Some((o_func.take().unwrap())(key)); };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be simplified just to DBValue::from_slice(key)

codebase style replaces unwrap with expect("qualify why it never will panic; qed")


{
let mut wrapper = |key: &[u8]| {
output = Some((o_func.take().unwrap())(key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qualify unwrap and ensure invariant is documented in trait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The invariant is documented already


fn payload_exec(&self, key: &H256, f: &mut FnMut(&[u8])) {
// TODO: Fix this once `KeyValueDB` has a `get_exec` method
if let Some(val) = self.payload(key) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated TODO? and I think payload is now dead code

let journal_overlay = self.journal_overlay.read();
let short_key = to_short_key(key);

journal_overlay.backing_overlay.get_exec(&short_key, &mut wrapper);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the wrapper can be avoided if we just match on get_ref for backing_overlay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. get_ref doesn't exist on MemoryDB at the moment but it could be trivially implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer the version that has a wrapper and consistently uses get_exec compared to one that calls {...}.get_ref({...}).or_else(|| {...}.get()) and then calls get_exec on self.backing if that fails. They're equivalent, performance-wise, and it's really a case of personal taste. I'm happy to change it if you feel strongly about it.

Copy link
Contributor

@rphmeier rphmeier Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't MemoryDB::get_exec lead to two virtual indirections (wrapper + wrapped), while get_ref leads to only one?

the state trie will typically have at least the top few layers in the backing overlay. that said, it may well still perform the same

Copy link
Contributor Author

@eira-fransham eira-fransham Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, excellent point. I think a better solution might be to implement get_with for monomorphic statically-dispatching code so that you get this speedup for free. Maybe I'll stick a TODO there, put in a performance issue and write the get_ref version.

Copy link
Contributor Author

@eira-fransham eira-fransham Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can have a HashDB method that gets a value that returns a Cow so any code that wraps doesn't have to care if a HashDB/KeyValueDB has to return an owned value (because it's doing some kind of locking) or whether it can return a borrow. Then we can just pass whatever you happen to get up the stack and only pay the indirection cost from the trait objects. That means we pay the cloning costs precisely the minimum number of times we need to (i.e. only when not doing so would violate lifetime bounds).

Copy link
Contributor

@rphmeier rphmeier Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having that method makes sense, but a significant proportion of reads in our typical workflow are of data held within locks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's something that would need looking into

// Possible bug in the Rust compiler.
let mut wrapper = |d: &[u8]| {
was_called.set(true);
f(d);
Copy link
Contributor

@rphmeier rphmeier Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something seems fishy about this. is it not soundness hole that we can call this with data borrowed from a lock acquired within the function body? I wonder what would happen if wrapper stored the ref and we

  1. called it
  2. released the read lock
  3. acquired a write lock
  4. removed the key

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that while it's possible to specify bounds on an HRTB when accepting one as a parameter, it's only possible to declare closures that fulfill universal HRTBs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where could wrapper store the ref that would be observable? The read lock must outlive the call to get_exec.

Copy link
Contributor Author

@eira-fransham eira-fransham Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HRTBs are a bit of a mess right now, it'll probably be something that gets fixed as an ultimate fallout of the Chalk work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but the definition of the function is for <'a: 'this> FnMut(&'a [u8]), so it should only be able to take references that strictly outlive the borrow of self. Data borrowed from a read lock doesn't. Soundness seems like it's only preserved here by the fact that the closure isn't allowed to make (valid) assumptions about the input data: https://is.gd/ZfJOfc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you try to make it fully generic over all lifetimes it fails because it borrows the &mut FnMut

Copy link
Contributor

@rphmeier rphmeier Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are the assumptions I am operating on:

  1. 'this (lifetime of self borrow) can be safely coerced to a smaller lifetime, but must cover the entire function body
  2. the read guard itself may live for 'this (why you can return a ReadGuard<'this, T>)
  3. borrows of ReadGuard<'this, T> will be strictly smaller than 'this (why you can't return &'a [u8] from a ReadGuard<'this, Vec<u8>>)
  4. for<'a: 'this> FnMut(&'a [u8]) should be able to take references to any data that outlive the borrow of self.

Somehow we are taking data which is borrowed from an internal ReadGuard and calling a function that says "I only take data that lives at least as long as 'this" with it.

So two disjoint regions: A: 'a >= 'this and B: 'this > 'guard, but we are coercing lifetimes from B to A.

Copy link
Contributor Author

@eira-fransham eira-fransham Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's 'a >= 'this and 'this >= 'guard, I don't know why borrows of some variable can't last the entire lifetime of the variable. The only reason to deny that is if it is unsound, and I can't see any reason for it to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

borrows of a variable can last the entire lifetime of that variable, but my understanding the lifetime of the inner ReadGuard value has to be less than than 'this, because it is declared within the function body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but importantly the 'this lifetime is not the full lifetime of the borrow that calls it. It can be as short as you like as long as the inequalities are satisfied, and there's nothing that says that anything has to strictly outlive anything else.

Honestly, I trust the compiler on this one. When I tried to do a universal quantification or a parameterised concrete lifetime it failed, the only one it would accept was a bounded forall quantification. If you can write up an example of unsoundness caused by this then I'd be happy to submit it to the lang team as an unsoundness bug, but certainly as far as my (possibly faulty) intuition on lifetimes go this makes sense.

@eira-fransham
Copy link
Contributor Author

[clabot:check]

@parity-cla-bot
Copy link

It looks like @Vurich signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

util/src/kvdb.rs Outdated
let mut output = None;

{
let mut wrapper = |key: &[u8]| { output = Some((o_func.take().unwrap())(key)); };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBValue::from_slice would be simpler here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was a copy-paste thing because originally I tried to use get_with. Good catch.

jFransham and others added 8 commits June 28, 2017 16:57

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
let mut wrapper = |key: &[u8]| {
// Unwrap here because `get_exec` is required to call the closure no more than
// once.
output = Some((o_func.take().unwrap())(key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you move the comment into an expect?
Thread N panicked at "Called Option::unwrap on a None value", Option.rs:999
is way worse than
Thread N panicked at "this exact, expected invariant was violated!", Option.rs:999.

We don't always get backtraces from our issue reports.

Vurich added 2 commits June 28, 2017 18:29
This allows code that knows that it's working with a `MemoryDB` to avoid the
overhead of going through `get_exec`.
@eira-fransham
Copy link
Contributor Author

Original issue here #3993

@gavofyork
Copy link
Contributor

@rphmeier / @Vurich any activity here?

@eira-fransham
Copy link
Contributor Author

Waiting on my full sync to finish so I can profile this against master. It looks like it led to a regression in our benchmark suite so I want to see if it causes it to be faster or slower on the import. If it's slower then I'll have to do more work before a merge

@NikVolf
Copy link
Contributor

NikVolf commented Jul 31, 2017

is it WIP? If so, there is a special label A1-inprogress for it
and it is mutually exclusive with A0-pleasreview

@5chdn 5chdn added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. and removed A0-pleasereview 🤓 Pull request needs code review. labels Aug 7, 2017
@5chdn
Copy link
Contributor

5chdn commented Aug 7, 2017

branch has conflicts and tests fail. updated labels as per latest comment.

@gavofyork
Copy link
Contributor

looks stale. closing.

@gavofyork gavofyork closed this Sep 26, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants