-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[WIP] Add get_with method to KeyValueDB and HashDB #5942
Conversation
It looks like @Vurich hasn'signed our Contributor License Agreement, yet.
You can read and sign our full Contributor License Agreement at the following URL: https://cla.parity.io Once you've signed, plesae reply to this thread with Many thanks, Parity Technologies CLA Bot |
util/src/hashdb.rs
Outdated
let mut output = None; | ||
|
||
{ | ||
let mut wrapper = |key: &[u8]| { output = Some((o_func.take().unwrap())(key)); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be simplified just to DBValue::from_slice(key)
codebase style replaces unwrap with expect("qualify why it never will panic; qed")
util/src/hashdb.rs
Outdated
|
||
{ | ||
let mut wrapper = |key: &[u8]| { | ||
output = Some((o_func.take().unwrap())(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qualify unwrap
and ensure invariant is documented in trait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The invariant is documented already
util/src/journaldb/archivedb.rs
Outdated
|
||
fn payload_exec(&self, key: &H256, f: &mut FnMut(&[u8])) { | ||
// TODO: Fix this once `KeyValueDB` has a `get_exec` method | ||
if let Some(val) = self.payload(key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outdated TODO? and I think payload
is now dead code
let journal_overlay = self.journal_overlay.read(); | ||
let short_key = to_short_key(key); | ||
|
||
journal_overlay.backing_overlay.get_exec(&short_key, &mut wrapper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the wrapper can be avoided if we just match on get_ref
for backing_overlay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. get_ref
doesn't exist on MemoryDB
at the moment but it could be trivially implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually prefer the version that has a wrapper and consistently uses get_exec
compared to one that calls {...}.get_ref({...}).or_else(|| {...}.get())
and then calls get_exec
on self.backing
if that fails. They're equivalent, performance-wise, and it's really a case of personal taste. I'm happy to change it if you feel strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't MemoryDB::get_exec
lead to two virtual indirections (wrapper + wrapped), while get_ref
leads to only one?
the state trie will typically have at least the top few layers in the backing overlay. that said, it may well still perform the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, excellent point. I think a better solution might be to implement get_with
for monomorphic statically-dispatching code so that you get this speedup for free. Maybe I'll stick a TODO there, put in a performance issue and write the get_ref
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can have a HashDB
method that gets a value that returns a Cow
so any code that wraps doesn't have to care if a HashDB
/KeyValueDB
has to return an owned value (because it's doing some kind of locking) or whether it can return a borrow. Then we can just pass whatever you happen to get up the stack and only pay the indirection cost from the trait objects. That means we pay the cloning costs precisely the minimum number of times we need to (i.e. only when not doing so would violate lifetime bounds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having that method makes sense, but a significant proportion of reads in our typical workflow are of data held within locks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it's something that would need looking into
// Possible bug in the Rust compiler. | ||
let mut wrapper = |d: &[u8]| { | ||
was_called.set(true); | ||
f(d); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something seems fishy about this. is it not soundness hole that we can call this with data borrowed from a lock acquired within the function body? I wonder what would happen if wrapper
stored the ref and we
- called it
- released the read lock
- acquired a write lock
- removed the key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that while it's possible to specify bounds on an HRTB when accepting one as a parameter, it's only possible to declare closures that fulfill universal HRTBs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where could wrapper store the ref that would be observable? The read lock must outlive the call to get_exec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HRTBs are a bit of a mess right now, it'll probably be something that gets fixed as an ultimate fallout of the Chalk work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, but the definition of the function is for <'a: 'this> FnMut(&'a [u8])
, so it should only be able to take references that strictly outlive the borrow of self. Data borrowed from a read lock doesn't. Soundness seems like it's only preserved here by the fact that the closure isn't allowed to make (valid) assumptions about the input data: https://is.gd/ZfJOfc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you try to make it fully generic over all lifetimes it fails because it borrows the &mut FnMut
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are the assumptions I am operating on:
'this
(lifetime of self borrow) can be safely coerced to a smaller lifetime, but must cover the entire function body- the read guard itself may live for
'this
(why you can return aReadGuard<'this, T>
) - borrows of
ReadGuard<'this, T>
will be strictly smaller than'this
(why you can't return&'a [u8]
from aReadGuard<'this, Vec<u8>>
) for<'a: 'this> FnMut(&'a [u8])
should be able to take references to any data that outlive the borrow ofself
.
Somehow we are taking data which is borrowed from an internal ReadGuard
and calling a function that says "I only take data that lives at least as long as 'this
" with it.
So two disjoint regions: A: 'a >= 'this
and B: 'this > 'guard
, but we are coercing lifetimes from B to A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's 'a >= 'this
and 'this >= 'guard
, I don't know why borrows of some variable can't last the entire lifetime of the variable. The only reason to deny that is if it is unsound, and I can't see any reason for it to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
borrows of a variable can last the entire lifetime of that variable, but my understanding the lifetime of the inner ReadGuard
value has to be less than than 'this
, because it is declared within the function body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but importantly the 'this
lifetime is not the full lifetime of the borrow that calls it. It can be as short as you like as long as the inequalities are satisfied, and there's nothing that says that anything has to strictly outlive anything else.
Honestly, I trust the compiler on this one. When I tried to do a universal quantification or a parameterised concrete lifetime it failed, the only one it would accept was a bounded forall quantification. If you can write up an example of unsoundness caused by this then I'd be happy to submit it to the lang team as an unsoundness bug, but certainly as far as my (possibly faulty) intuition on lifetimes go this makes sense.
[clabot:check] |
It looks like @Vurich signed our Contributor License Agreement. 👍 Many thanks, Parity Technologies CLA Bot |
util/src/kvdb.rs
Outdated
let mut output = None; | ||
|
||
{ | ||
let mut wrapper = |key: &[u8]| { output = Some((o_func.take().unwrap())(key)); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DBValue::from_slice
would be simpler here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was a copy-paste thing because originally I tried to use get_with
. Good catch.
util/src/hashdb.rs
Outdated
let mut wrapper = |key: &[u8]| { | ||
// Unwrap here because `get_exec` is required to call the closure no more than | ||
// once. | ||
output = Some((o_func.take().unwrap())(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you move the comment into an expect
?
Thread N panicked at "Called Option::unwrap on a None value", Option.rs:999
is way worse than
Thread N panicked at "this exact, expected invariant was violated!", Option.rs:999
.
We don't always get backtraces from our issue reports.
This allows code that knows that it's working with a `MemoryDB` to avoid the overhead of going through `get_exec`.
Original issue here #3993 |
Waiting on my full sync to finish so I can profile this against master. It looks like it led to a regression in our benchmark suite so I want to see if it causes it to be faster or slower on the import. If it's slower then I'll have to do more work before a merge |
is it WIP? If so, there is a special label |
branch has conflicts and tests fail. updated labels as per latest comment. |
looks stale. closing. |
This adds a
get_with
method to bothKeyValueDB
andHashDB
, which takes a reference-taking closure and passes the borrowed value to the function (to avoid cloning). I did some quick benchmarking and this actually seems slower than the previous code. I imagine this is because I reimplementedget
in terms ofget_with
and the extra dynamic dispatch causes this to be slower. There are some places in the code where the cloned value is only ever used as a borrow where theget_with
method would be an instant win, so it might be better to have bothget
andget_with
implemented and use the latter only where necessary.