-
Notifications
You must be signed in to change notification settings - Fork 624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(trie): combining state parts memory usage #3986
Conversation
After getting all state parts, write them to storage one at a time instead of combining them in memory. Test plan --------- test for trie seek and iteration test that creating parts behavior is the same test that combining parts behavior is the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we actually save the state parts to storage?
} | ||
TrieNode::Branch(mut children, _) => { | ||
TrieNode::Branch(children, _) => { | ||
if key.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does it make sense for key to be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key
is mutable, it's the seek argument after truncating the path traversed from its prefix. Empty key means the seek ends in this node, this case is unchanged apart from a refactor.
The main change is to the invariant between key_nibbles
and trail
when trail
ends with Extension
or Leaf
iterator.seek(&seek_key).unwrap(); | ||
let result1: Vec<_> = iterator.map(Result::unwrap).take(5).collect(); | ||
let result2: Vec<_> = | ||
map.range(seek_key.to_vec()..).map(|(k, v)| (k.clone(), v.clone())).take(5).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we take 5 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to take a few steps after seek.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part downloader logic validates parts and writes them to ColStateParts
, after collecting all parts we do set_state_finalize
which calls apply_state_part
on each of them. apply_state_part
reads from ColStateParts
and writes to ColState
.
} | ||
TrieNode::Branch(mut children, _) => { | ||
TrieNode::Branch(children, _) => { | ||
if key.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key
is mutable, it's the seek argument after truncating the path traversed from its prefix. Empty key means the seek ends in this node, this case is unchanged apart from a refactor.
The main change is to the invariant between key_nibbles
and trail
when trail
ends with Extension
or Leaf
iterator.seek(&seek_key).unwrap(); | ||
let result1: Vec<_> = iterator.map(Result::unwrap).take(5).collect(); | ||
let result2: Vec<_> = | ||
map.range(seek_key.to_vec()..).map(|(k, v)| (k.clone(), v.clone())).take(5).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to take a few steps after seek.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the code, but the tests look good to me. Could we start writing a test that involves syncing 30GB of state and see how it works?
Manually tested with testnet, memory usage was low and combining parts took about a minute (2.5GB state) |
fn unwrap_hash(&self) -> &CryptoHash { | ||
match self { | ||
Self::Hash(hash) => hash, | ||
Self::InMemory(_) => unreachable!(), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny nit, but a more idiomatic version would probably be
impl NodeHandle {
fn as_hash(&self) -> Option<&CryptoHash> {
match self {
Self::Hash(hash) => Some(hash),
Self::InMemory(_) => None,
}
}
}
that way, we get to re-use Option's unwrap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad I guess this way we're losing important knowledge about unreachable!()
situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There'll be .unwrap()
on the call-site, so this just moves the impossibility assertion from the definition site to the call-site.
There's no much semantic difference between unreachable!()
and Option::unwrap
-- both mean that the programmer asserts something at runtime, which can't be explained using rust's type system.
But, again, this is at most a minor code-style point, in the grand scheme of things, both options are totally fine/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad sure. My point in using unreachable!()
shows that something is impossible by design, but each unwrap
must be explained why it's safe. It's not about rust, it's about being clear to devs.
(My understanding might be outdated. If it's possible to point that something is not reachable by design another better way (with no using unreachable!()
) please let me know.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand unreachable!
the same way. But, in my mind, .unwrap()
also signals impossibility by design? That is, if we look at the call-site,
hash = *child.unwrap_hash();
/* versus */
hash = *child.as_hash().unwrap();
it seems me that both versions say that, by design, child
is guaranteed to be Hash
in this particular case.
The difference is that, when I see foo.unwrap()
, I immediately know that is signals an impossibility of something. When I see foo.unwrap_hash()
, I need to check the definition of unwrap_hash
to realize that it has an unreachable inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad not sure is it the standard way of understanding in the team. For example, in my latest work, there are explanations of unwrap
used (https://github.com/near/nearcore/compare/master...gc_headers_2?expand=1#diff-c26b0a777803f79360d3f469862cfcd95deb3c390809095670f6db88d00332f4R236-R239) and I found them useful.
Basically, I work with concepts written by others as black boxes. That's why, in my opinion, unwrap
s must be explained, otherwise we may end up with lack of understanding what's possible and why. Seeing unreachable!()
helps me to dive into details much better than guessing can I reuse the same pattern in another place.
One more example - we use tons of unwrap
s in testing because we don't want to process errors in any other way. It doesn't mean the methods used in tests should never return errors. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that unwraps should be explained (or, better still, refactored out of existence). But I would also think that, for the same reason, a call to .unwrap_hash()
should be explained.
Ie, if I see an uncommitted call to .unwrap_hash()
, I understand that the author believes, for some good reason, that unreachable
case won't happen here. But that's exactly the same info I get from seeing .unwrap()
. That is, .unwrap()
and .unwrap_hash()
signal the same level of intentionallyty to me.
After getting all state parts, write them to storage one at a time instead of combining them in memory. Test plan --------- test for trie seek and iteration test that creating parts behavior is the same test that combining parts behavior is the same
After getting all state parts, write them to storage one at a time
instead of combining them in memory.
Test plan
test for trie seek and iteration
test that creating parts behavior is the same
test that combining parts behavior is the same