-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Keep track of dirty stores on remove accounts to clean #17601
Keep track of dirty stores on remove accounts to clean #17601
Conversation
106eaac
to
9a8f1ca
Compare
Codecov Report
@@ Coverage Diff @@
## master #17601 +/- ##
=======================================
Coverage 82.6% 82.6%
=======================================
Files 435 435
Lines 121577 121607 +30
=======================================
+ Hits 100472 100499 +27
- Misses 21105 21108 +3 |
686f75e
to
ef98258
Compare
This allowed me to run a validator on I ran it for a couple days:
snapshots seem similar:
Total memory use around 48GB. |
runtime/src/accounts_db.rs
Outdated
self.dirty_stores | ||
.insert((*slot, store.append_vec_id()), store.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm it's not immediately obvious to me why removing an account from a storage entry should add the entire store to the dirty_stores
list. Maybe a comment would help?
I think the reasoning here is if for example:
- the storage entry for slot
S
contains some pubkeyP
- Pubkey
P
has zero lamports in some slotS' > S
.
Then when the store for slot S
hits count ==0
, then this could potentially unblock the clean of the zero-lamport P
in S'
?
In this case, should we only insert into the dirty stores if the count == 0
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's not just count==0, a shrink which removes an entry could also unblock it. Maybe the logic should be then to add if count==0 and then in shrink add the unref'ed key set there as well. I don't think there are any other cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you have a 0-lamport account which is the newest entry for that pubkey in slot S
for X
. You also have a non-zero lamport update there for Y
. Then in slot S+1
you update Y
to 0. Then the clean will deref Y
from S
and the 0-lamport update for X
will keep the count=1, although now this store can be removed as long as X
reference count in storage is 1 or if any other existing updates for it can be also removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, that's a great catch. Hmmm yeah, feeling that familiar ache in the noggin 🤯 . So right now even though X
is:
- zero lamport in
S
- has only one reference still alive in an storage entry so
account_infos.len() as u64 == *ref_count_from_storage
so this check will return false
We still fail the deletion of X
here:
solana/runtime/src/accounts_db.rs
Lines 1467 to 1469 in 00ee84a
if store_counts.get(&account_info.store_id).unwrap().0 != 0 { | |
no_delete = true; | |
break; |
S
it's in contains another non-zero lamport update for Y
. In this case, is there a reason not to delete X
if account_infos.len() as u64 == *ref_count_from_storage
even though the storage entry count is not zero via purge_exact()
so that:
- The storage entry count can be decremented
- This zero-lamport update can be removed from the index and removed via shrink later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think that's right. I changed to only inserting when count==0 such that the store is removed. That will free up those keys potentially in newer slots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you have a 0-lamport account which is the newest entry for that pubkey in slot
S
forX
. You also have a non-zero lamport update there forY
. Then in slotS+1
you updateY
to 0. Then the clean will derefY
fromS
and the 0-lamport update forX
will keep the count=1, although now this store can be removed as long asX
reference count in storage is 1 or if any other existing updates for it can be also removed.
Hmmm, is this above situation resolved
So we had this situation:
<Slot S> <Slot S+1>
X=0, Y=1 Y=0
-
Slot
S
gets rooted, incurs a clean onX
andY
. AccountX
cannot be removed because thecount
of slotS
starts at2
, gets decremented by 1 for the zero lamport accountX
:solana/runtime/src/accounts_db.rs
Line 1834 in 43b09c4
store_count.0 -= 1; calc_delete_dependencies
:solana/runtime/src/accounts_db.rs
Line 1506 in 43b09c4
if store_counts.get(&account_info.store_id).unwrap().0 != 0 { -
S+1
gets rooted and added to thedirty_stores
, which incurs a clean on accountY
. dropping the count of slotS
to 1. However,Y
won't be deleted because its ref count is still 2 (due to the update in slotS
), the ref count is only decremented when either slotS
is dead or shrunk. -
In shrink, there's two issues:
a) We may not shrinkS
because it doesn't meet the page size shrink criteria, even though not performing this shrink prevents the ref count of zero lamport slots from hitting zero, and thus is blocking a lot of zero lamport cleans in later slots.
b) Even if we shrink slotS
, but the store still hascount == 1
(the update to X is the latest update to 0), so we don't hit this case:solana/runtime/src/accounts_db.rs
Lines 2205 to 2207 in 43b09c4
if store.count() == 0 { self.dirty_stores .insert((slot, store.append_vec_id()), store.clone()); X
to the clean list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the fix here is to say if in this check here:
solana/runtime/src/accounts_db.rs
Line 1483 in 43b09c4
let no_delete = if account_infos.len() as u64 != *ref_count_from_storage { |
if account_infos.len() as u64 == *ref_count_from_storage == 1
, then we don't need to make the additional check in this else block: solana/runtime/src/accounts_db.rs
Lines 1496 to 1510 in 43b09c4
} else { | |
let mut no_delete = false; | |
for (_slot, account_info) in account_infos { | |
debug!( | |
"calc_delete_dependencies() | |
storage id: {}, | |
count len: {}", | |
account_info.store_id, | |
store_counts.get(&account_info.store_id).unwrap().0, | |
); | |
if store_counts.get(&account_info.store_id).unwrap().0 != 0 { | |
no_delete = true; | |
break; | |
} | |
} |
A test might be good to illustrate this situation here. I think there are similar ones, but the key difference here is we're cleaning after setting a root on S, and then cleaning after setting a root on S+1, instead of cleaning only once after rooting both S and S + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this test do that?
a) We may not shrink S because it doesn't meet the page size shrink criteria, even though not performing this shrink prevents the ref count of zero lamport slots from hitting zero, and thus is blocking a lot of zero lamport cleans in later slots.
That's true, somewhat the price to pay for not shrinking very aggressively which would increase write amplification. Although, now we have knobs for that, user can shrink as aggressively as they like.
b) Even if we shrink slot S, but the store still has count == 1 (the update to X is the latest update to 0), so we don't hit this case:
I don't think this is the case. When we shrink, the old store(s) should always have count==0 because all live accounts would be updated in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this test do that?
Yeah, this test tests exactly this case, thanks!
I don't think this is the case. When we shrink, the old store(s) should always have count==0 because all live accounts would be updated in it.
Ah yeah, I misread that bit of logic and thought it would only execute if the newly shrunk store had count==0
, not the old unshrunken store, which will always have count == 0
. So this will essentially add all the accounts in the old store for that slot to the candidate clean set.
Is it true that if we shrink unshrunken storage entry U
to a shrunken storage S
, we only need to add the zero-lamport pubkeys in S
to the candidate clean set if the only remaining alive accounts in S
are zero lamport themselves?
i.e. if we declared a
let mut only_zero_lamport_accounts_alive = true;
Had a check here:
solana/runtime/src/accounts_db.rs
Line 2107 in 43b09c4
alive_accounts.push((pubkey, stored_account)); |
if stored_account.account.account_meta.lamports != 0 {
only_zero_lamport_accounts_alive = false;
}
And then add to the dirty stores here:
solana/runtime/src/accounts_db.rs
Line 2205 in 43b09c4
if store.count() == 0 { |
// Add only the zero lamport accounts to be cleaned
if only_zero_lamport_accounts_alive && store.count() > 0 {
self.dirty_stores
.insert((slot, store.append_vec_id()), store.clone());
}
Reasoning being:
- If the post-shrunk store
S
has any non-zero lamport keysK
, then whenK
is updated in later slots andS
becomes dead, these keys will be added to the clean list (so don't need to add them in shrink). - If the post-shrunk store
S
only has zero lamport keys, then we can run into the edge case so we should add them to clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea. This could be an optimization on the reducing the clean pubkey set, right? I think that could be a follow-up change. This already has a huge impact on the clean time, and a few extra keys are not that expensive to deal with.
fn construct_candidate_clean_keys( | ||
&self, | ||
max_clean_root: Option<Slot>, | ||
timings: &mut CleanKeyTimings, | ||
) -> Vec<Pubkey> { | ||
let mut zero_lamport_key_clone = Measure::start("zero_lamport_key"); | ||
let pubkeys = self.accounts_index.zero_lamport_pubkeys().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tag @ryoqun, if you're reviewing as well, this seems to be the crux of the change/savings 😃
Ha interesting, so most of the time in clean was spent cloning out the 11m zero-lamport keys (almost 45 seconds? 😮 ) I would have thought most of the time was spent in the par scan:
solana/runtime/src/accounts_db.rs
Lines 1672 to 1675 in 9388aac
let do_clean_scan = || { | |
pubkeys | |
.par_chunks(4096) | |
.map(|pubkeys: &[Pubkey]| { |
clean_accounts_older_than_root
call: solana/runtime/src/accounts_db.rs
Line 1751 in 9388aac
self.clean_accounts_older_than_root(purges_old_accounts, max_clean_root); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not the cost of the clone itself. This change reduces the number of keys that we look at to something much smaller than 11 million.
The scan and then the dependency checks where I think where each about 50% of the time, around 10-20 seconds each.
The structure of this change is to reduce the number of keys that is input into this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yup, that makes sense, fewer inputs to the clean.
e4f0a0a
to
aca3d61
Compare
bbb5d18
to
cfeff07
Compare
and not zero_lamport key set
cfeff07
to
43b09c4
Compare
self.dirty_stores | ||
.insert((*slot, store.append_vec_id()), store.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another annoying case to think about is that right now unrooted slots can also prevent zero lamport cleanup due to them counting toward the ref_count_from_storage
, but only rooted slots are included in account_infos.len()
here:
solana/runtime/src/accounts_db.rs
Line 1483 in 43b09c4
let no_delete = if account_infos.len() as u64 != *ref_count_from_storage { |
<Slot S, Rooted> <Slot S+1, Unrooted> <Slot S+2, Rooted>
Y=1 Y=2, Z=1 Y=0
The ref_count_from_storage
of Y
is 2
:
solana/runtime/src/accounts_db.rs
Line 1812 in 43b09c4
*ref_count = self.accounts_index.ref_count_from_storage(&key); |
S
is dead, unrooted slot S+1
is alive and adds to the ref count for Y
. Meanwhile account_infos
only contains the rooted update in slot S+2
, so this check fails solana/runtime/src/accounts_db.rs
Line 1483 in 43b09c4
let no_delete = if account_infos.len() as u64 != *ref_count_from_storage { |
Luckily, later, when the bank for slot S+1
is dropped, cleanup happens in purge_slots()
and calls into purge_slot_storage()
:
solana/runtime/src/accounts_db.rs
Line 3394 in 43b09c4
self.purge_slot_storage(*remove_slot, purge_stats); |
S+1
to the dirty stores: solana/runtime/src/accounts_db.rs
Lines 5136 to 5138 in 43b09c4
self.dirty_stores | |
.insert((*slot, store.append_vec_id()), store.clone()); | |
dead_slots.insert(*slot); |
I think it would be good to add a test here. I think you can copy and paste this one: carllin@c2683cd#diff-1090394420d51617f3233275c2b65ed706b35b53b115fe65f82c682af8134a6fR11025-R11077, which tests exactly this case 😃 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on master as well. I think we can do another change to handle this case. It doesn't seem to be related to this change specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on master as well
Shoot, that's because the test was wrong!
It doesn't seem to be related to this change specifically
Yeah it's not directly related, but the removal of the zero-lamport accounts in the candidate key generation had a chance of making the clean fail here: https://github.com/carllin/solana/blob/5bb5284438f0c5a545a4db53924f28e5759a3591/runtime/src/accounts_db.rs#L11670-L11673 if the slot storage wasn't being correctly added to the dirty_stores
list in the purge_slots()
path, so it was just an additional corner case to consider.
I extracted the test into a separate PR, which passes now on master and with these changes, so not a blocker here 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
* Keep track of dirty stores on remove accounts to clean and not zero_lamport key set * Only dirty when count==0? * Add another clean (cherry picked from commit 3b1738c)
Problem
zero_lamport_keys set grows and slows down clean_accounts which then prevents flushing for many seconds. On mainnet, 11m zero lamport keys are can be present taking 50 seconds to perform the clean operation.
Summary of Changes
On
remove_accounts
keep track of the store which is touched and add that to a dirty set. Then inclean_accounts
use that dirty set to go and look at all keys which are present in the dirty stores and if they can be cleaned.Reduces the clean_accounts operation down to about 4-5 seconds.
Fixes #