blockstore: use erasure meta index field to find conflicting shreds #1151
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##            master    #1151     +/-   ##
==========================================
  Coverage     82.1%    82.1%
==========================================
  Files          886      886
  Lines       236439   236490      +51
==========================================
+ Hits        194252   194306      +54
+ Misses       42187    42184       -3
ledger/src/blockstore.rs (outdated)
if !self.has_duplicate_shreds_in_slot(slot) {
    let conflicting_shred = self
        .find_conflicting_coding_shred(&shred, slot, erasure_meta, just_received_shreds)
        .expect("Shred indicated by erasure meta must exist")
This expect here made me wonder: what happens in the scenario where a shred is purged/pruned/dumped from the blockstore between the time erasure_meta is obtained and the point here where the shred is fetched from the blockstore?
When replay decides to dump a block it uses clear_unconfirmed_slot, which shares a lock with shred insertion precisely to avoid the situation you mentioned:
agave/ledger/src/blockstore.rs, lines 1290 to 1291 in bf1b765:
pub fn clear_unconfirmed_slot(&self, slot: Slot) {
    let _lock = self.insert_shreds_lock.lock().unwrap();
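For illustration, here is a minimal self-contained sketch of the locking pattern described in this comment (hypothetical struct and method bodies, not the actual agave code): both the insert path and the dump path take the same mutex, so a slot cannot be cleared while a shred batch is mid-insert.

```rust
use std::sync::Mutex;

// Trimmed-down, hypothetical view of the pattern being discussed.
struct Blockstore {
    insert_shreds_lock: Mutex<()>,
    // ... column handles, caches, etc.
}

impl Blockstore {
    fn insert_shreds(&self /* shreds, leader_schedule, ... */) {
        // Held for the whole insert batch; binding to `_lock` (not `_`)
        // keeps the guard alive until the end of the function.
        let _lock = self.insert_shreds_lock.lock().unwrap();
        // read erasure_meta, look up conflicting shreds, write columns ...
    }

    fn clear_unconfirmed_slot(&self /* slot: Slot */) {
        // Same lock: an erasure_meta read during insertion cannot be
        // invalidated by a concurrent dump of the slot.
        let _lock = self.insert_shreds_lock.lock().unwrap();
        // purge the slot's data ...
    }
}

fn main() {
    let blockstore = Blockstore { insert_shreds_lock: Mutex::new(()) };
    blockstore.insert_shreds();
    blockstore.clear_unconfirmed_slot();
}
```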
I'm not an expert on the blockstore automatic cleanup (cc @steviez). While it doesn't share this lock, it looks at blockstore.max_root() in order to determine which slots to clean up. We also compare against blockstore.max_root() earlier:

agave/ledger/src/blockstore.rs, line 1386 in bf1b765:
if !Blockstore::should_insert_coding_shred(&shred, self.max_root()) {

but this seems unsafe, as replay_stage could have rooted and triggered an automatic cleanup in between.
hmm, will recheck these blockstore expects under the assumption that cleanup could have happened https://discord.com/channels/428295358100013066/1235788902888116274/1235942333334290442
You're correct that the BlockstoreCleanupService doesn't share the insert_shreds_lock. And right, the service fetches the latest root and potentially cleans data that is strictly older than the root; we never clean data newer than the latest root.
Practically speaking, the value for --limit-ledger-size is required to be >= 50M. Assuming ~1k shreds per slot, the oldest data is ~50k slots older than the newest data. That being said, I think there is merit to an unwrap() and expect() audit like y'all are suggesting.
hmm, will recheck these blockstore expects under the assumption that cleanup could have happened https://discord.com/channels/428295358100013066/1235788902888116274/1235942333334290442
Given that mainnet is upgrading to v1.18 and we have not had much soak time, it is probably better to demote all (or most) of those expects to an error log or metric. We can always add them back if we are confident we are not missing an edge case.
I combed through the expect/unwraps we introduced and demoted the affected ones here: #1259. I believe these remaining checks to be safe and have not demoted them:

agave/ledger/src/blockstore.rs, line 1134 in fd8b075:
.expect("Erasure meta was just created, initial shred must exist");

agave/ledger/src/blockstore.rs, line 1163 in fd8b075:
.expect("Merkle root meta was just created, initial shred must exist");
ledger/src/blockstore.rs (outdated)
    .store_duplicate_slot(slot, conflicting_shred.clone(), shred.payload().clone())
    .is_err()
{
    warn!("bad duplicate store..");
This was old code, but nonetheless we need to log the actual error rather than just dropping it.
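For reference, a tiny self-contained illustration of the suggested pattern (placeholder function and messages, not the blockstore API): bind the error with if let Err(e) instead of discarding it via is_err(), so the log line carries the actual failure.

```rust
fn store_duplicate_slot(slot: u64) -> Result<(), String> {
    // Placeholder for the real column write.
    Err(format!("write failed for slot {slot}"))
}

fn main() {
    let slot = 42;

    // Before: the error value is checked but thrown away.
    if store_duplicate_slot(slot).is_err() {
        eprintln!("bad duplicate store..");
    }

    // After: bind the error and include it in the message.
    if let Err(e) = store_duplicate_slot(slot) {
        eprintln!("store duplicate error for slot {slot}: {e:?}");
    }
}
```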
}

duplicate_shreds.push(PossibleDuplicateShred::ErasureConflict(
    shred.clone(),
    conflicting_shred,
));
}
} else {
    datapoint_info!("bad-conflict-shred", ("slot", slot, i64));
Maybe keep this line (or an error log or something similar) for when find_conflicting_coding_shred returns None.
We have a couple of warn!()'s under the // ToDo: ... line; do you think those are sufficient? Namely, I think self.has_duplicate_shreds_in_slot(slot) still being false would indicate that we did not find a conflicting coding shred.
Hmm! The warn! logs will be there regardless of this datapoint, but this bad-conflict-shred would specifically indicate that we have a mismatching erasure meta yet can't find the shred which initialized that erasure meta.
Anyway, I am inclined to keep this datapoint just in case we observe these anomalies, but I don't feel strongly about it, so either way is fine with me.
I added it back as an error log.
if let Err(e) = self.store_duplicate_slot(
    slot,
    conflicting_shred.clone(),
    shred.payload().clone(),
Semi-related to the PR (but certainly not blockers):
- Wondering why we didn't use write_batch for this function to store the updates, not that the updates to this column need to occur atomically with any other column.
- Wondering if we could store the proof without cloning the two Vecs; we'd want to serialize with borrowed data but deserialize with owned data. From a quick glance this looks doable since bincode supports Cow.
Yeah, we could definitely be smarter about this:
- use a write batch to store duplicate proof updates
- cache a map so we don't have to read the blockstore every time we check if there's already a duplicate proof
- avoid the clones

I guess duplicates happen so infrequently that this isn't a major problem, but I can definitely look into cleaning this up in a follow-up.
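As a rough sketch of the "serialize borrowed, deserialize owned" idea (the struct below is a hypothetical stand-in, not the real duplicate-proof type; assumes serde with the derive feature and bincode 1.x): Cow<[u8]> lets the write path borrow the two payloads instead of cloning them, while reads still come back as owned data.

```rust
use std::borrow::Cow;

use serde::{Deserialize, Serialize};

// Hypothetical stand-in for the stored duplicate-proof value.
#[derive(Serialize, Deserialize, Debug)]
struct DuplicateProof<'a> {
    shred1: Cow<'a, [u8]>,
    shred2: Cow<'a, [u8]>,
}

fn main() {
    let payload1: Vec<u8> = vec![1, 2, 3];
    let payload2: Vec<u8> = vec![4, 5, 6];

    // Write path: borrow the payloads instead of cloning them into the proof.
    let proof = DuplicateProof {
        shred1: Cow::Borrowed(&payload1),
        shred2: Cow::Borrowed(&payload2),
    };
    let bytes = bincode::serialize(&proof).expect("serialize proof");

    // Read path: without #[serde(borrow)], Cow deserializes as Cow::Owned,
    // so the caller gets owned data back.
    let read_back: DuplicateProof = bincode::deserialize(&bytes).expect("deserialize proof");
    assert!(matches!(read_back.shred1, Cow::Owned(_)));
    assert_eq!(read_back.shred1.as_ref(), payload1.as_slice());
}
```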
Wondering why we didn't use write_batch for this function to store the updates

I figured this out and thought I had edited my comment, but I guess I did not:
- store_duplicate_slot() and has_duplicate_shreds_in_slot() both read from the DB
- Items written to the write_batch are not visible to the DB until the batch is committed

We call has_duplicate_shreds_in_slot() in the same function, potentially after calling store_duplicate_slot(). If we used the write_batch, the read wouldn't see the write immediately before it. And, if insert_shreds_handle_duplicate() was called with 100 shreds that we're iterating through, a duplicate proof written for the 5th shred would not be visible when we read for the next 95 shreds.

This same gotcha is why we have the get_shred_from_just_inserted_or_db() function in combination with the HashMap for getting shreds that are in the same "insert batch".
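A small self-contained sketch of the visibility gotcha described above (HashMaps standing in for the real columns and write batch; names are illustrative): a read has to consult the in-memory batch of the current insert pass first, because the DB will not reflect batched writes until the batch is committed, which is the same idea behind get_shred_from_just_inserted_or_db().

```rust
use std::collections::HashMap;

// Stand-ins for the real column / write-batch types.
struct Db {
    committed: HashMap<u64, Vec<u8>>,
}

struct WriteBatch {
    pending: HashMap<u64, Vec<u8>>,
}

impl Db {
    // Reads only see committed data, never the pending batch.
    fn get(&self, slot: u64) -> Option<&Vec<u8>> {
        self.committed.get(&slot)
    }

    fn commit(&mut self, batch: WriteBatch) {
        self.committed.extend(batch.pending);
    }
}

// Mirrors the get_shred_from_just_inserted_or_db idea: check the in-memory
// map for this insert batch before falling back to the DB.
fn get_proof_from_batch_or_db<'a>(
    batch: &'a WriteBatch,
    db: &'a Db,
    slot: u64,
) -> Option<&'a Vec<u8>> {
    batch.pending.get(&slot).or_else(|| db.get(slot))
}

fn main() {
    let mut db = Db { committed: HashMap::new() };
    let mut batch = WriteBatch { pending: HashMap::new() };

    // A duplicate proof staged in the batch for slot 5 ...
    batch.pending.insert(5, vec![0xde, 0xad]);

    // ... a plain DB read later in the same insert pass misses it,
    assert!(db.get(5).is_none());
    // while the batch-aware lookup sees it.
    assert!(get_proof_from_batch_or_db(&batch, &db, 5).is_some());

    db.commit(batch);
    assert!(db.get(5).is_some());
}
```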
Change looks good to me, but I think it'd be good to confirm that Behzad's concern about the removed datapoint is adequately covered by the comments further down the function I mentioned.
Problem
We perform a costly scan in order to find the original coding shred when an erasure config mismatch is detected.
Summary of Changes
Instead, use the new field on erasure meta introduced in #961.
Revert to the scan if the field is 0 in order to support old blockstores. This can be removed when we no longer support < v1.18.12.
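A rough sketch of the lookup strategy (simplified, hypothetical types; the real ErasureMeta, shred lookups, and conflict check differ): prefer the coding-shred index recorded on the erasure meta, and fall back to the old scan when the field is 0, i.e. when the meta was written by an older blockstore version.

```rust
// Simplified stand-ins for the real blockstore types (illustrative only).
struct ErasureMeta {
    // Index of the coding shred that set this erasure set's config;
    // 0 here stands for "written by an old blockstore version".
    first_received_coding_index: u64,
}

struct CodingShred {
    index: u64,
}

struct SlotCodingShreds {
    shreds: Vec<CodingShred>,
}

impl SlotCodingShreds {
    // Cheap path: direct lookup by the index recorded in the erasure meta.
    fn get(&self, index: u64) -> Option<&CodingShred> {
        self.shreds.iter().find(|s| s.index == index)
    }

    // Expensive path: scan every coding shred in the slot (the old behavior).
    fn scan_for_conflict(
        &self,
        conflicts_with: impl Fn(&CodingShred) -> bool,
    ) -> Option<&CodingShred> {
        self.shreds.iter().find(|&s| conflicts_with(s))
    }
}

fn find_conflicting_coding_shred<'a>(
    erasure_meta: &ErasureMeta,
    slot_shreds: &'a SlotCodingShreds,
    conflicts_with: impl Fn(&CodingShred) -> bool,
) -> Option<&'a CodingShred> {
    if erasure_meta.first_received_coding_index != 0 {
        // New-style meta: jump straight to the shred that set the erasure config.
        slot_shreds.get(erasure_meta.first_received_coding_index)
    } else {
        // Old-style meta: fall back to the costly scan.
        slot_shreds.scan_for_conflict(conflicts_with)
    }
}

fn main() {
    let slot_shreds = SlotCodingShreds {
        shreds: vec![CodingShred { index: 7 }, CodingShred { index: 9 }],
    };
    let meta = ErasureMeta { first_received_coding_index: 9 };
    let found = find_conflicting_coding_shred(&meta, &slot_shreds, |_shred: &CodingShred| true);
    assert_eq!(found.map(|s| s.index), Some(9));
}
```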