-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blockstore::get_sigs_for_addr2: ensure lowest_slot >= first_available_block #33556
Blockstore::get_sigs_for_addr2: ensure lowest_slot >= first_available_block #33556
Conversation
Codecov Report
@@ Coverage Diff @@
## master #33556 +/- ##
=========================================
- Coverage 81.7% 81.7% -0.1%
=========================================
Files 805 805
Lines 218162 218160 -2
=========================================
- Hits 178410 178394 -16
- Misses 39752 39766 +14 |
ledger/src/blockstore.rs
Outdated
let first_available_block = self.get_first_available_block()?; | ||
if slot < first_available_block { | ||
return Ok(None); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we need this check here after the fact. Within get_transaction_status_with_counter()
, we get the lowest_cleanup_slot
and build our iterator in the forward direction from that slot
solana/ledger/src/blockstore.rs
Lines 2331 to 2337 in 64b3613
let index_iterator = self.transaction_status_cf.iter(IteratorMode::From( | |
( | |
transaction_status_cf_primary_index, | |
signature, | |
lowest_available_slot, | |
), | |
IteratorDirection::Forward, |
So, it is the case that if we find a matching sig, then the slot the sig was in is >= lowest_cleanup_slot
. We then grab the transaction status and return it; assuming it is valid, I see no reason to discard it at the end.
Also, get_transaction_status_with_counter()
holds the lowest_cleanup_slot
lock. Imagine the following sequence:
get_transaction_status()
callsget_transaction_status_with_counter()
get_transaction_status_with_counter()
grabs thelowest_cleanup_slot
lock, which is at slotX
get_transaction_status_with_counter()
finds the transaction status for desired sig in slotX + i
get_transaction_status_with_counter()
returns and gives uplowest_cleanup_slot
lockLedgerCleanupService
comes along and does some cleanup, advancinglowest_cleanup_slot
toX + c
wherec > i
- get_transaction_status()
checks the result slot against
get_first_available_block(); it is the case that
get_first_available_block() > X + c` - From 5),
i < c
so the result will get discarded and we'll returnOk(None)
This isn't a dangerous race condition necessarily (and given the timing required, I think this would be incredibly unlikely), but again, I don't see any reason to discard the result if we already looked it up and found it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, maybe hoist this logic to get_transaction_status_with_counter()
, inside the lock, then.
If we don't, the Some()
cases in the before and until blocks in get_confirmed_signatures_for_address2()
could return a slot/lowest_slot > first_available_block, because lowest_cleanup_slot is not necessarily (in fact usually isn't) the same as first_available_block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, I think we already have the race you described between get_transaction_status()
and get_confirmed_signatures_for_address2()
, because we pull the whole block after releasing the lowest_cleanup_slot
lock. But fixing that, if we choose to, can be independent of this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because lowest_cleanup_slot is not necessarily (in fact usually isn't) the same as first_available_block.
Talking this through "out-loud" to help gather my thoughts ... here is the definition of get_first_available_block()
:
solana/ledger/src/blockstore.rs
Lines 2002 to 2013 in ecb1f8a
pub fn get_first_available_block(&self) -> Result<Slot> { | |
let mut root_iterator = self.rooted_slot_iterator(self.lowest_slot_with_genesis())?; | |
let first_root = root_iterator.next().unwrap_or_default(); | |
// If the first root is slot 0, it is genesis. Genesis is always complete, so it is correct | |
// to return it as first-available. | |
if first_root == 0 { | |
return Ok(first_root); | |
} | |
// Otherwise, the block at root-index 0 cannot ever be complete, because it is missing its | |
// parent blockhash. A parent blockhash must be calculated from the entries of the previous | |
// block. Therefore, the first available complete block is that at root-index 1. | |
Ok(root_iterator.next().unwrap_or_default()) |
And here is
lowest_slot_with_genesis()
:solana/ledger/src/blockstore.rs
Lines 3385 to 3396 in ecb1f8a
fn lowest_slot_with_genesis(&self) -> Slot { | |
for (slot, meta) in self | |
.slot_meta_iterator(0) | |
.expect("unable to iterate over meta") | |
{ | |
if meta.received > 0 { | |
return slot; | |
} | |
} | |
// This means blockstore is empty, should never get here aside from right at boot. | |
self.last_root() | |
} |
So, in normal conditions, lowest_slot_with_genesis()
will return the first slot with any shreds (not necessarily complete and thus also not necessarily a root). Let's call this slot S_l
. get_first_available_block()
then looks for the first rooted slot from S_l
, and then returns the next rooted slot. So, if S_l
did happen to be a root, the result of get_first_available_block()
will be S_l
's child
lowest_cleanup_slot
is the most recent slot that we purged. So, assuming lowest_cleanup_slot == C
, then
- First root in database
>= C + 1
get_first_available_block()
result>= C + 2
39a24a6
to
162511a
Compare
@@ -8017,6 +8012,7 @@ pub mod tests { | |||
|
|||
if simulate_ledger_cleanup_service { | |||
*blockstore.lowest_cleanup_slot.write().unwrap() = lowest_cleanup_slot; | |||
blockstore.purge_slots(0, lowest_cleanup_slot, PurgeType::CompactionFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did unit test fail without this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because Blockstore::first_available_block()
depends on the rooted-slot iterator, and this "simulation" wasn't adjusting the root list, just writing a new lowest_cleanup_slot. This change seemed defensible to me, since this is exactly what the LedgerCleanupService does, plus some extra data reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My long-winded comment aside, think we're good to here.
Problem
As pointed out here,
Blockstore::get_confirmed_signatures_for_address2()
does more iterating than it needs to.Summary of Changes
Ensure lowest_slot >= first_available_block:
get_transaction_status
to only return data fromfirst_available_block
and onward (this is the expected behavior from the perspective of rpc already)lowest_slot
tofirst_available_block