Blockstore::get_sigs_for_addr2: ensure lowest_slot >= first_available_block #33556

CriesofCarrots · 2023-10-06T00:56:41Z

Problem

As pointed out here, Blockstore::get_confirmed_signatures_for_address2() does more iterating than it needs to.

Summary of Changes

Ensure lowest_slot >= first_available_block:

- Update get_transaction_status to only return data from first_available_block onward (this is already the expected behavior from the perspective of RPC)
- When a transaction status is not found, set lowest_slot to first_available_block
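A minimal sketch of the two changes above, under stated assumptions: `Slot` is modeled as `u64`, the status lookup is reduced to "the slot the status was found in", and the helper names `status_slot_if_available` and `lowest_slot` are hypothetical, not Blockstore APIs.

```rust
/// Slot number; Solana defines `Slot` as an alias for `u64`.
type Slot = u64;

/// Hypothetical stand-in for the updated get_transaction_status: `found` is the
/// slot the signature's status was located in, if any. Statuses from slots below
/// `first_available_block` are reported as not found.
fn status_slot_if_available(found: Option<Slot>, first_available_block: Slot) -> Option<Slot> {
    found.filter(|&slot| slot >= first_available_block)
}

/// Hypothetical helper for the lower bound of the address-signature scan: use the
/// `until` signature's slot when its status is found, otherwise fall back to
/// `first_available_block` so the scan never goes below available data.
fn lowest_slot(until_status_slot: Option<Slot>, first_available_block: Slot) -> Slot {
    until_status_slot.unwrap_or(first_available_block)
}
```

Chaining the two helpers gives the intended bound: a found `until` status in an available slot sets `lowest_slot` to that slot, and anything else falls back to `first_available_block`.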

codecov · 2023-10-06T02:27:29Z

Codecov Report

Merging #33556 (a85792b) into master (6f1922b) will decrease coverage by 0.1%.
Report is 1 commit behind head on master.
The diff coverage is 87.5%.

@@            Coverage Diff            @@
##           master   #33556     +/-   ##
=========================================
- Coverage    81.7%    81.7%   -0.1%     
=========================================
  Files         805      805             
  Lines      218162   218160      -2     
=========================================
- Hits       178410   178394     -16     
- Misses      39752    39766     +14

steviez · 2023-10-06T02:23:31Z

ledger/src/blockstore.rs

+            let first_available_block = self.get_first_available_block()?;
+            if slot < first_available_block {
+                return Ok(None);
+            }


I'm wondering if we need this check here after the fact. Within get_transaction_status_with_counter(), we get the lowest_cleanup_slot and build our iterator in the forward direction from that slot:

solana/ledger/src/blockstore.rs, lines 2331 to 2337 in 64b3613:

```rust
let index_iterator = self.transaction_status_cf.iter(IteratorMode::From(
    (
        transaction_status_cf_primary_index,
        signature,
        lowest_available_slot,
    ),
    IteratorDirection::Forward,
```

So, it is the case that if we find a matching sig, then the slot the sig was in is >= lowest_cleanup_slot. We then grab the transaction status and return it; assuming it is valid, I see no reason to discard it at the end.
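To illustrate why a forward seek bounds the result, here is a toy model in which a `BTreeMap` keyed by `(signature, slot)` stands in for the `transaction_status_cf` column (the real key also carries a primary index, omitted here). `Sig`, `find_status`, and the string statuses are hypothetical.

```rust
use std::collections::BTreeMap;

type Slot = u64;
/// Toy signature type; real transaction signatures are 64-byte values.
type Sig = u64;

/// Seek forward from (sig, lowest_available_slot) in a column sorted by
/// (signature, slot) and return the first status stored for `sig`, if any.
fn find_status(
    column: &BTreeMap<(Sig, Slot), &'static str>,
    sig: Sig,
    lowest_available_slot: Slot,
) -> Option<(Slot, &'static str)> {
    column
        .range((sig, lowest_available_slot)..)
        // Stop as soon as the iterator moves past this signature's keys.
        .take_while(|((s, _), _)| *s == sig)
        .map(|((_, slot), status)| (*slot, *status))
        .next()
}
```

Because keys sort by signature first and slot second, every hit from `range((sig, lowest_available_slot)..)` has a slot at or above the starting slot, which is the property the comment above relies on.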

Also, get_transaction_status_with_counter() holds the lowest_cleanup_slot lock. Imagine the following sequence:

1. get_transaction_status() calls get_transaction_status_with_counter()
2. get_transaction_status_with_counter() grabs the lowest_cleanup_slot lock, which is at slot X
3. get_transaction_status_with_counter() finds the transaction status for the desired sig in slot X + i
4. get_transaction_status_with_counter() returns and gives up the lowest_cleanup_slot lock
5. LedgerCleanupService comes along and does some cleanup, advancing lowest_cleanup_slot to X + c, where c > i
6. get_transaction_status() checks the result slot against get_first_available_block(); at this point, get_first_available_block() > X + c
7. From 5), i < c, so the result will get discarded and we'll return Ok(None)

This isn't necessarily a dangerous race condition (and given the timing required, I think it would be incredibly unlikely), but again, I don't see any reason to discard the result if we already looked it up and found it.

Good point, maybe hoist this logic to get_transaction_status_with_counter(), inside the lock, then.
If we don't, the Some() cases in the before and until blocks in get_confirmed_signatures_for_address2() could return a slot/lowest_slot < first_available_block, because lowest_cleanup_slot is not necessarily (in fact usually isn't) the same as first_available_block.
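A rough sketch of hoisting the check inside the lock, assuming a hypothetical `ToyBlockstore` whose cleanup marker and first-available slot live behind one `std::sync::RwLock`. In the real Blockstore, first_available_block is derived from the rooted slots still in the database rather than stored; `CleanupState`, `cleanup`, and `status_slot` are illustrative names only.

```rust
use std::sync::RwLock;

type Slot = u64;

/// Hypothetical cleanup state kept behind a single lock for this sketch.
struct CleanupState {
    lowest_cleanup_slot: Slot,
    first_available_block: Slot,
}

struct ToyBlockstore {
    state: RwLock<CleanupState>,
}

impl ToyBlockstore {
    /// Toy cleanup: needs the write lock, so it cannot interleave with a lookup
    /// that is holding the read lock.
    fn cleanup(&self, lowest_cleanup_slot: Slot, first_available_block: Slot) {
        let mut state = self.state.write().unwrap();
        state.lowest_cleanup_slot = lowest_cleanup_slot;
        state.first_available_block = first_available_block;
    }

    /// Hypothetical analogue of get_transaction_status_with_counter with the
    /// first-available check hoisted inside the lock. `sig_slots` lists, in
    /// ascending order, the slots where the signature has a stored status.
    fn status_slot(&self, sig_slots: &[Slot]) -> Option<Slot> {
        let state = self.state.read().unwrap();
        // Lookup: first slot at or above the cleanup boundary.
        let found = sig_slots
            .iter()
            .copied()
            .find(|&slot| slot >= state.lowest_cleanup_slot);
        // Hoisted check, evaluated while the same read guard is still held.
        found.filter(|&slot| slot >= state.first_available_block)
    }
}
```

The sketch is single-threaded; the point is only that the lookup and the check share one guard, so a `cleanup` call from another thread would have to wait until `status_slot` returns.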

Technically, I think we already have the race you described between get_transaction_status() and get_confirmed_signatures_for_address2(), because we pull the whole block after releasing the lowest_cleanup_slot lock. But fixing that, if we choose to, can be independent of this change.

> because lowest_cleanup_slot is not necessarily (in fact usually isn't) the same as first_available_block.

Talking this through out loud to help gather my thoughts... here is the definition of get_first_available_block():

solana/ledger/src/blockstore.rs, lines 2002 to 2013 in ecb1f8a:

```rust
pub fn get_first_available_block(&self) -> Result<Slot> {
    let mut root_iterator = self.rooted_slot_iterator(self.lowest_slot_with_genesis())?;
    let first_root = root_iterator.next().unwrap_or_default();
    // If the first root is slot 0, it is genesis. Genesis is always complete, so it is correct
    // to return it as first-available.
    if first_root == 0 {
        return Ok(first_root);
    }
    // Otherwise, the block at root-index 0 cannot ever be complete, because it is missing its
    // parent blockhash. A parent blockhash must be calculated from the entries of the previous
    // block. Therefore, the first available complete block is that at root-index 1.
    Ok(root_iterator.next().unwrap_or_default())
```

And here is lowest_slot_with_genesis():

solana/ledger/src/blockstore.rs, lines 3385 to 3396 in ecb1f8a:

```rust
fn lowest_slot_with_genesis(&self) -> Slot {
    for (slot, meta) in self
        .slot_meta_iterator(0)
        .expect("unable to iterate over meta")
    {
        if meta.received > 0 {
            return slot;
        }
    }
    // This means blockstore is empty, should never get here aside from right at boot.
    self.last_root()
}
```

So, under normal conditions, lowest_slot_with_genesis() returns the first slot with any shreds (not necessarily complete, and thus not necessarily a root). Let's call this slot S_l. get_first_available_block() then finds the first rooted slot at or after S_l and returns the next rooted slot after it (unless that first root is slot 0, in which case genesis itself is returned). So, if S_l does happen to be a root, the result of get_first_available_block() will be the next root after S_l, i.e. its child on the rooted chain.

lowest_cleanup_slot is the most recent slot that we purged. So, assuming lowest_cleanup_slot == C, then:

- First root in database >= C + 1
- get_first_available_block() result >= C + 2
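The reasoning above can be checked on a toy model. This sketch mirrors the two quoted functions over an in-memory `BTreeMap<Slot, ToyMeta>`, where the hypothetical `ToyMeta` carries only a `received` count and an `is_root` flag standing in for the roots column, and the hypothetical `purge_through` stands in for purging slots `0..=C`.

```rust
use std::collections::BTreeMap;

type Slot = u64;

/// Hypothetical per-slot metadata: `received` mirrors SlotMeta::received,
/// `is_root` stands in for membership in the roots column.
struct ToyMeta {
    received: u64,
    is_root: bool,
}

/// Mirrors lowest_slot_with_genesis(): the first slot with any shreds. The real
/// function falls back to last_root() when empty; this toy returns None.
fn lowest_slot_with_genesis(slots: &BTreeMap<Slot, ToyMeta>) -> Option<Slot> {
    slots.iter().find(|(_, meta)| meta.received > 0).map(|(slot, _)| *slot)
}

/// Mirrors get_first_available_block(): walk roots from S_l; genesis (slot 0) is
/// returned directly, otherwise the first root is skipped and the next is returned.
fn get_first_available_block(slots: &BTreeMap<Slot, ToyMeta>) -> Slot {
    let start = lowest_slot_with_genesis(slots).unwrap_or_default();
    let mut roots = slots
        .range(start..)
        .filter(|(_, meta)| meta.is_root)
        .map(|(slot, _)| *slot);
    let first_root = roots.next().unwrap_or_default();
    if first_root == 0 {
        return first_root;
    }
    roots.next().unwrap_or_default()
}

/// Hypothetical cleanup: drop every slot in 0..=c.
fn purge_through(slots: &mut BTreeMap<Slot, ToyMeta>, c: Slot) {
    slots.retain(|&slot, _| slot > c);
}
```

In the dense case, where every slot is a root, the bound is tight: after purging through C = 5 the first root is 6 and the first available block is 7, i.e. exactly C + 2.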


steviez · 2023-10-06T19:23:34Z

ledger/src/blockstore.rs

@@ -8017,6 +8012,7 @@ pub mod tests {

        if simulate_ledger_cleanup_service {
            *blockstore.lowest_cleanup_slot.write().unwrap() = lowest_cleanup_slot;
+            blockstore.purge_slots(0, lowest_cleanup_slot, PurgeType::CompactionFilter);


Did the unit test fail without this?

Yes, because Blockstore::get_first_available_block() depends on the rooted-slot iterator, and this "simulation" wasn't adjusting the root list, just writing a new lowest_cleanup_slot. This change seemed defensible to me, since this is exactly what the LedgerCleanupService does, plus some extra data reporting.
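A toy illustration of why the test needed `purge_slots`: if the first available block is derived only from stored roots, writing `lowest_cleanup_slot` alone changes nothing. `ToyBlockstore`, its fields, and `purge_through` are hypothetical, and the model assumes every slot has shreds, so it skips the `lowest_slot_with_genesis()` step.

```rust
use std::collections::BTreeSet;

type Slot = u64;

/// Hypothetical blockstore: `roots` stands in for the roots column,
/// `lowest_cleanup_slot` for the cleanup marker the old test wrote directly.
struct ToyBlockstore {
    roots: BTreeSet<Slot>,
    lowest_cleanup_slot: Slot,
}

impl ToyBlockstore {
    /// Simplified get_first_available_block(): depends only on stored roots
    /// (genesis if slot 0 is the first root, otherwise the second root).
    fn first_available_block(&self) -> Slot {
        let mut roots = self.roots.iter().copied();
        let first = roots.next().unwrap_or_default();
        if first == 0 {
            return first;
        }
        roots.next().unwrap_or_default()
    }

    /// Hypothetical stand-in for purge_slots(0, slot, ..): drops roots in 0..=slot.
    fn purge_through(&mut self, slot: Slot) {
        self.roots.retain(|&root| root > slot);
    }
}
```

Moving only the marker leaves the result at genesis; purging the rooted slots is what actually advances it.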

steviez

My long-winded comment aside, I think we're good here.

Set empty lowest_slot to first_available_block and remove check in loop

a0bbea7

CriesofCarrots requested a review from steviez October 6, 2023 01:12

CriesofCarrots mentioned this pull request Oct 6, 2023

Remove primary index from Blockstore special-column keys #33419

Merged

steviez reviewed Oct 6, 2023


Ensure get_transaction_status_with_counter returns slots >= first_available_block

162511a

CriesofCarrots force-pushed the blockstore-lowest branch from 39a24a6 to 162511a October 6, 2023 03:07

Actually cleanup ledger

a85792b

CriesofCarrots requested a review from steviez October 6, 2023 18:36

steviez reviewed Oct 6, 2023


steviez approved these changes Oct 6, 2023


CriesofCarrots merged commit f075867 into solana-labs:master Oct 6, 2023
16 checks passed

willhickey mentioned this pull request Mar 28, 2024

v1.18 commits - please ignore anza-xyz/agave#475

Closed
