Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(storage): unify blocks insertion logic #12694

Merged
merged 4 commits into from
Nov 20, 2024
Merged

Conversation

klkvr
Copy link
Collaborator

@klkvr klkvr commented Nov 20, 2024

Updates UnifiedStorageWriter to reuse code paths from BodyStage for blocks writing. This is done by extending BlockWriter methods with an additional write_transactions_to parameter:

fn insert_block(
&self,
block: SealedBlockWithSenders<Header, Self::Body>,
write_transactions_to: StorageLocation,
) -> ProviderResult<StoredBlockBodyIndices>;

/// An enum that represents the storage location for a piece of data.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum StorageLocation {
/// Write only to static files.
StaticFiles,
/// Write only to the database.
Database,
/// Write to both the database and static files.
Both,
}

  • StorageLocation::Database is a default for insert_block and only writes transactions to database, used by legacy engine and components which need to operate on database without ever commiting anything (like debugging CLI commands)
  • StorageLocation::Both writes to both database and staticfiles, used by new engine right now.
  • StorageLocation::StaticFiles only writes transactions to staticfiles, used by BodyStage. Should be used by new engine as well eventually.

Same can be done with headers and receipts as well

This required a drive-by change of making SealedBlockWithSenders generic over header and body like SealedBlock

@klkvr klkvr changed the title [wip] refactor: unify blocks insertion logic refactor: unify blocks insertion logic Nov 20, 2024
@klkvr klkvr marked this pull request as ready for review November 20, 2024 15:55
@klkvr klkvr changed the title refactor: unify blocks insertion logic refactor(storage): unify blocks insertion logic Nov 20, 2024
@klkvr klkvr added C-debt Refactor of code section that is hard to understand or maintain A-db Related to the database A-sdk Related to reth's use as a library labels Nov 20, 2024
Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need @joshieDo input here

all of this makes sense

his required a drive-by change of making SealedBlockWithSenders generic over header and body like SealedBlock

this is fine for me in this pr, but would have preferred a separate one for this.

one question about elapsed insert tracking

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we no longer need these metrics?

EDIT: ah because we now operate on block bodies

makes sense

// write transactions
for transaction in body.transactions() {
let hash = transaction.tx_hash();
let start = Instant::now();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is way too expensive to do per tx

was this the case before?

I'd rather not do this at all

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the case for live sync but not for the pipeline, should I remove it or just make optional?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer removing this entirely, doesn't seem very useful, just needlessly expensive

wdyt @joshieDo

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't want to block this pr for this,
but would like to get rid of this detailed time tracking entirely because these additional syscall per tx are likely expensive (400 timesyscalls for a 200txs block)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the per-tx metrics

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense

Comment on lines +17 to +18
/// Write to both the database and static files.
Both,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean data is duplicated?

not obvious where we need this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now we are using insert_block which does this by default and on reorgs we use take_block_transactions_range which relies on all transactions being present in database

to get rid of this we'd need a separate fn for reorgs in new engine which I am planning to do in a follow-up

@klkvr klkvr changed the base branch from main to klkvr/block-with-senders November 20, 2024 17:28
// write transactions
for transaction in body.transactions() {
let hash = transaction.tx_hash();
let start = Instant::now();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't want to block this pr for this,
but would like to get rid of this detailed time tracking entirely because these additional syscall per tx are likely expensive (400 timesyscalls for a 200txs block)

@klkvr
Copy link
Collaborator Author

klkvr commented Nov 20, 2024

pending @joshieDo

Base automatically changed from klkvr/block-with-senders to main November 20, 2024 19:12
@@ -225,6 +227,7 @@ impl_compression_for_compact!(
Bytecode,
AccountBeforeTx,
TransactionSignedNoHash,
TransactionSigned,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ? this should not be written to storage

.database()
.then(|| {
self.tx.cursor_write::<tables::Transactions<
<Self::Body as reth_primitives_traits::BlockBody>::Transaction,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure we are writing TransactionSignedNoHash and not TransactionSigned here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransactionSigned Compact impl delegates to TransactionSignedNoHash

#[cfg(any(test, feature = "reth-codec"))]
impl reth_codecs::Compact for TransactionSigned {
fn to_compact<B>(&self, buf: &mut B) -> usize
where
B: bytes::BufMut + AsMut<[u8]>,
{
let tx: TransactionSignedNoHash = self.clone().into();
tx.to_compact(buf)
}
fn from_compact(buf: &[u8], len: usize) -> (Self, &[u8]) {
let (tx, buf) = TransactionSignedNoHash::from_compact(buf, len);
(tx.into(), buf)
}
}

those should be unified soon with #12596

Copy link
Collaborator Author

@klkvr klkvr Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to make sure this happens before we get to readers abstraction though so that we don't do hashing on every storage read

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see we introduced it here: #12539

@klkvr klkvr added this pull request to the merge queue Nov 20, 2024
Merged via the queue into main with commit 0c59841 Nov 20, 2024
34 checks passed
@klkvr klkvr deleted the klkvr/unify-storage branch November 20, 2024 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-db Related to the database A-sdk Related to reth's use as a library C-debt Refactor of code section that is hard to understand or maintain
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants