
Improve range sync with PeerDAS #6258

Open
Tracked by #4983
dapplion opened this issue Aug 14, 2024 · 3 comments
Labels
das Data Availability Sampling

Comments

@dapplion
Collaborator

dapplion commented Aug 14, 2024

Description

Currently range sync and backfill sync fetch blocks and blobs from the network with this sequence:

  • Out of the pool of peers in the chain (peers that agree on some fork-choice state) select ONE peer
  • Immediately issue block_by_range and blobs_by_range to the SAME ONE peer
  • If any of those requests error, fail BOTH requests and retry with another peer
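The coupling described above can be sketched roughly as follows (hypothetical types and names for illustration, not Lighthouse's actual sync code): a batch ties all of its by_range requests to one peer, so a single RPC failure fails the whole batch and every request is re-issued to another peer.

```rust
// Minimal sketch of the current batch/peer coupling. All *_by_range
// requests for a batch go to ONE peer and share the batch's fate.

#[derive(Debug, Clone, PartialEq)]
enum BatchState {
    Downloading { peer: String },
    AwaitingRetry,
}

struct Batch {
    state: BatchState,
    // blocks_by_range and blobs_by_range succeed or fail together.
    pending_requests: Vec<&'static str>,
}

impl Batch {
    fn new(peer: &str) -> Self {
        Batch {
            state: BatchState::Downloading { peer: peer.to_string() },
            pending_requests: vec!["blocks_by_range", "blobs_by_range"],
        }
    }

    /// Any single RPC failure drops BOTH requests; the caller then
    /// retries them together against a different peer.
    fn on_request_failure(&mut self, _failed: &str) -> Vec<&'static str> {
        self.state = BatchState::AwaitingRetry;
        std::mem::take(&mut self.pending_requests)
    }
}
```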

This strategy is not optimal but good enough for now. However, with PeerDAS, the worst-case number of requests per batch increases from 2 (blocks + blobs) to 2 + DATA_COLUMN_SIDECAR_SUBNET_COUNT / CUSTODY_REQUIREMENT = 2 + 32 = 34 (if not connected to any larger node).

If we extend the current paradigm, a single failure on a columns_by_range request will trigger a retry of all 34 requests. Not optimal 😅

A solution is to make the "components_by_range" request able to retry each individual component request. This is what block lookup requests do: each component (block, blobs, custody columns) has its own state and retry count.
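A rough sketch of that per-component shape (hypothetical names, mirroring the block-lookup pattern rather than copying its code): each component of a "components_by_range" request tracks its own retry count, so a failed column request is re-issued alone while the other components keep their downloaded data.

```rust
// Hypothetical per-component retry state for a components_by_range
// request: block, blobs, and each custody column retry independently.

#[derive(Debug, Clone, PartialEq, Eq)]
enum Component {
    Block,
    Blobs,
    Column(u8), // custody column index
}

struct ComponentRequest {
    component: Component,
    retries: u32,
    max_retries: u32,
}

struct ComponentsByRangeRequest {
    components: Vec<ComponentRequest>,
}

impl ComponentsByRangeRequest {
    fn new(columns: &[u8]) -> Self {
        let mut components = vec![
            ComponentRequest { component: Component::Block, retries: 0, max_retries: 3 },
            ComponentRequest { component: Component::Blobs, retries: 0, max_retries: 3 },
        ];
        components.extend(columns.iter().map(|&i| ComponentRequest {
            component: Component::Column(i),
            retries: 0,
            max_retries: 3,
        }));
        ComponentsByRangeRequest { components }
    }

    /// Bump only the failed component's retry count. Returns the
    /// component to re-issue, or None if it is out of retries.
    /// All other components are untouched.
    fn on_component_failure(&mut self, failed: &Component) -> Option<Component> {
        let c = self.components.iter_mut().find(|c| &c.component == failed)?;
        c.retries += 1;
        (c.retries <= c.max_retries).then(|| c.component.clone())
    }
}
```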

Your feedback requested here 👇

Going that direction would add a bunch of lines to sync, first of all I want to check if you agree with the problem, and that it's actually necessary to make each request retry-able.

@dapplion dapplion added the das Data Availability Sampling label Aug 14, 2024
@realbigsean
Member

realbigsean commented Aug 14, 2024

> Going that direction would add a bunch of lines to sync, first of all I want to check if you agree with the problem, and that it's actually necessary to make each request retry-able.

This does seem necessary to me, also because requesting all custody columns from the same peer means you have to find a peer that exactly matches your custody, right? So retries would happen frequently.

@jimmygchen
Member

+1 to the above.

One of the bugs we had earlier on das was that we were spamming block requests before we had enough custody subnet peers: we never made data column requests, but we made a bunch of block requests before getting rate limited. We had to add a workaround to make sure we have peers across all custody subnets before we start requesting both (#6004).

The proposed change here would allow us to start requesting blocks and columns without having to wait for peers to be available across all custody subnets (for supernodes that would mean requests would be delayed until it has peers across all 128 subnets!).
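That decoupling could look something like this (a hypothetical helper, assuming per-component peer tracking; not the proposed implementation): each component starts downloading as soon as some peer serves it, instead of gating the whole batch on having peers across every custody subnet.

```rust
// Hypothetical helper: given which peers currently serve each component,
// return the components that can start downloading now. Block requests
// no longer wait for peers on every custody column subnet.
fn ready_components<'a>(
    components: &[&'a str],
    peers_by_component: &std::collections::HashMap<&str, Vec<&str>>,
) -> Vec<&'a str> {
    components
        .iter()
        .copied()
        // A component is ready as soon as at least one peer serves it.
        .filter(|c| peers_by_component.get(c).map_or(false, |p| !p.is_empty()))
        .collect()
}
```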

@dapplion
Collaborator Author

dapplion commented Oct 10, 2024

Noting another issue with backfill sync:

Backfill sync sources blocks and blobs / data columns from peers with the _by_range RPCs. We bundle those results into an RpcBlock and send it to the processor. Since the block peer and the column peers may now differ, we need to attribute fault to the right peer.

Currently process_backfill_blocks checks KZG validity before checking the hash chain. If the block peer sends an invalid block, we may hit a KZG or availability error instead of a block hash error. We should check the integrity of the block before checking the columns' validity.

Change the function to

fn process_backfill_blocks(downloaded_blocks: &[RpcBlock]) -> Result<(), Error> {
    // Verify the block hash chain first, so an invalid block is blamed
    // on the block peer instead of surfacing as a KZG/availability error.
    check_hash_chain(downloaded_blocks)?;
    // Only then check KZG / data availability of the columns.
    check_availability(downloaded_blocks)?;
    import_historical_block_batch(downloaded_blocks)?;
    Ok(())
}

Where import_historical_block_batch may no longer need to check the hash chain, as that is done ahead of time.
