Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PeerDAS #1

Closed
wants to merge 39 commits into from
Closed

PeerDAS #1

wants to merge 39 commits into from

Conversation

hwwhww
Copy link
Owner

@hwwhww hwwhww commented Nov 29, 2023

TODOs

  • Add gossip topic verification rules
  • Add extensions
  • Use KZG_COMMITMENTS_INCLUSION_PROOF_DEPTH rather than KZG_COMMITMENTS_INCLUSION_PROOF_INDEX for consistency
  • [TBD] Change it to only custody columns and add a FAQ note?
  • Tune the parameters and check the security assumption in tests
  • Add description about 1D extension
  • Clean up the circular dependency between das-core.md and p2p-interface.md docs
  • Go through the naming:
    • Sample, Cell, DataColumn
  • Consider changing CUSTODY_REQUIREMENT to refer to subnets instead of columns

Copy link

@jacobkaufmann jacobkaufmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took an initial pass with a focus on the p2p-interface


| Name | SSZ equivalent | Description |
| - | - | - |
| `DataLine` | `ByteList[MAX_BLOBS_PER_BLOCK * BYTES_PER_BLOB]` | The data of each row or column in PeerDAS |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we will want to differentiate between rows and columns. while it would be nice to have a unifying type, it appears to me that we will have to treat rows and columns differently with a 1-dimensional extension.

however, since we already have BlobSidecar, we may only have to introduce one new type to represent a column.

that type could be named something like BlobColumnSidecar or BlobMatrixColumnSidecar. the naming is less important to me, but maybe the suggestions are helpful.

that column type would likely benefit from some custom SSZ type to represent a "cell" or "component" of the column (e.g. vector whose length is equal to column width) along with a custom SSZ type to represent the entire column (e.g. vector of column cells/components).

the cell/component type could be the sampling unit.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure on if I'd do a different data-type yet. although I lean in favor.

That said, both could use the full kzg_commitments transmission with root-verification instead of the merkle proof, but would be moderately (but not extremely wrt full size of the sidecar) wasteful as we get to higher blob counts

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to:

  1. Fully reuse BlobSidecar for rows
  2. Add DataColumnSidecar for columns

@jacobkaufmann I stubbed the name DataColumnSidecar because the custom type is DataColumn. Personally I think "DataColumn" makes more sense to distinguish it's not just blobs. But of course, we can discuss it later.

Copy link

@djrtwo djrtwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

  • if the line-types are kept as one message, I'd probably remove the merkle proof and just do the full merkle root verification for both line types (which can just be done across all messages of same block_header).
  • that said, I'm leaning in favor of separate types as suggested by @jacobkaufmann because there is a lot of subtly different logic — at least until we have a better handle on functionality, at which point I could see it maybe making sense to smash together.
  • I think we can go without GetCustodyStatus in this initial version. Or at least not do anything with it on the prototype
    • reasoning: this is an optimization to help more quickly (re)seed missing samples to those that are expected to have them but don't. In any case, this does not preclude the node that is missing samples polling their neighbors (who are supposed to be custodying) and potentailly attempting reconstruction themselves. Thus it is in any case, an optimization on top of base polling so can be left for later
  • gossip conditions for the sidecars will look at lot like those from 4844 so that's a good start
  • (not necessary for initial prototype, all can be baseline honest) we need to add a field to the ENR to make a claim of larger custody than baseline honesty requirement

specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved

| Name | SSZ equivalent | Description |
| - | - | - |
| `DataLine` | `ByteList[MAX_BLOBS_PER_BLOCK * BYTES_PER_BLOB]` | The data of each row or column in PeerDAS |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure on if I'd do a different data-type yet. although I lean in favor.

That said, both could use the full kzg_commitments transmission with root-verification instead of the merkle proof, but would be moderately (but not extremely wrt full size of the sidecar) wasteful as we get to higher blob counts

specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved

## Peer scoring

Due to the deterministic custody functions, a node knows exactly what a peer should be able to respond to. In the event that a peer does not respond to samples of their custodied rows/columns, a node may downscore or disconnect from a peer.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think:

  • Having a sample available should be positive
  • Not having a sample (bitfield negative) should be neutral
  • Not having a sample after responding with a positive bitfield value should come with a high penalty

specs/peerdas/das-core.md Outdated Show resolved Hide resolved
hwwhww and others added 3 commits December 1, 2023 21:41
Co-authored-by: dankrad <mail@dankradfeist.de>
Co-authored-by: danny <dannyjryan@gmail.com>
@hwwhww
Copy link
Owner Author

hwwhww commented Dec 1, 2023

Massive revamp:

  1. For simplification, delete NUMBER_OF_ROWS, and use MAX_BLOBS_PER_BLOCK directly.
  2. Apply 1D construction. Sample the "whole column" instead of a cell.
  3. Fully re-use BlobSidecar and blob_sidecar_{subnet_id} subnet topic for rows
  4. Define the DataColumnSidecar container and data_column_{subnet_id} subnet topic for columns
    • verify_column_sidecar helper: @djrtwo I'm not sure what you mean here (PeerDAS #1 (comment)) so I didn't change the Merkle proof part. Could you elaborate?
  5. Remove CustodyStatus message

specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
Copy link

@jacobkaufmann jacobkaufmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! left some questions around p2p interface

specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
specs/peerdas/p2p-interface.md Outdated Show resolved Hide resolved
| `ExtendedData` | `ByteList[MAX_BLOBS_PER_BLOCK * BYTES_PER_BLOB * 2]` | The full data with blobs and 1-D erasure coding extension |
| `DataRow` | `ByteList[BYTES_PER_BLOB * 2]` | The data of each row in PeerDAS |
| `DataColumn` | `ByteList[MAX_BLOBS_PER_BLOCK * BYTES_PER_BLOB * 2 // NUMBER_OF_COLUMNS]` | The data of each column in PeerDAS |
| `LineIndex` | `uint64` | The index of the rows or columns in `ExtendedData` matrix |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can continue to use BlobIndex to specify a particular BlobSidecar (or extended blob), then could we use something like DataColumnIndex here? is there some forward compatibility concern?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit of not using BlobIndex is to not bind NUMBER_OF_ROWS == MAX_BLOBS_PER_BLOCK invariant with DataRowByRootAndIndex req. But I can see your point that it's more symmetric to use a DataColumnIndex in DataColumnSidecar. 🤔

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jacobkaufmann on this and there is no NUMBER_OF_ROWS anymore.

hwwhww and others added 2 commits December 8, 2023 00:08
Co-authored-by: Pop Chunhapanya <haxx.pop@gmail.com>
@hwwhww
Copy link
Owner Author

hwwhww commented Dec 7, 2023

@ppopth @jtraglia thank you for your reviews! 🙏 I've applied them to the specs.

Co-authored-by: Pop Chunhapanya <haxx.pop@gmail.com>
specs/_features/peerdas/das-core.md Outdated Show resolved Hide resolved
specs/_features/peerdas/das-core.md Outdated Show resolved Hide resolved
specs/_features/peerdas/das-core.md Outdated Show resolved Hide resolved
specs/_features/peerdas/das-core.md Outdated Show resolved Hide resolved

| Name | Value | Description |
|------------------------------------------|-----------------------------------|---------------------------------------------------------------------|
| `DATA_COLUMN_SIDECAR_SUBNET_COUNT` | `32` | The number of data column sidecar subnets used in the gossipsub protocol. |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the value be defined with respect to a previously defined constant like NUMBER_OF_COLUMNS?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against it; just note that it is similar to the BLOB_SIDECAR_SUBNET_COUNT/ATTESTATION_SUBNET_COUNT definitions, and we didn't set them by MAX_BLOBS_PER_BLOCK/MAX_ATTESTATIONS.

I think at least we can add invariant NUMBER_OF_COLUMNS % DATA_COLUMN_SIDECAR_SUBNET_COUNT == 0.

Comment on lines 73 to 77
def get_custody_lines(node_id: NodeID, epoch: Epoch, custody_size: uint64) -> Sequence[LineIndex]:
assert custody_size <= MAX_BLOBS_PER_BLOCK
all_items = list(range(MAX_BLOBS_PER_BLOCK))
line_index = (node_id + epoch) % MAX_BLOBS_PER_BLOCK
return [LineIndex(all_items[(line_index + i) % len(all_items)]) for i in range(custody_size)]
Copy link

@fradamt fradamt Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the custody assignment be stable over the blob retrievability period? Maybe for some portion of the assigned columns it could be stable and for others depend on the epoch?

Suggested change
def get_custody_lines(node_id: NodeID, epoch: Epoch, custody_size: uint64) -> Sequence[LineIndex]:
assert custody_size <= MAX_BLOBS_PER_BLOCK
all_items = list(range(MAX_BLOBS_PER_BLOCK))
line_index = (node_id + epoch) % MAX_BLOBS_PER_BLOCK
return [LineIndex(all_items[(line_index + i) % len(all_items)]) for i in range(custody_size)]
def get_custody_lines(node_id: NodeID, epoch: Epoch, custody_size: uint64) -> Sequence[LineIndex]:
assert custody_size <= MAX_BLOBS_PER_BLOCK
all_items = list(range(MAX_BLOBS_PER_BLOCK))
line_index = (node_id + epoch // MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS) % MAX_BLOBS_PER_BLOCK
return [LineIndex(all_items[(line_index + i) % len(all_items)]) for i in range(custody_size)]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS as defined in Deneb?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah sorry, I misremembered the name of the parameter.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed epoch as @djrtwo suggested: f40c75e

hwwhww and others added 4 commits December 12, 2023 16:04
Co-authored-by: fradamt <104826920+fradamt@users.noreply.github.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>
Co-authored-by: Jacob Kaufmann <jacobkaufmann18@gmail.com>
hwwhww and others added 2 commits December 20, 2023 01:13
Co-authored-by: fradamt <104826920+fradamt@users.noreply.github.com>
@hwwhww
Copy link
Owner Author

hwwhww commented Jan 24, 2024

replacing by ethereum#3574

@hwwhww hwwhww closed this Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants