# [WIP] EIP-7594: PeerDAS protocol #3574
## Conversation
*Note:* In the `verify_data_column_sidecar_inclusion_proof(sidecar)` check, for all the sidecars of the same block, it verifies against the same set of `kzg_commitments` of the given beacon block. Clients can choose to cache the result of the arguments tuple `(sidecar.kzg_commitments, sidecar.kzg_commitments_inclusion_proof, sidecar.signed_block_header)`.
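
A minimal sketch of that caching strategy, assuming a client-side wrapper around the spec function (the cache and helper names here are illustrative, not part of the spec):

```python
# Illustrative client-side cache, keyed on the tuple named in the note above, so that
# all sidecars of the same block only pay for one inclusion-proof verification.
inclusion_proof_cache: dict = {}

def verify_inclusion_proof_cached(sidecar: DataColumnSidecar) -> bool:  # hypothetical helper
    key = (
        hash_tree_root(sidecar.kzg_commitments),
        hash_tree_root(sidecar.kzg_commitments_inclusion_proof),
        hash_tree_root(sidecar.signed_block_header),
    )
    if key not in inclusion_proof_cache:
        inclusion_proof_cache[key] = verify_data_column_sidecar_inclusion_proof(sidecar)
    return inclusion_proof_cache[key]
```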
### The Req/Resp domain

---

Should add the deprecation of `/eth2/beacon_chain/req/blob_sidecars_by_root/1/` and `/eth2/beacon_chain/req/blob_sidecars_by_range/1/`.

---

Maybe by-range can be deprecated; by-root might still be useful to pull a complete blob? But anyway, it can be pulled by root and index, so I think we are good to deprecate 👍

---

Is there a replacement for by-range for syncing? I'm not familiar with sync, but I'm guessing we would be doing sampling for every slot as well when syncing. Does that mean we request `SAMPLES_PER_SLOT` (8) samples via `DataColumnSidecarByRoot` for every block?
Sampling during sync:

- We don't need to sample finalized blocks since we know data was available based on the votes.
- However, for latest head blocks without enough votes, we might have to sample before attesting. (Thanks @AgeManning)

Retrieving historic column data:

I guess we'll still need to retrieve all the `custody_column` data for the last 4096 epochs (so we can serve them to peers), so it would be useful to have an interface for batched requests, e.g. requesting `List[Tuple[Slot, ColumnIndex]]` or `List[Tuple[Slot, SubnetId]]`?

---

> We don't need to sample finalized blocks since we know data was available based on the votes

When syncing you don't know what's finalized ahead of time. Requiring a majority assumption is not okay IMO; why not sample if the network can serve you the samples?

---

Yeah, that makes sense to me I think. I guess it would be the same as Deneb today: we perform the data availability check (sampling now instead) for all slots within the last `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` epochs.
## A note on fork choice
*Fork choice spec TBD, but it will just be a replacement of `is_data_available()` call in Deneb with column sampling instead of full download. Note the `is_data_available(slot_N)` will likely do a `-1` follow distance so that you just need to check the availability of slot `N-1` for slot `N` (starting with the block proposer of `N`).*
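
Since the fork choice changes are explicitly TBD, the following is only a rough sketch of what the replacement check might look like; `retrieve_column_sidecars` and the proof-verification helper are assumed names, and the `-1` follow distance is omitted for brevity:

```python
# Rough sketch only; the fork choice spec is TBD. `retrieve_column_sidecars` is a
# hypothetical networking call, and peer sampling details are elided.
import random

def is_data_available(beacon_block_root: Root) -> bool:
    # Instead of downloading all blobs (as in Deneb), sample SAMPLES_PER_SLOT distinct columns.
    sample_ids = random.sample(range(NUMBER_OF_COLUMNS), SAMPLES_PER_SLOT)
    sidecars = retrieve_column_sidecars(beacon_block_root, sample_ids)  # hypothetical
    return all(
        verify_data_column_sidecar_kzg_proofs(sidecar)  # assumed verification helper
        for sidecar in sidecars
    )
```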

---

Is there more information on how the trailing-slot fork choice works during network partitions or longer forks? During any event where your peers might temporarily have split views (hard fork, consensus bug), the peers that you are connected to for the particular column subnets will have neither the block nor the data. So when sampling for `n-1` from these peers, they will return nothing and you would mark the data as unavailable.

How does sampling work when the network is undergoing stress and possible churn (constant peer disconnections and new connections to find peers with the desired column subnets)? Is there a mode we can fall back to so that a block and its data which were correctly broadcast and propagated are marked as available rather than unavailable in such a situation?
#### Messages
##### DataColumnSidecarsByRoot v1

---

We would also need a `DataColumnSidecarsByRange` for syncing, right? Nodes would request the columns they are assigned to custody:

```
(
  start_slot: Slot
  count: uint64
  columns: List[ColumnIndex]
)
```
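
For concreteness, a hedged sketch of that request as an SSZ container in pyspec style (the container name and the list bound are assumptions for illustration, not part of this PR):

```python
# Hypothetical shape of a DataColumnSidecarsByRange request, mirroring the tuple above.
class DataColumnSidecarsByRangeRequest(Container):
    start_slot: Slot
    count: uint64
    columns: List[ColumnIndex, NUMBER_OF_COLUMNS]
```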

---

Yes -- especially if this retains the "Clients MUST support requesting sidecars since `minimum_request_epoch`" statement and thus cannot be used for syncing data prior to the finalized epoch in the normal case. A syncing node will also need to do DAS for slots since their checkpoint-sync point.

I wonder if this should support `columns` or `column` (many or just one). It's not clear to me what the load-balancing strategy would be, given the multiple things this needs to support (historic sampling vs historic custody gathering) and given how peers have various custodies.

| Name | Value | Description |
| - | - | - |
| `MAX_REQUEST_DATA_COLUMN_SIDECARS` | `NUMBER_OF_COLUMNS` | Maximum number of data column sidecars in a single request |
| `MIN_EPOCHS_FOR_DATA_COLUMN_SIDECARS_REQUESTS` | `2**12` (= 4096 epochs, ~18 days) | The minimum epoch range over which a node must serve data column sidecars |

---

For Deneb, we have `MAX_REQUEST_BLOB_SIDECARS` (768), calculated from `MAX_REQUEST_BLOCKS_DENEB * MAX_BLOBS_PER_BLOCK`. Should `MAX_REQUEST_DATA_COLUMN_SIDECARS` be derived in a similar way, i.e. a maximum of 128 blocks of sidecars in a single request: `MAX_REQUEST_BLOCKS_DENEB * NUMBER_OF_COLUMNS` (16384)?

These should probably be added to the config files `configs/mainnet.yaml` and `configs/minimal.yaml` too.
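
For reference, the arithmetic behind the suggested derivation (using the existing Deneb values):

```python
MAX_REQUEST_BLOCKS_DENEB = 128
MAX_BLOBS_PER_BLOCK = 6
NUMBER_OF_COLUMNS = 128

# Deneb precedent
assert MAX_REQUEST_BLOCKS_DENEB * MAX_BLOBS_PER_BLOCK == 768    # MAX_REQUEST_BLOB_SIDECARS
# Analogous derivation proposed above
assert MAX_REQUEST_BLOCKS_DENEB * NUMBER_OF_COLUMNS == 16384    # MAX_REQUEST_DATA_COLUMN_SIDECARS
```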
Each node downloads and custodies a minimum of `CUSTODY_REQUIREMENT` subnets per slot. The particular subnets that the node is required to custody are selected pseudo-randomly (more on this below).
A node *may* choose to custody and serve more than the minimum honesty requirement. Such a node explicitly advertises a number greater than `CUSTODY_REQUIREMENT` via the peer discovery mechanism -- for example, in their ENR (e.g. `custody_subnet_count: 4` if the node custodies `4` subnets each slot) -- up to a `DATA_COLUMN_SIDECAR_SUBNET_COUNT` (i.e. a super-full node).

---

Implementation question - my apologies if it's off-topic for this PR - is there any value for a super-full node to custody the extended columns / rows (besides being able to serve data efficiently)? as it would always be able to reconstruct the full matrix without asking from its peers.
If it stores just the raw blobs, it would only need to store 50% of the samples for 1D extended blobs, or just 25% of the samples of the 2D extended blobs:

| Max Blobs \ Data Stored | Storing raw blob only | 1D extended | 2D extended |
| --- | --- | --- | --- |
| 6 | 96 GB | 192 GB | 384 GB |
| 32 | 512 GB | 1 TB | 2 TB |
| 64 | 1 TB | 2 TB | 4 TB |
| 128 | 2 TB | 4 TB | 8 TB |
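
The figures in the table can be reproduced with a quick back-of-the-envelope calculation, assuming 128 KiB per blob, 32 slots per epoch, full blocks every slot, and the 4096-epoch retention window:

```python
# Back-of-the-envelope reproduction of the table above (assumptions in the lead-in).
BYTES_PER_BLOB = 128 * 1024
SLOTS_RETAINED = 4096 * 32  # retention window in slots

for max_blobs in (6, 32, 64, 128):
    raw = max_blobs * BYTES_PER_BLOB * SLOTS_RETAINED
    gib = raw / 2**30
    print(f"{max_blobs} blobs: {gib:.0f} GiB raw, {2 * gib:.0f} GiB 1D, {4 * gib:.0f} GiB 2D")
# 6 blobs -> 96 GiB raw, 192 GiB 1D-extended, 384 GiB 2D-extended, matching the first row.
```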

---

> Implementation question - my apologies if it's off-topic for this PR - is there any value for a super-full node to custody the extended columns / rows (besides being able to serve data efficiently)? as it would always be able to reconstruct the full matrix without asking from its peers.
Reconstructing a column when you have only the row encoding is quite expensive. Say you need column `j`. Then for every segment `(i, j)` in row `i`, you would need to read half of row `i` to do the reconstruction. This basically means reading in the whole data amount to reconstruct a single column.
So while you are always able to reconstruct, it can be costly, reading in and doing erasure coding on lots of data. I think it should be an implementation decision, but for the 1D extension I would keep the whole extended data for recent blocks, and only maybe discard half for old ones.
For the 2D extension, I would most probably store half of it, but in a more intelligent way than keeping the K columns or K rows. Basically what you want to do is to keep the upper left corner (original data in the systematic encoding), and the lower right corner ("double-parity" data) of the encoding. This allows you to efficiently restore any row, column or individual segment, while keeping only half of the data. Again, for old data that most probably won't be queried, you might want to fall back to keeping only the original (or some other K half-rows), while paying the price of a full restore when queried.
We might even go further, and instead of keeping the upper left corner and lower right corner, select a set `A` of K row IDs and a set `B` of K column IDs. Then you store chunk `(i, j)` iff `(i in A) == (j in B)`, i.e. both in or both out. This allows the same fast reconstruction, but randomises who stores what. This can be done with local randomness to improve resilience, or with peer-ID based randomness to reduce the load of queries from well-behaved nodes, who can query nodes where they know the node can serve cheaply without reconstruction.
Of course, a super-full node would have to balance the above with its own needs to access the original data. If you actually want to read the original data frequently for your own use, it would be stupid not to store the original part (upper left corner) of the systematic encoding.
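
A tiny sketch of the storage rule just described, to make the counting argument concrete (`K`, `A`, and `B` are as in the comment; everything else is illustrative):

```python
# Store chunk (i, j) iff (i in A) == (j in B): exactly half the chunks are kept, and
# every row and every column retains K of its 2K chunks, enough for a rate-1/2 code.
import random

K = 4
A = set(random.sample(range(2 * K), K))  # K of the 2K row IDs
B = set(random.sample(range(2 * K), K))  # K of the 2K column IDs

def stores_chunk(i: int, j: int) -> bool:
    return (i in A) == (j in B)

for i in range(2 * K):
    assert sum(stores_chunk(i, j) for j in range(2 * K)) == K  # K stored chunks per row
```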

---

> Implementation question - my apologies if it's off-topic for this PR - is there any value for a super-full node to custody the extended columns / rows (besides being able to serve data efficiently)? as it would always be able to reconstruct the full matrix without asking from its peers.
>
> If it stores just the raw blobs, it would only need to store 50% of the samples for 1D extended blobs, or just 25% of the samples of the 2D extended blobs:
>
> | Max Blobs \ Data Stored | Storing raw blob only | 1D extended | 2D extended |
> | --- | --- | --- | --- |
> | 6 | 96 GB | 192 GB | 384 GB |
> | 32 | 512 GB | 1 TB | 2 TB |
> | 64 | 1 TB | 2 TB | 4 TB |
> | 128 | 2 TB | 4 TB | 8 TB |
All these are off by a factor of 2, assuming that target blobs are 1/2 of max blobs. (The EIP-1559 style pricing mechanism makes it impossible to significantly exceed the target for such a long period by increasing the price exponentially)

| Name | Value | Description |
| - | - | - |
| `SAMPLES_PER_SLOT` | `8` | Number of `DataColumn` random samples a node queries per slot |
| `CUSTODY_REQUIREMENT` | `1` | Minimum number of subnets an honest node custodies and serves samples from |

---

As discussed with @dankrad, I think this should be higher than 1, because it is the most important parameter in determining how strong the pre-sampling guarantees are, i.e., how hard it is for an unavailable block to get a majority of the votes and become canonical, even when employing the trailing fork-choice. Ideally, we would want to ensure that only a small fraction of the validators can be tricked into voting for an unavailable block.

Imo, a suitable value would be >= 6. We might then also want to increase the number of subnets to 64, to keep the bandwidth consumption low. For reference, `(CUSTODY_REQUIREMENT = 6, DATA_COLUMN_SIDECAR_SUBNET_COUNT = 64)` and everything else unchanged means that the total bandwidth consumption with `MAX_BLOBS_PER_BLOCK = 16` is 60% of 4844. With 32 blobs, it's 120% of 4844. It can also be lowered by increasing the subnet count further if needed.

---

Note that this is currently subnet custody, which covers many columns depending on the parametrization (4 currently, if the parametrization is 128 columns across 32 subnets (128/32)).

I find the way we are doing this parametrization a bit confusing, or at least difficult to parse (as seen by the above misunderstanding). The name should at least say explicitly if it's `COLUMN_` or `SUBNET_CUSTODY_REQUIREMENT`.

---

I agree that the name should be `SUBNET_CUSTODY_REQUIREMENT`. Still, custodying a single subnet is not the same as custodying 4 random columns, and in fact gives you almost no pre-sampling guarantees. For example, someone can make 49% of the subnets available (not making a subnet available here means not publishing any of the columns for that subnet), and 49% of validators will still succeed in downloading all columns they need. If on the other hand the custody requirement had been 4 random columns, making 49% of columns available would only trick 1/16 of the validators. This is also the case if validators custody 4 subnets.

Another way to think about it is: whatever unit we choose for custody ends up being the relevant sampling unit for the purpose of pre-peer-sampling security. If we choose subnet custody, we are doing subnet sampling during distribution and column sampling later. If we want subnet sampling to be meaningfully secure by itself, we need the subnet custody requirement to be high enough.
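
The 1/16 figure follows directly from independence: with 4 uniformly random columns and 49% of columns available, the chance that all 4 custody columns land on available columns is about 0.49 to the 4th power:

```python
# Fraction of validators an attacker can trick, per the example above.
p = 0.49

print(p)       # custody = 1 subnet: ~49% of validators still get all their columns
print(p ** 4)  # custody = 4 random columns: ~0.0576, roughly 1/16
```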

```diff
@@ -156,5 +159,6 @@ WHISK_EPOCHS_PER_SHUFFLING_PHASE: 256
 WHISK_PROPOSER_SELECTION_GAP: 2

+# EIP7594
+EIP7594_FORK_VERSION: 0x06000001
+EIP7594_FORK_EPOCH: 18446744073709551615
+NUMBER_OF_COLUMNS: 128
```

---

As this is derived from other preset values, would it make more sense to keep it in the preset? (i.e. modifying this `NUMBER_OF_COLUMNS` in the config would require a corresponding change to the `FIELD_ELEMENTS_PER_BLOB`/`FIELD_ELEMENTS_PER_CELL` values?)

`# uint64((FIELD_ELEMENTS_PER_BLOB * 2) // FIELD_ELEMENTS_PER_CELL)` (= 128)

---

I don't know if it should be a config value, but I think it can be. I think it's okay for the config values to be derived from the preset values, but not the other way around. Preset values shouldn't be derived from config values.
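
For what it's worth, plugging in the mainnet preset value `FIELD_ELEMENTS_PER_BLOB = 4096` (and `FIELD_ELEMENTS_PER_CELL = 64`, as used in this PR) confirms the derivation:

```python
FIELD_ELEMENTS_PER_BLOB = 4096  # mainnet preset
FIELD_ELEMENTS_PER_CELL = 64    # cell size assumed by this PR

assert (FIELD_ELEMENTS_PER_BLOB * 2) // FIELD_ELEMENTS_PER_CELL == 128  # NUMBER_OF_COLUMNS
```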

```python
@single_phase
def test_invariants(spec):
    assert spec.FIELD_ELEMENTS_PER_BLOB % spec.FIELD_ELEMENTS_PER_CELL == 0
    assert spec.FIELD_ELEMENTS_PER_BLOB * 2 % spec.config.NUMBER_OF_COLUMNS == 0
```

---

```diff
-assert spec.FIELD_ELEMENTS_PER_BLOB * 2 % spec.config.NUMBER_OF_COLUMNS == 0
+assert spec.FIELD_ELEMENTS_PER_EXT_BLOB % spec.config.NUMBER_OF_COLUMNS == 0
```
Low priority. Feel free to merge without.

| Name | Value | Description |
| - | - | - |
| `NUMBER_OF_COLUMNS` | `uint64((FIELD_ELEMENTS_PER_BLOB * 2) // FIELD_ELEMENTS_PER_CELL)` (= 128) | Number of columns in the extended data matrix. |

---

```diff
-| `NUMBER_OF_COLUMNS` | `uint64((FIELD_ELEMENTS_PER_BLOB * 2) // FIELD_ELEMENTS_PER_CELL)` (= 128) | Number of columns in the extended data matrix. |
+| `NUMBER_OF_COLUMNS` | `uint64(FIELD_ELEMENTS_PER_EXT_BLOB // FIELD_ELEMENTS_PER_CELL)` (= 128) | Number of columns in the extended data matrix. |
```
Low priority, feel free to merge without this.

| Name | SSZ equivalent | Description |
| - | - | - |
| `DataColumn` | `List[Cell, MAX_BLOB_COMMITMENTS_PER_BLOCK]` | The data of each column in EIP-7594 |
| `ExtendedMatrix` | `List[Cell, MAX_BLOBS_PER_BLOCK * NUMBER_OF_COLUMNS]` | The full data of one-dimensional erasure coding extended blobs (in row major format) |

---

Should `ExtendedMatrix` use `MAX_BLOB_COMMITMENTS_PER_BLOCK` too?

---

A bunch of minor nits/questions.
Getting there :)

| Name | Value | Description |
| - | - | - |
| `SAMPLES_PER_SLOT` | `8` | Number of `DataColumn` random samples a node queries per slot |
| `CUSTODY_REQUIREMENT` | `1` | Minimum number of subnets an honest node custodies and serves samples from |
| `TARGET_NUMBER_OF_PEERS` | `70` | Suggested minimum peer count |

---

```diff
-| `TARGET_NUMBER_OF_PEERS` | `70` | Suggested minimum peer count |
+| `TARGET_NUMBER_OF_PEERS` | `70` | Suggested minimum peer count to support peer sampling |
```
#### `get_custody_columns`

```python
def get_custody_columns(node_id: NodeID, custody_subnet_count: uint64) -> Sequence[ColumnIndex]:
    ...
```

---

I'm in support of moving in this direction. Open a PR to this one. We'll likely get this merged in the meantime and can review the mapping proposal in isolation

```python
i = 0
while len(subnet_ids) < custody_subnet_count:
    subnet_id = (
        bytes_to_uint64(hash(uint_to_bytes(uint64(node_id + i)))[0:8])
        % DATA_COLUMN_SIDECAR_SUBNET_COUNT  # continuation assumed; the diff excerpt cuts off here
    )
```

---

Just sanity checking that this 256-bit to 64-bit casting is well-defined in the spec (or at least done somewhere else). My initial check shows that we are just using the ssz-typing lib here, which will be opinionated, but not in a way that is actually defined in the SSZ spec (no mention of casting, as it's a serialization spec).

---

Why cast this down to 64-bit? Why not just pass in the 256-bit value to `hash`?

---

Actually, we're doing it under the hood in places with `bytes_to_uint64`, so I guess it's well defined.

---

> Why cast this down to 64-bit? Why not just pass in the 256-bit value to `hash`?

I think it's because you need to do the `+ i` thing. You need to cast it to 64-bit first to do the math. I guess doing the math under `uint256` is not supported by all the programming languages.

However, I think it should be `uint64(node_id) + i` rather than `uint64(node_id + i)`.

```python
    if subnet_id not in subnet_ids:
        subnet_ids.append(subnet_id)
    i += 1

assert len(subnet_ids) == len(set(subnet_ids))
```

---

This seems like it shouldn't be here. It's an assertion we should have in our tests, but it's not a possible condition here, given the `if subnet_id not in subnet_ids` condition above for set append, and we generally don't have non-triggerable asserts in our functions.

---

Agree with this. This assertion can never fail; the intention is only to catch a bug.
### Configuration
*[New in Deneb:EIP4844]*

---

```diff
-*[New in Deneb:EIP4844]*
+*[New in EIP7594]*
```

```python
    Verify if the proofs are correct
    """
    assert sidecar.index < NUMBER_OF_COLUMNS
    assert len(sidecar.column) == len(sidecar.kzg_commitments) == len(sidecar.kzg_proofs)
```

---

I don't think we should mix `assert` and a `bool` return value. All failures in relation to the input data in this function should likely affect the bool return value, rather than some throwing and some returning.

---

Or, if these values should be checked in a different manner, potentially have them in an elevated pre-processing function or in the p2p gossip conditions or something. I understand the desire for the bool to just be on the proof verification and not on the well-structuredness of the data, but still, mixing asserts and bool returns makes this a difficult function to test/parse.
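
One possible shape for that split, sketched with illustrative names (only the batch proof check keeps the bool return; the structural checks move out, e.g. into gossip conditions):

```python
# Sketch of the suggested separation; helper names are illustrative, not spec.
def is_well_formed_column_sidecar(sidecar: DataColumnSidecar) -> bool:
    # Structural checks that were previously `assert`s inside the verification function.
    if sidecar.index >= NUMBER_OF_COLUMNS:
        return False
    return len(sidecar.column) == len(sidecar.kzg_commitments) == len(sidecar.kzg_proofs)

def verify_data_column_sidecar_kzg_proofs(sidecar: DataColumnSidecar) -> bool:
    # The bool return now reflects proof validity only; structure is vetted beforehand.
    if not is_well_formed_column_sidecar(sidecar):
        return False
    return verify_cell_proof_batch(  # assumed batch verification helper
        sidecar.kzg_commitments, sidecar.column, sidecar.kzg_proofs
    )
```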
##### `compute_subnet_for_data_column_sidecar`

```python
def compute_subnet_for_data_column_sidecar(column_index: ColumnIndex) -> SubnetID:
    ...
```

---

I agree with @ppopth that the name here is misleading. I would expect it to take in a sidecar, not a `column_index` value. I might argue for a retro-patch to the naming convention in Deneb.
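
For comparison, a sidecar-taking variant would be a one-liner, assuming the subnet mapping is the column index modulo the subnet count (as discussed elsewhere in this PR):

```python
# Hypothetical variant matching the expected signature: derive the subnet from the
# sidecar itself rather than from a bare column index.
def compute_subnet_for_data_column_sidecar(sidecar: DataColumnSidecar) -> SubnetID:
    return SubnetID(sidecar.index % DATA_COLUMN_SIDECAR_SUBNET_COUNT)
```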
##### `custody_subnet_count`
A new field is added to the ENR under the key `custody_subnet_count` to facilitate custody data column discovery. |

---

Is this a MUST field, or is the value assumed to be `MINIMUM` if it is not found there? And what if someone puts in the field with a value less than `MINIMUM`, e.g. 0?

---

I would say it should be a MUST field. If the field is absent, the node should assume that the ENR is invalid according to the current fork.

---

Not sure it makes a huge difference. However, I'd prefer it to be optional and to specify the extra subnet count. The way of finding voluntary nodes would be different from finding regular (obligatory) nodes, so putting anything there when a node doesn't commit to serving extra subnets looks a bit pointless to me.

Anyway, it needs to be specified explicitly here.
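
A sketch of the "optional field, assume the minimum when absent" reading discussed above (the ENR accessor and the defaulting rule are illustrative, not spec):

```python
# One possible client-side interpretation of an optional `custody_subnet_count` field:
# absent or below-minimum values fall back to CUSTODY_REQUIREMENT, and advertised
# values are capped at the subnet count. `enr.get` is an assumed accessor.
def get_custody_subnet_count(enr) -> uint64:
    advertised = enr.get('custody_subnet_count')  # None if the field is absent
    if advertised is None or advertised < CUSTODY_REQUIREMENT:
        return CUSTODY_REQUIREMENT
    return min(advertised, DATA_COLUMN_SIDECAR_SUBNET_COUNT)
```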
```python
def bls_field_to_bytes(x: BLSFieldElement) -> Bytes32:
    return int.to_bytes(x % BLS_MODULUS, 32, KZG_ENDIANNESS)
```

---

Doesn't the `BLSFieldElement` type imply that `x < BLS_MODULUS`? What is the meaning of the `% BLS_MODULUS` operation then?

---

It should, but the way the typing is set up, a `BLSFieldElement` is any possible `uint256`, and so the reduction is necessary.
Previous work and discussion history: hwwhww#1
Now rebased on PR #3557 branch.