eip4844 gossip #4444
Changes from 4 commits
Commits in this PR: 7ea35b6, 769b251, 976f699, e6096de, feb9b5f, 549ef9f, 40e1bda, aa533c6, c03e57a, ec44beb
nimbus_beacon_node.nim

```diff
@@ -30,6 +30,8 @@ import
 when defined(posix):
   import system/ansi_c

+from ./spec/datatypes/eip4844 import SignedBeaconBlock
+
 from
   libp2p/protocols/pubsub/gossipsub
 import
```
```diff
@@ -1067,25 +1069,23 @@ proc updateGossipStatus(node: BeaconNode, slot: Slot) {.async.} =

   let forkDigests = node.forkDigests()

-  discard $eip4844ImplementationMissing & "nimbus_beacon_node.nim:updateGossipStatus check EIP4844 removeMessageHandlers"
   const removeMessageHandlers: array[BeaconStateFork, auto] = [
     removePhase0MessageHandlers,
     removeAltairMessageHandlers,
-    removeAltairMessageHandlers, # with different forkDigest
+    removeAltairMessageHandlers, # bellatrix (altair handlers, with different forkDigest)
     removeCapellaMessageHandlers,
-    removeCapellaMessageHandlers
+    removeCapellaMessageHandlers # eip4844 (capella handlers, different forkDigest)
   ]

   for gossipFork in oldGossipForks:
     removeMessageHandlers[gossipFork](node, forkDigests[gossipFork])

-  discard $eip4844ImplementationMissing & "nimbus_beacon_node.nim:updateGossipStatus check EIP4844 message addMessageHandlers"
   const addMessageHandlers: array[BeaconStateFork, auto] = [
     addPhase0MessageHandlers,
     addAltairMessageHandlers,
-    addAltairMessageHandlers, # with different forkDigest
+    addAltairMessageHandlers, # bellatrix (altair handlers, with different forkDigest)
     addCapellaMessageHandlers,
-    addCapellaMessageHandlers
+    addCapellaMessageHandlers # eip4844 (capella handlers, different forkDigest)
   ]

   for gossipFork in newGossipForks:
```
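As a side note on the Nim idiom in the hunk above: the handler tables are `array[BeaconStateFork, auto]` constants, so the per-fork add/remove procs are selected by indexing with the fork enum rather than with a `case` statement. A minimal, self-contained sketch of that pattern follows; all names are illustrative, not the nimbus-eth2 code.

```nim
type
  BeaconStateFork = enum
    forkPhase0, forkAltair, forkBellatrix, forkCapella, forkEip4844

proc addPhase0Handlers(digest: string) = echo "phase0 handlers for ", digest
proc addAltairHandlers(digest: string) = echo "altair handlers for ", digest
proc addCapellaHandlers(digest: string) = echo "capella handlers for ", digest

# Enum-indexed array: one entry per fork; some forks reuse an earlier fork's
# handlers but are subscribed under their own fork digest.
const addMessageHandlers: array[BeaconStateFork, auto] = [
  addPhase0Handlers,
  addAltairHandlers,
  addAltairHandlers,   # bellatrix reuses altair handlers
  addCapellaHandlers,
  addCapellaHandlers   # eip4844 reuses capella handlers
]

for fork in BeaconStateFork.low .. BeaconStateFork.high:
  addMessageHandlers[fork]($fork)
```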
```diff
@@ -1373,14 +1373,28 @@ proc installMessageValidators(node: BeaconNode) =
   # subnets are subscribed to during any given epoch.
   let forkDigests = node.dag.forkDigests

+  template installBeaconBlocksValidator(digest: auto, phase: auto) =
+    node.network.addValidator(
+      getBeaconBlocksTopic(digest),
+      proc (signedBlock: phase.SignedBeaconBlock): ValidationResult =
+        if node.shouldSyncOptimistically(node.currentSlot):
+          toValidationResult(
+            node.optimisticProcessor.processSignedBeaconBlock(signedBlock))
+        else:
+          toValidationResult(node.processor[].processSignedBeaconBlock(
+            MsgSource.gossip, signedBlock)))
+
+  installBeaconBlocksValidator(forkDigests.phase0, phase0)
+  installBeaconBlocksValidator(forkDigests.altair, altair)
+  installBeaconBlocksValidator(forkDigests.bellatrix, bellatrix)
+  installBeaconBlocksValidator(forkDigests.capella, capella)
+
-  node.network.addValidator(
-    getBeaconBlocksTopic(forkDigests.phase0),
-    proc (signedBlock: phase0.SignedBeaconBlock): ValidationResult =
-      if node.shouldSyncOptimistically(node.currentSlot):
-        toValidationResult(
-          node.optimisticProcessor.processSignedBeaconBlock(signedBlock))
-      else:
-        toValidationResult(node.processor[].processSignedBeaconBlock(
-          MsgSource.gossip, signedBlock)))
+  node.network.addValidator(
+    getBeaconBlockAndBlobsSidecarTopic(forkDigests.eip4844),
+    proc (signedBlock: eip4844.SignedBeaconBlockAndBlobsSidecar): ValidationResult =
+      # TODO: take into account node.shouldSyncOptimistically(node.currentSlot)
+      toValidationResult(node.processor[].processSignedBeaconBlockAndBlobsSidecar(
+        MsgSource.gossip, signedBlock)))

   template installPhase0Validators(digest: auto) =
```

Review thread on the `installBeaconBlocksValidator` template:

Reviewer: If this is meant to pass a module (`phase0`, `altair`, ...) as a template argument: the current codebase doesn't do this, I'm not sure if it's documented, and Nim does some strange things with it, though:

```nim
let x = 3
template f(y: auto) = echo y.x # Builds, and
f(system) # outputs "3"; same with f(y: untyped)
# template g(y: untyped) = echo y.3 # Error: identifier expected, but got '3'
# echo system.3 # Error: identifier expected, but got '3'
# echo system.x # Error: undeclared identifier: 'x'
```

Also, Nim 1.6 still doesn't properly typecheck these situations; nim-lang/Nim#1027 was only fixed a couple of months ago in Nim.

Reviewer: One approach used is to pass the entire type (…).

Reviewer: I'm partly concerned because the local testnets in CI keep failing (e.g., https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Flinux%2Fx86_64/detail/PR-4444/7/pipeline, https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Fmacos%2Faarch64/detail/PR-4444/7/pipeline, https://ci.status.im/job/nimbus-eth2/job/platforms/job/macos/job/x86_64/job/PR-4444/7/display/redirect). Block-receive gossip stops working entirely with this PR, regardless of fork or slot. So blocks either get sent by a node, which pushes them through the block processing pipeline locally, or get sent and apparently not really processed properly. This works for a while, because nodes manage to request backfill blocks in time via req/resp, e.g.:

[log excerpt omitted]

Where node 3 initially sent that:

[log excerpt omitted]

This is the entirety of "Block received" via gossip, across all nodes and all slots:

[log excerpt omitted]

There should be 3 "Block received" logs for every slot (from the 3 of the 4 nodes that did not send the block). The trouble with relying on req/resp, aside from the fact that block gossip shouldn't be failing to begin with, is that since the attestations get dropped, these blocks don't necessarily get enough votes to keep the head moving. By the time req/resp gets processed, it's too late for the nodes to count votes in the same way. So the combination of those Nim issues/bugs and this particular failure mode makes me skeptical that this refactoring is working as intended.

… where that variation runs the minimal preset version in a loop until it fails for some reason.

Author: Thanks for digging into this and for the explanations. I've removed the template. I figured the refactoring was straightforward enough; lesson learned. In any case, it belonged at the least in a separate commit and probably in a different PR.

As a sidenote, this also changed the order in which validators were installed. I did that to group related validators for clarity, assuming the change had to be innocuous, but given the specific order in which the validators were being installed, I am now wondering if there might have been a reason for that order. Anyway, order restored.

Reviewer: Sure. I agree, in principle, with the refactoring, but also that it probably shouldn't be part of this PR. The order was not intended to be important; they'll get called by …
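The "pass the entire type" suggestion above can be illustrated with a small, self-contained sketch: the template takes a concrete block type as a `typedesc` parameter instead of a module, so the proc parameter's type is an ordinary type symbol rather than something resolved through a module argument. All names and types here are illustrative, not nimbus-eth2 code.

```nim
type
  Phase0SignedBlock = object
    slot: uint64
  AltairSignedBlock = object
    slot: uint64
    syncBits: uint64

# The proc parameter below is a concrete type symbol, so it is type-checked
# normally when the template is instantiated.
template installBlockValidator(topic: string, BlockType: typedesc) =
  let validator = proc (blck: BlockType): bool =
    blck.slot > 0'u64
  echo topic, ": ", validator(BlockType(slot: 1))

installBlockValidator("phase0_blocks", Phase0SignedBlock)
installBlockValidator("altair_blocks", AltairSignedBlock)
```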
```diff
@@ -1433,38 +1447,6 @@ proc installMessageValidators(node: BeaconNode) =
   installPhase0Validators(forkDigests.capella)
   installPhase0Validators(forkDigests.eip4844)

-  node.network.addValidator(
-    getBeaconBlocksTopic(forkDigests.altair),
-    proc (signedBlock: altair.SignedBeaconBlock): ValidationResult =
-      if node.shouldSyncOptimistically(node.currentSlot):
-        toValidationResult(
-          node.optimisticProcessor.processSignedBeaconBlock(signedBlock))
-      else:
-        toValidationResult(node.processor[].processSignedBeaconBlock(
-          MsgSource.gossip, signedBlock)))
-
-  node.network.addValidator(
-    getBeaconBlocksTopic(forkDigests.bellatrix),
-    proc (signedBlock: bellatrix.SignedBeaconBlock): ValidationResult =
-      if node.shouldSyncOptimistically(node.currentSlot):
-        toValidationResult(
-          node.optimisticProcessor.processSignedBeaconBlock(signedBlock))
-      else:
-        toValidationResult(node.processor[].processSignedBeaconBlock(
-          MsgSource.gossip, signedBlock)))
-
-  node.network.addValidator(
-    getBeaconBlocksTopic(forkDigests.capella),
-    proc (signedBlock: capella.SignedBeaconBlock): ValidationResult =
-      if node.shouldSyncOptimistically(node.currentSlot):
-        toValidationResult(
-          node.optimisticProcessor.processSignedBeaconBlock(signedBlock))
-      else:
-        toValidationResult(node.processor[].processSignedBeaconBlock(
-          MsgSource.gossip, signedBlock)))
-
-  discard $eip4844ImplementationMissing & ": add validation here, but per https://github.com/ethereum/consensus-specs/blob/v1.3.0-alpha.1/specs/eip4844/p2p-interface.md#beacon_block it's not beacon_block but beacon_block_and_blobs_sidecar"
-
   template installSyncCommitteeeValidators(digest: auto) =
     for subcommitteeIdx in SyncSubcommitteeIndex:
       closureScope:
```
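The discard placeholder removed above pointed out that, per the linked consensus-specs p2p-interface document, eip4844 block gossip does not use the `beacon_block` topic but `beacon_block_and_blobs_sidecar`. As a rough sketch of what that means for topic names, gossip topics follow the `/eth2/<fork_digest>/<name>/<encoding>` pattern, so the two topics differ in both the name segment and the fork digest. The helper and the digest values below are hypothetical, not the nimbus-eth2 topic functions.

```nim
# Hypothetical helper mirroring the /eth2/ForkDigestValue/Name/Encoding scheme.
proc gossipTopic(forkDigestHex, name: string): string =
  "/eth2/" & forkDigestHex & "/" & name & "/ssz_snappy"

# Example fork digests are placeholders.
echo gossipTopic("4a26c58b", "beacon_block")                   # pre-eip4844 forks
echo gossipTopic("87f2b2f5", "beacon_block_and_blobs_sidecar") # eip4844
```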
Further review discussion:

Reviewer: Code-wise, it makes more sense to use `isErr` here and return early, so as to keep the same flow for all error returns and have the success condition at the end. Also, I wonder if we should structure the code more towards blocks and blobs being decoupled from each other: not in this PR perhaps, but we probably want to run blob/sidecar validation in parallel (in a decoupled design we'd need a quarantine / holding area for unmatched blob/block pairs, etc., but it would be interesting to explore, code-wise, what the impact is).
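A minimal sketch of the early-return style being suggested, with illustrative types rather than the actual gossip validation code: every failed check returns immediately, so the single success result is the last expression.

```nim
type
  ValidationResult = enum
    Accept, Ignore, Reject

proc validateBlockAndBlobsSidecar(blockOk, sidecarOk, proposerKnown: bool): ValidationResult =
  if not blockOk:
    return Reject      # error paths all return early...
  if not sidecarOk:
    return Reject      # ...e.g. the blobs sidecar fails its checks
  if not proposerKnown:
    return Ignore      # ...or the message cannot be verified yet
  Accept               # success condition last, with no nesting

echo validateBlockAndBlobsSidecar(true, true, true)  # Accept
```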
Author: Fully agreed; done in c03e57a.

Is the motivation to limit validation latency? I guess I don't have much intuition at this point for how much extra sequential processing time blob validation will take, but if the additional latency is notable then yes, it makes sense. Or maybe this is future-proofing for a world where gossip ends up being reworked to have separate block and blob topics? I agree with that benefit (of course it depends on if/when that gossip change would be made).
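On the decoupled-topics idea in this thread, a hedged sketch of the "quarantine / holding area for unmatched blob/block pairs": whichever half arrives first is parked by block root until its counterpart shows up. All names and types below are illustrative, not nimbus-eth2 code.

```nim
import std/tables

type
  BlockRoot = string                    # stand-in for a real digest type
  SignedBlock = object
    root: BlockRoot
  BlobsSidecar = object
    blockRoot: BlockRoot

  BlobQuarantine = object
    blocksAwaitingBlobs: Table[BlockRoot, SignedBlock]
    blobsAwaitingBlocks: Table[BlockRoot, BlobsSidecar]

proc addBlock(q: var BlobQuarantine, blck: SignedBlock): bool =
  ## Returns true when the matching sidecar was already quarantined,
  ## i.e. the pair is complete and can be validated together.
  if q.blobsAwaitingBlocks.hasKey(blck.root):
    q.blobsAwaitingBlocks.del(blck.root)
    true
  else:
    q.blocksAwaitingBlobs[blck.root] = blck
    false

proc addBlobs(q: var BlobQuarantine, blobs: BlobsSidecar): bool =
  ## Mirror image of addBlock, for the sidecar arriving first.
  if q.blocksAwaitingBlobs.hasKey(blobs.blockRoot):
    q.blocksAwaitingBlobs.del(blobs.blockRoot)
    true
  else:
    q.blobsAwaitingBlocks[blobs.blockRoot] = blobs
    false

var q: BlobQuarantine
discard q.addBlobs(BlobsSidecar(blockRoot: "0xabc"))  # blobs arrive first: parked
echo q.addBlock(SignedBlock(root: "0xabc"))           # block arrives: pair complete -> true
```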