eip4844 gossip #4444
Codewise, it makes more sense to use `isErr` here and return early, so as to keep the same flow for all error returns and have the success condition at the end. Also, I wonder if we should structure the code more towards blobs and blocks being decoupled from each other. Not in this PR perhaps, but we probably want to run blob/sidecar validation in parallel (in a decoupled design, we'd need a quarantine / holding area for unmatched blob/block pairs etc., but it would be interesting to explore, code-wise, what the impact is).
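For illustration, a minimal sketch of that early-return style, using `Result` from stew/results; `BlobSidecar` and `checkSidecar` are hypothetical stand-ins, not the actual nimbus-eth2 code:

```nim
import stew/results

type BlobSidecar = object  # hypothetical stand-in for the real type

proc checkSidecar(res: Result[BlobSidecar, string]): Result[void, string] =
  if res.isErr:
    return err(res.error)   # early return: every error path looks the same
  let sidecar = res.get()
  # ... per-sidecar validation on `sidecar` would go here ...
  discard sidecar           # placeholder so the sketch compiles cleanly
  ok()                      # success condition comes last
```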
Fully agreed, done in c03e57a.

Is the motivation to limit validation latency? I guess I don't have much intuition at this point about how much extra sequential processing time blob validation will take, but if the additional latency is notable then yes, it makes sense.

Or maybe this is future-proofing for a world where gossip ends up being reworked to have separate block and blob topics? I agree with that benefit (which of course depends on if/when that gossip change would be made).
If this `phase0.SignedBeaconBlock` method works reliably, that'd be useful more broadly. The current codebase doesn't do this, I'm not sure if it's documented, and Nim does some strange things with it, though.

Also, Nim 1.6 still doesn't properly typecheck these situations: nim-lang/Nim#1027 was only fixed a couple of months ago in Nim `devel`. I've asked for a backport of its fix, nim-lang/Nim#20631, to 1.6, since Nimbus uses this pattern extensively.
One approach used is to pass the entire type (`phase0.SignedBeaconBlock`, `altair.SignedBeaconBlock`, et cetera) as a type or generics parameter to a template or function. It repeats itself slightly, but the semantics are better-defined: those things are in themselves types, while it's not clear, at least to me, what Nim thinks is happening here.
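A rough sketch of that approach, with simplified stand-in types since the real ones live in separate per-fork modules (the proc name is hypothetical):

```nim
# Stand-ins for phase0.SignedBeaconBlock, altair.SignedBeaconBlock, etc.
type
  Phase0SignedBeaconBlock = object
  AltairSignedBeaconBlock = object

# The fork's type itself is the generic parameter, so there is no
# `fork.SignedBeaconBlock` lookup inside a template to confuse the compiler.
proc installBlockValidator(T: typedesc) =
  echo "installing gossip validator for ", $T

installBlockValidator(Phase0SignedBeaconBlock)
installBlockValidator(AltairSignedBeaconBlock)
```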
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm partly concerned because the local testnets in CI keep failing, e.g.:
https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Flinux%2Fx86_64/detail/PR-4444/7/pipeline
https://ci.status.im/blue/organizations/jenkins/nimbus-eth2%2Fplatforms%2Fmacos%2Faarch64/detail/PR-4444/7/pipeline
https://ci.status.im/job/nimbus-eth2/job/platforms/job/macos/job/x86_64/job/PR-4444/7/display/redirect

With this PR, receiving blocks via gossip just stops working entirely, regardless of fork or slot. So blocks either get sent by a node, which runs them through the block processing pipeline locally, or get sent and ... apparently not really processed properly. This works for a while, because nodes manage to request backfill blocks in time via req/resp, e.g.:

Where node 3 initially sent that:

This is the entirety of "Block received" via gossip, across all nodes and all slots:

There should be 3 "Block received" logs for every slot, one from each of the 4 nodes except the one which sent the block.

The trouble with relying on req/resp, aside from the fact that block gossip shouldn't be failing to begin with, is that since the attestations get dropped, these blocks don't necessarily get enough votes to keep the head moving. By the time req/resp gets processed, it's too late for the nodes to count votes in a similar way.

So, the combination of those Nim issues/bugs and this particular failure mode makes me skeptical that this refactoring is working as intended.
That variation runs the minimal-preset version in a loop until it fails for some reason. Run from the `nimbus-eth2` directory, the script runs the local testnet a single time, and the various options can of course be adjusted.
Thanks for digging into this and for the explanations. I've removed the template. I figured the refactoring was straightforward enough; lesson learned. In any case, it belonged at the very least in a separate commit, and probably in a different PR.

As a side note, this also changed the order in which the validators were installed. I did that to group related validators for clarity, assuming the change had to be innocuous, but given the specific order in which the validators were being installed, I'm now wondering if there might have been a reason for that order. Anyway, order restored.
Sure. I agree with the refactoring in principle, but also that it probably shouldn't be part of this PR.

The order was not intended to be important. They'll get called by `nim-libp2p` in whatever order messages come in, regardless. Might as well keep the order, though.
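To illustrate why installation order shouldn't matter, here's a hypothetical sketch (not the actual nim-libp2p API) where validators are keyed by topic and dispatched purely by the topic of each incoming message:

```nim
import std/tables

type Validator = proc (msg: string): bool {.closure.}

var validators = initTable[string, Validator]()

proc addValidator(topic: string, v: Validator) =
  validators[topic] = v

# Installation order is irrelevant: lookup is by topic, not by insertion order.
addValidator("beacon_block", proc (msg: string): bool = msg.len > 0)
addValidator("blob_sidecar", proc (msg: string): bool = msg.len > 0)

# Messages arrive in whatever order the network delivers them; the matching
# validator runs regardless of when it was installed.
for (topic, msg) in [("blob_sidecar", "b"), ("beacon_block", "a")]:
  echo topic, " valid: ", validators[topic](msg)
```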