Re-propagation of seen aggregated attestations dramatically increases bandwidth #9935

Closed
terencechain opened this issue Nov 24, 2021 · 5 comments
Labels
Networking (P2P related items), Priority: High (High priority item)

Comments

@terencechain
Member

Re-propagation of seen aggregated attestations (attestion_aggregate_proof) was enabled in the consensus networking spec in Altair (see: rationale and PR).

Prysm implemented it in 9830 and included it in release v2.0.3. After the release, we received multiple reports that outbound traffic quadrupled for nodes in certain situations. We traced the root cause to the large increase in attestion_aggregate_proof re-propagation.
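
To make the traffic effect concrete, here is a toy Go model. It is only a sketch under stated assumptions, not Prysm's implementation or the spec's validation logic: the function name `countOutbound`, the message IDs, and the fixed `fanout` are all hypothetical. It counts how many outbound sends a node makes when it drops already-seen aggregates versus when it forwards them again.

```go
package main

import "fmt"

// countOutbound is a hypothetical toy model (not Prysm code).
// received lists aggregate IDs in arrival order, duplicates included.
// fanout is an assumed fixed number of peers each forwarded message goes to.
// When ignoreSeen is true, an aggregate that was already seen is dropped;
// when false, every arrival is forwarded again (re-propagation).
func countOutbound(received []string, fanout int, ignoreSeen bool) int {
	seen := make(map[string]bool)
	outbound := 0
	for _, id := range received {
		if seen[id] && ignoreSeen {
			continue // duplicate is dropped, nothing is sent
		}
		seen[id] = true
		outbound += fanout
	}
	return outbound
}

func main() {
	// Six arrivals, but only three distinct aggregates.
	received := []string{"agg-1", "agg-2", "agg-1", "agg-1", "agg-2", "agg-3"}

	fmt.Println(countOutbound(received, 8, true))  // 24: 3 distinct x 8 peers
	fmt.Println(countOutbound(received, 8, false)) // 48: 6 arrivals x 8 peers
}
```

In this model the outbound count scales with the number of arrivals rather than the number of distinct aggregates, so the size of the increase depends on how many duplicates a node receives. That would be consistent with different nodes reporting different multipliers, though the model says nothing about the real duplicate rates on mainnet.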

Worth noting that Prysm may be the only client with re-propagation enabled; some other client implementations are still in PR mode: Lighthouse's PR.

Opening this issue to document further findings, resolutions, etc.

@terencechain added the "Priority: High" label on Nov 24, 2021
@prestonvanloon
Member

I observed a 100% (2x) increase in outbound network traffic on a small serving beacon-chain/validator client.
I did not observe a quadruple increase. Testnets also did not reflect a significant increase in outbound traffic, at least from the data we recorded. I'd like to verify that our testnet data sampling is accurately capturing p2p traffic, but I have no reason to believe it is not at this time.

@prestonvanloon added the "Networking" label on Nov 24, 2021
@mohamedmansour
Contributor

I observed 2x CPU usage as well. Is this related?
image

Network is 2x as well:
image

@prestonvanloon
Member

> is this related?

Very likely.

The v2.0.4 release candidate is 1d53fd2.
If you are up for it, please run from that commit. We are still testing and evaluating it with prod serving validators at this time, but we believe CPU use is lower with the changes at 1d53fd2.

@mohamedmansour
Contributor

@prestonvanloon okay, that fixed the CPU and network usage (on mainnet):
image
image

@prestonvanloon
Member

This is resolved in v2.0.4

