Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor notification protocols #2269

Merged
merged 23 commits into from
Nov 18, 2024
Merged

refactor notification protocols #2269

merged 23 commits into from
Nov 18, 2024

Conversation

turuslan
Copy link
Contributor

@turuslan turuslan commented Nov 11, 2024

Referenced issues

Closes #2261

Description of the Change

  • peer count metric and telemetry count "/announce" streams, not connections.
  • protocols are independent.
    • but "/announce" protocol controls "/grandpa" and "/transaction" protocols.
    • authority discovery controls "/validation" protocol.
  • protocol error doesn't close connection to peer, only that protocol stream.
  • protocols decide when to open streams.
  • "/collation" and "/validation" protocols handle ViewUpdate messages, delegating other messages to observer.

Possible Drawbacks

  • check peer reputation before connecting (how can peer reputation grow if there are no streams to raise reputation?)

Signed-off-by: turuslan <turuslan.devbox@gmail.com>
turuslan and others added 18 commits November 11, 2024 10:37
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>

# Conflicts:
#	core/parachain/validator/impl/parachain_processor.cpp
#	core/parachain/validator/parachain_processor.hpp
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
core/network/notifications/protocol.hpp Show resolved Hide resolved
core/telemetry/peer_count.hpp Show resolved Hide resolved
core/utils/lru.hpp Show resolved Hide resolved
core/utils/with_type.hpp Show resolved Hide resolved
Signed-off-by: turuslan <turuslan.devbox@gmail.com>
Copy link
Contributor

@Lederstrumpf Lederstrumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't yet confirm whether this PR addresses the stalling issue faced in #2261 since it OOMs too regularly (~ every 10 minutes) on this PR, which masks the stalling issue (OOM only happens every hour or so for me on current master, see #2276).

But it certainly improves consensus participation since the Kusama validator actually gets hundreds of era points now.

peer count metric and telemetry count "/announce" streams, not connections.

Note that according to telemetry, nodes on this branch all have 0 peers via this metric - I don't know whether this PR is not tallying "/announce" streams correctly, or whether that's the incorrect metric to use.

2024-11-17T13:14:50,136342050+01:00
2024-11-17T13:23:05,685666400+01:00

@kamilsa
Copy link
Contributor

kamilsa commented Nov 17, 2024

@Lederstrumpf interesting. Our node was running for more than a day with no crashes and no significance consensus stalls.

Can you please share the flags that you used to run the node?

Signed-off-by: turuslan <turuslan.devbox@gmail.com>
@Lederstrumpf
Copy link
Contributor

@Lederstrumpf interesting. Our node was running for more than a day with no crashes and no significance consensus stalls.

Can you please share the flags that you used to run the node?

Sure, it's the flags listed in the #2276 issue report, i.e.:

kagome --chain kusama -d [...] --validator --listen-addr [...] --public-addr [...] --name [...]  --rpc-port [...] --telemetry-url [...] --telemetry-url [...] --node-key-file [...]

@turuslan turuslan merged commit c3d19b6 into master Nov 18, 2024
11 of 12 checks passed
@turuslan turuslan deleted the refactor/notifications branch November 18, 2024 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Validator eventually stalls on Kusama
4 participants