gossipsub: feature-request: Optional reason code for why a peer was pruned #555

Open
MarcoPolo opened this issue Jun 29, 2023 · 8 comments

Comments

@MarcoPolo
Contributor

It would be very helpful for debugging and health monitoring of the network to know why a peer pruned us. Even if they only tell us that we were pruned because our score became negative, that would be helpful.

I don't think there's a security issue here, since a node can essentially infer that it is misbehaving if many peers prune it at once. This just makes that explicit.

This came up debugging filecoin-project/lotus#10906, and @shrenujbansal suggested this. It would be useful to know that a peer gave us a negative score because it would hint that we did something wrong.
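
To make the request concrete, here is a minimal Go sketch of what an optional reason code might look like. The `PruneReason` type, its values, and `describePrune` are hypothetical names chosen for illustration; they are not existing gossipsub or go-libp2p-pubsub APIs.

```go
package main

import "fmt"

// PruneReason is a hypothetical, optional code a peer could attach to a PRUNE.
type PruneReason int

const (
	// PruneReasonUnspecified is the default when the sender gives no reason.
	PruneReasonUnspecified PruneReason = iota
	// PruneReasonNegativeScore means our score at the sender dropped below zero.
	PruneReasonNegativeScore
	// PruneReasonMeshOversized means the sender's mesh exceeded its high watermark.
	PruneReasonMeshOversized
)

// String returns a label suitable for logs or metric tags.
func (r PruneReason) String() string {
	switch r {
	case PruneReasonNegativeScore:
		return "negative_score"
	case PruneReasonMeshOversized:
		return "mesh_oversized"
	default:
		return "unspecified"
	}
}

// describePrune formats a log line for a PRUNE received from a peer.
func describePrune(peer, topic string, r PruneReason) string {
	return fmt.Sprintf("pruned by %s on %s: %s", peer, topic, r)
}
```

Because the zero value is "unspecified", a peer that does not want to reveal anything would simply leave the field unset, which keeps the reason optional.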

@vyzo
Contributor

vyzo commented Jun 29, 2023

I don't think this is particularly useful; most likely you got pruned because of a negative score.

@shrenujbansal

Correct me if I'm wrong, but I believe you can also get pruned if an existing peer has too many peers (above the high watermark) and prunes a bunch of them as a result.

Being able to tell whether several peers pruned you because of a negative score resulting from some activity would be very useful for debugging issues like #10906, where your node stops receiving blocks and loses sync as a result.
If we could see this number tick up in a Grafana dashboard, it would immediately give us more clues about what is going on, instead of having to work it out through speculation, additional logging, and experiments.
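
The two cases above (negative score versus a mesh above the high watermark) could be told apart on the pruning side roughly as follows. This is a simplified sketch, not the go-libp2p-pubsub heartbeat: `meshPeer`, `selectPrunes`, the reason strings, and the rule of dropping the lowest-scoring peers are all assumptions made for illustration.

```go
package main

import "sort"

// Hypothetical reason labels; plain strings keep the sketch short.
const (
	reasonNegativeScore = "negative_score"
	reasonMeshOversized = "mesh_oversized"
)

// meshPeer is a stand-in for a mesh member and the score we assign to it.
type meshPeer struct {
	ID    string
	Score float64
}

// selectPrunes returns the peers to prune from one topic mesh, keyed by peer
// ID, together with the reason. d is the target mesh size and dHi the high
// watermark.
func selectPrunes(mesh []meshPeer, d, dHi int) map[string]string {
	prunes := make(map[string]string)
	kept := make([]meshPeer, 0, len(mesh))

	// Case 1: peers with a negative score are always pruned.
	for _, p := range mesh {
		if p.Score < 0 {
			prunes[p.ID] = reasonNegativeScore
			continue
		}
		kept = append(kept, p)
	}

	// Case 2: if the mesh is still above the high watermark, shrink it to d.
	// Simplification: keep the d highest-scoring peers and prune the rest.
	if d >= 0 && d <= dHi && len(kept) > dHi {
		sort.SliceStable(kept, func(i, j int) bool {
			return kept[i].Score > kept[j].Score
		})
		for _, p := range kept[d:] {
			prunes[p.ID] = reasonMeshOversized
		}
	}
	return prunes
}
```

With a reason attached to each prune, the receiving side could count the two cases separately instead of inferring the cause from the size of a prune spike.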

@vyzo
Contributor

vyzo commented Jun 29, 2023 via email

@vyzo
Contributor

vyzo commented Jun 29, 2023 via email

@shrenujbansal

> The metric you care about is number of prunes and you already have that. Your thinking that all peers are pruning you because of oversubscription is probably going into the realm of the extremely unlikely. If you see a massive prune spike in your metrics, you should be thinking score.

One thing I want to confirm: when you say "number of prunes", do you mean the number of prunes sent by the current node, or the number of prunes of the current node by others? We want to see the latter if possible.

@MarcoPolo, @vyzo mentions that we already have the number of prunes metric. Is this already visible in Grafana, or can it be made visible easily?

@vyzo
Contributor

vyzo commented Jun 29, 2023 via email

@shrenujbansal

shrenujbansal commented Jun 30, 2023

vyzo mentions we already have the number of prunes metric. Is this already visible in Grafana, or can it be made visible easily?

@MarcoPolo do you have any idea?

@MarcoPolo
Contributor Author

It would be here: https://github.com/filecoin-project/lotus/blob/master/node/modules/lp2p/pubsub.go#L562, similar to the existing stats.Record calls. I don't think this is implemented.
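
A rough sketch of what counting prunes received from other peers might look like in a tracer. `traceEvent`, `eventRecvRPC`, and `pruneCounter` are simplified stand-ins, not the real go-libp2p-pubsub or lotus types; in a real tracer the increment would presumably be a metrics call next to the existing stats.Record calls.

```go
package main

import "sync"

// eventType and traceEvent are simplified stand-ins for the events a pubsub
// tracer receives. They are assumptions for this sketch, not library types.
type eventType int

const (
	eventRecvRPC eventType = iota // an RPC received from another peer
	eventSendRPC                  // an RPC we sent to another peer
)

type traceEvent struct {
	Type eventType
	// PrunedTopics lists the topics named in PRUNE control messages
	// carried by this RPC.
	PrunedTopics []string
}

// pruneCounter counts PRUNEs that other peers sent to us, per topic.
type pruneCounter struct {
	mu       sync.Mutex
	received map[string]int64
}

func newPruneCounter() *pruneCounter {
	return &pruneCounter{received: make(map[string]int64)}
}

// Trace only looks at inbound RPCs, so prunes we send are not counted.
func (c *pruneCounter) Trace(evt traceEvent) {
	if evt.Type != eventRecvRPC {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, topic := range evt.PrunedTopics {
		c.received[topic]++ // a real tracer would record a metric here
	}
}

// Received returns how many PRUNEs we have received for a topic.
func (c *pruneCounter) Received(topic string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.received[topic]
}
```

Filtering on inbound RPCs is what separates "prunes of the current node by others" from prunes the node sends itself, which is the distinction asked about earlier in the thread.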
