fix: possible netsplits and chokes #2635
Bring current
…hould add their seed nodes to init.go. Inbound peer settings have been significantly increased because I theorize that the network is choked (symptom: low peer counts & slow sync speed)
update docs for 5.0.7 with embedded seeds
@mmulji-ic sir, something is weird here. The changes I've made in this branch should not affect the tests that they seem to break. Could you take a look? I may have found the cause: I added the other debug commands. The CI output above may still be worth a peek, because I can't figure out why adding other debug commands would ever make that happen.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This PR removes depguard and cleans up root.go
Related issues:
(I'm serious about the related issues: they all relate back to the reality that the Cosmos Hub network is choked by its default settings)
Recent tweet that could be related unless all of these nodes are run by the same VaaS:
Complemented by
This PR is complemented by iavl fast node, because enabling it allows for better query handling and fixes a condition that is well known in tendermint: you can make p2p stop by hammering on the rpc endpoint, demonstrated here:
There's another related issue I opened on the tendermint repository before it was moved to comet, please inquire privately about that.
Combined, these could be used to destabilize the hub's p2p network.
Issues discovered while working on this
Note
I am unsure why this is making the E2E tests fail. It seems like it's just one of them, and if somebody could have a peek that would be great, because I am back on baby duty 🙂.
Other possibilities
It's perfectly possible that all of this was caused by a failure at a single VaaS provider, or by an interruption in the internet links between the validators who experienced downtime and the rest of the world. In either case, increasing the default number of inbound peers should improve performance on Gaia.
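For reviewers who want the shape of the change without reading the diff, here is a minimal sketch of the idea described above: fall back to embedded seed nodes when the operator has configured none, and raise the inbound peer limit when it is still at the stock value. Everything in it is an assumption made for illustration and is not the code in this PR: the `P2PSettings` type is a hypothetical stand-in for the p2p section of the node config, the seed addresses are placeholders, `assumedStockInbound` is assumed to match the upstream default, and `raisedInbound` is an arbitrary example number.

```go
package main

import (
	"fmt"
	"strings"
)

// P2PSettings is a hypothetical stand-in for the p2p section of a node
// config. It is NOT the real config type used by Gaia.
type P2PSettings struct {
	Seeds               string // comma-separated "nodeid@host:port" entries
	MaxNumInboundPeers  int
	MaxNumOutboundPeers int
}

const (
	// Assumed stock inbound limit; check the upstream default before relying on it.
	assumedStockInbound = 40
	// Illustrative raised limit; the value chosen in the PR may differ.
	raisedInbound = 120
)

// embeddedSeeds holds placeholder addresses, not real seed nodes.
var embeddedSeeds = []string{
	"nodeid-a@seed-a.example.invalid:26656",
	"nodeid-b@seed-b.example.invalid:26656",
}

// applyNetworkDefaults returns a copy of cfg with embedded seeds filled in
// when none are configured, and with the inbound limit raised only when it
// still equals the assumed stock value. Explicit operator settings are kept.
func applyNetworkDefaults(cfg P2PSettings) P2PSettings {
	if strings.TrimSpace(cfg.Seeds) == "" {
		cfg.Seeds = strings.Join(embeddedSeeds, ",")
	}
	if cfg.MaxNumInboundPeers == assumedStockInbound {
		cfg.MaxNumInboundPeers = raisedInbound
	}
	return cfg
}

func main() {
	stock := P2PSettings{MaxNumInboundPeers: assumedStockInbound, MaxNumOutboundPeers: 10}
	tuned := applyNetworkDefaults(stock)
	fmt.Printf("seeds=%q inbound=%d outbound=%d\n",
		tuned.Seeds, tuned.MaxNumInboundPeers, tuned.MaxNumOutboundPeers)
}
```

The sketch leaves any value the operator has changed untouched, so a node that deliberately runs with a low inbound limit or its own seed list keeps that choice. Comparing against the stock value is only a heuristic for "not customized"; it cannot tell an untouched default from an operator who explicitly chose the same number.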