This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

[CI] Add bootnode checking CI jobs #6889

Merged
merged 29 commits into from
Mar 21, 2023


s3krit
Contributor

@s3krit s3krit commented Mar 15, 2023

Fixes paritytech/ci_cd#735
This PR creates two new GitHub workflows that check whether a given set of bootnodes is contactable. It does this by rewriting the chainspec for a given runtime so that it includes only the bootnode we wish to test. It then runs a polkadot node using that new chainspec, waits a little while, and polls the local node's health_check endpoint to make sure it has been able to peer with other nodes.
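
The polling step described above might look roughly like the following sketch. It assumes the node exposes the standard Substrate `system_health` JSON-RPC method over HTTP and that `curl` is installed. The function name `wait_for_peers`, the RPC URL, and the poll settings are illustrative and are not taken from this PR's scripts.

```bash
# Hypothetical helper: poll a local node until it reports at least one peer.
# Usage: wait_for_peers RPC_URL MAX_POLLS SECONDS_BETWEEN_POLLS
# Returns 0 as soon as peers > 0, or 1 if every poll reports no peers.
wait_for_peers() {
  local rpc_url="$1" max_polls="$2" interval="$3"
  local request='{"jsonrpc":"2.0","method":"system_health","params":[],"id":1}'
  local i=0 peers
  while [ "$i" -lt "$max_polls" ]; do
    # Pull the "peers" count out of the JSON reply. An unreachable RPC
    # endpoint yields an empty string, which is treated as zero peers.
    peers=$(curl -s -X POST -H 'Content-Type: application/json' \
      --data "$request" "$rpc_url" |
      sed -n 's/.*"peers":[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    if [ "${peers:-0}" -gt 0 ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

Called as `wait_for_peers http://localhost:9933 10 3`, this would give the node about 30 seconds to find a peer. The port is an assumption and must match whatever RPC port the node was started with. A JSON-aware tool such as `jq` would be a more robust way to read the reply than `sed`.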

The two tests are:

  • One that runs at release-time (i.e., when a release candidate branch is pushed) that iterates over all of the bootnodes in the kusama, westend and polkadot runtimes and ensures that each of those bootnodes is contactable.
  • One that runs every time a chainspec file in node/service/chain-specs/ is updated that only tests the connectivity of the newly-added bootnodes.
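
For the second workflow, one way to isolate the newly added bootnodes is to compare the bootnode lists from before and after the change. The sketch below assumes each list has already been written to a file with one multiaddr per line (for example with `jq -r '.bootNodes[]'`, assuming `jq` is installed). The function name is hypothetical, and this is not necessarily how the PR's script does it.

```bash
# Hypothetical helper: print bootnodes that appear in NEW_LIST but not in OLD_LIST.
# Usage: added_bootnodes OLD_LIST NEW_LIST
# Exits non-zero when nothing was added, because grep selected no lines.
added_bootnodes() {
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
  grep -Fxv -f "$1" "$2"
}
```

Matching whole lines as fixed strings keeps characters such as `.` in a multiaddr from being read as regular-expression syntax.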

P.S., @paritytech/release-engineering I wasn't sure what to call the on-release workflow file. I noticed you've got some pattern for your GitHub workflows, so lmk what I should change it to (i.e., what number I should prepend the file with).

EDIT: Oh also, here are some examples of the jobs being run against bootnodes. Notice that it is already finding a couple of uncontactable bootnodes (verified manually so we know it's working):

For all three runtimes, the bootnodes by polkadotters.com and metaspan.io are unresponsive. For kusama, one of our own bootnodes (/dns/kusama-bootnode-0.paritytech.net/tcp/30334/ws/p2p/12D3KooWSueCPH3puP2PcvqPJdNaDNF3jMZjtJtDiSy35pWrbt5h) is also uncontactable.

Currently failing bootnodes:

Westend:

[!] Bad bootnodes found for westend:
    /dns/westend-bootnode.polkadotters.com/tcp/30333/p2p/12D3KooWHPHb64jXMtSRJDrYFATWeLnvChL8NtWVttY67DCH1eC5
    /dns/boot-westend.metaspan.io/tcp/33016/wss/p2p/12D3KooWNTau7iG4G9cUJSwwt2QJP1W88pUf2SgqsHjRU2RL8pfa

Kusama:

[!] Bad bootnodes found for kusama:
    /dns/kusama-bootnode-0.paritytech.net/tcp/30334/ws/p2p/12D3KooWSueCPH3puP2PcvqPJdNaDNF3jMZjtJtDiSy35pWrbt5h
    /dns/kusama-bootnode.polkadotters.com/tcp/30333/p2p/12D3KooWHB5rTeNkQdXNJ9ynvGz8Lpnmsctt7Tvp7mrYv6bcwbPG
    /dns/boot-kusama.metaspan.io/tcp/23016/wss/p2p/12D3KooWE1tq9ZL9AAxMiUBBqy1ENmh5pwfWabnoBPMo8gFPXhn6

Polkadot:

[!] Bad bootnodes found for polkadot:
    /dns/polkadot-bootnode.polkadotters.com/tcp/30333/p2p/12D3KooWPAVUgBaBk6n8SztLrMk8ESByncbAfRKUdxY1nygb9zG3
    /dns/boot-polkadot.metaspan.io/tcp/13016/wss/p2p/12D3KooWRjHFApinuqSBjoaDjQHvxwubQSpEVy5hrgC9Smvh92WF
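
A report in the format shown above could be produced by a loop along these lines. This is only a sketch: `report_bad_bootnodes` is a hypothetical name, and it relies on a caller-supplied `check_bootnode` function (also hypothetical here) that returns non-zero when a bootnode cannot be used to find peers.

```bash
# Hypothetical driver: run a caller-supplied `check_bootnode` function on each
# bootnode in LIST_FILE (one multiaddr per line) and report the failures.
# Usage: report_bad_bootnodes RUNTIME LIST_FILE
report_bad_bootnodes() {
  local runtime="$1" list="$2" bad="" bootnode
  # Read the list on file descriptor 3 so that commands run by
  # check_bootnode cannot accidentally consume it from stdin.
  while IFS= read -r bootnode <&3; do
    [ -n "$bootnode" ] || continue
    if ! check_bootnode "$bootnode"; then
      bad="${bad}    ${bootnode}
"
    fi
  done 3< "$list"
  if [ -n "$bad" ]; then
    printf '[!] Bad bootnodes found for %s:\n%s' "$runtime" "$bad"
    return 1
  fi
  return 0
}
```

Returning non-zero when any bootnode fails lets a CI job fail on the report without parsing its output.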

I don't know if we want a process for removing them. Should we remove them as part of another PR, or contact the owners of said bootnodes first?

@s3krit s3krit requested a review from a team March 15, 2023 16:55
@s3krit s3krit added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Mar 15, 2023
@s3krit s3krit self-assigned this Mar 15, 2023
@s3krit s3krit requested a review from PierreBesson March 15, 2023 17:25
@s3krit s3krit marked this pull request as ready for review March 15, 2023 17:25
@s3krit s3krit requested review from a team and chevdor as code owners March 15, 2023 17:25
s3krit added 2 commits March 16, 2023 15:10
Begin polling the RPC node right after spawning, allowing us to break
early on detecting peers
@senseless
Contributor

Thanks for this, we were trying to figure out a way to roll our own decentralized test and hadn't considered using custom chain spec files. We'll be rolling this into the decentralized IBP monitor to monitor our own nodes.

In the case of polkadotters, I believe they had an issue with their network security key. There is an open PR from them to correct this. In the case of metaspan, I reached out to let them know that their bootnodes were failing.

@dcolley

dcolley commented Mar 16, 2023

I corrected an error in the proxy config for the wss connection. Using this command I'm able now to get a test node to start syncing:

docker run --rm parity/polkadot \
  --base-path /data \
  --no-hardware-benchmarks --no-mdns \
  --chain westend \
  --reserved-only \
  --reserved-nodes "/dns/boot-westend.metaspan.io/tcp/33016/wss/p2p/12D3KooWNTau7iG4G9cUJSwwt2QJP1W88pUf2SgqsHjRU2RL8pfa"

@s3krit
Contributor Author

s3krit commented Mar 17, 2023

> I corrected an error in the proxy config for the wss connection. Using this command I'm able now to get a test node to start syncing:
>
> docker run --rm parity/polkadot \
>   --base-path /data \
>   --no-hardware-benchmarks --no-mdns \
>   --chain westend \
>   --reserved-only \
>   --reserved-nodes "/dns/boot-westend.metaspan.io/tcp/33016/wss/p2p/12D3KooWNTau7iG4G9cUJSwwt2QJP1W88pUf2SgqsHjRU2RL8pfa"

Awesome! I reran our job and the metaspan bootnode no longer appears uncontactable for the three runtimes we're checking, cheers!

@PierreBesson
Contributor

In my independent testing the Parity bootnode is UP: /dns/kusama-bootnode-0.paritytech.net/tcp/30334/ws/p2p/12D3KooWSueCPH3puP2PcvqPJdNaDNF3jMZjtJtDiSy35pWrbt5h

@s3krit
Contributor Author

s3krit commented Mar 20, 2023

> In my independent testing the Parity bootnode is UP: /dns/kusama-bootnode-0.paritytech.net/tcp/30334/ws/p2p/12D3KooWSueCPH3puP2PcvqPJdNaDNF3jMZjtJtDiSy35pWrbt5h

When testing with --no-hardware-benchmarks --tmp --no-mdns --chain kusama --reserved-only --reserved-nodes "/dns/kusama-bootnode-0.paritytech.net/tcp/30334/ws/p2p/12D3KooWSueCPH3puP2PcvqPJdNaDNF3jMZjtJtDiSy35pWrbt5h", the node will indeed connect to it as a peer (and the only peer), but when creating a chainspec with only that bootnode, we never find any peers. That suggests to me that this bootnode is not well connected, which points to a problem with the bootnode itself (and also confirms that Basti's suggested approach of altering the chainspec to include only the bootnode we want to test was the correct approach to take).
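
To make the difference concrete: --reserved-only with --reserved-nodes shows only that a direct connection to the bootnode works, whereas a chainspec whose sole bootnode is the one under test also requires that bootnode to introduce the node to other peers. A minimal sketch of that chainspec rewrite follows, assuming `jq` is installed and that the chainspec keeps its bootnodes in a top-level `bootNodes` array. The function name and the output path are hypothetical.

```bash
# Hypothetical helper: write a copy of a chainspec whose only bootnode is BOOTNODE.
# Usage: single_bootnode_chainspec IN_SPEC BOOTNODE OUT_SPEC
single_bootnode_chainspec() {
  # --arg passes the multiaddr to jq as a JSON string, so no manual quoting is needed.
  jq --arg bootnode "$2" '.bootNodes = [$bootnode]' "$1" > "$3"
}
```

For example, `single_bootnode_chainspec node/service/chain-specs/kusama.json "$BOOTNODE" /tmp/kusama-single.json` followed by `polkadot --chain /tmp/kusama-single.json --tmp --no-mdns` would start a node that can discover the network only through that one bootnode.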

@paritytech-ci paritytech-ci requested a review from a team March 21, 2023 10:38
@s3krit
Contributor Author

s3krit commented Mar 21, 2023

bot rebase

@paritytech-processbot

Rebased

@s3krit
Contributor Author

s3krit commented Mar 21, 2023

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 6efc49f into master Mar 21, 2023
@paritytech-processbot paritytech-processbot bot deleted the mp-bootnode-checker branch March 21, 2023 12:36
ordian added a commit that referenced this pull request Mar 21, 2023
* master:
  kusama: enable dispute slashes (#5974)
  Introduce OpenGov into Polkadot (#6701)
  introduce new well known key (#6915)
  [CI] Add bootnode checking CI jobs (#6889)
  Bump parity-db (#6921)
  Handling timers for repeat dispute participation requests (#6901)
  [Companion #13634] keystore overhaul (iter2) (#6913)
  tweak some pattern matches to address a new clippy warning
  Bump ci-linux image for rust 1.68
  Revert "Update orchestra to the recent version (#6854)" (#6916)
  Deprecate Currency: Companion for #12951 (#6780)
  changelog: template fixup (#6907)
  [Companion #13615] Keystore overhaul (#6892)
  update weights (#6897)
  Fix approval voting test (#6898)
  parachains-runtime: Less cloning! (#6896)
  Testing Reversion Speed on Dispute Concluded Against (#6880)
  remove duplicated arm and fix version index (#6884)
7 participants