Check "system peers" Polkadot API to detect proxy servers that are down and suspend the Archipel heartbeat in this case #265

Open
branciard opened this issue Dec 6, 2020 · 1 comment

Comments

@branciard
Member

branciard commented Dec 6, 2020

Proposal for detecting a proxy server that is down.

A TCP ping test on the deployed proxy shows that its TCP ports are not open, so a plain TCP check cannot be used. That leads me to look for another solution. Here is a proposal:

  • Add a function in the polkadot js file that periodically checks the currently connected peers on each Polkadot node, using this kind of code:
  async checkPeers(containerName) {
    console.log('checkPeers start');
    // Build the JSON-RPC request body for the system_peers method
    const requestBody = JSON.stringify({ jsonrpc: '2.0', id: 1, method: 'system_peers', params: [] });

    // Construct the curl command that queries system_peers on the node RPC port
    const commandSystemPeers = ['curl', 'http://localhost:' + config.polkadotRpcPort,
      '-H', 'Content-Type:application/json;charset=utf-8', '-d', requestBody];

    // Call the system_peers command in the docker container
    const resultSystemPeers = await this.docker.dockerExecute(containerName, commandSystemPeers);
    console.log(resultSystemPeers);
    return resultSystemPeers;
  }
  • Modify the JS orchestrator to compare the expected connected peers (from the given config file) with the actual connected peers (from the API call detailed above).

  • Propagate to the Substrate runtime the potentially disconnected peers (potential proxy down) as seen from a given node's point of view. Example of a new Substrate runtime function to add:
    signalPeersConnected(observed peer ID, observer peer ID, block time, bool disconnected)

  • Modify the JS orchestrator to analyse the collected 'signalPeersConnected' runtime data. If our current node is suspected by more than x peers of the archipel (9 nodes), it most likely means that the proxy server of our node is down.
    Action in this case => suspend our current heartbeat, switch to passive node, and also call give up leadership.
    It will allow another validator in our current group to take the leadership. By design, the other validators of the group target another proxy server. Manual action must then be done on this suspended node (as it is today for a STONITH failure).
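
The comparison and suspicion steps above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names (`parseConnectedPeers`, `findMissingPeers`, `isSuspectedDown`) and the report fields (`peerId`, `reporterPeerId`, `disconnected`) are hypothetical and not existing Archipel code, and it assumes `dockerExecute` returns the raw JSON body of the `system_peers` response as a string.

```javascript
// Hypothetical helpers: names and data shapes are assumptions for illustration.

// Parse the raw JSON-RPC response of system_peers and return the set of
// currently connected peer IDs (each result entry carries a `peerId` field).
function parseConnectedPeers(rawResponse) {
  const response = JSON.parse(rawResponse);
  if (!Array.isArray(response.result)) {
    throw new Error('Unexpected system_peers response');
  }
  return new Set(response.result.map((peer) => peer.peerId));
}

// Return the expected peers (from the config file) that are not connected.
function findMissingPeers(expectedPeerIds, connectedPeerIds) {
  return expectedPeerIds.filter((peerId) => !connectedPeerIds.has(peerId));
}

// Return true when strictly more than `threshold` distinct peers have
// reported our node as disconnected (the "more than x peers" rule).
function isSuspectedDown(reports, ourPeerId, threshold) {
  const reporters = new Set(
    reports
      .filter((report) => report.peerId === ourPeerId && report.disconnected)
      .map((report) => report.reporterPeerId)
  );
  return reporters.size > threshold;
}

// Usage example with made-up peer IDs.
const raw = '{"jsonrpc":"2.0","id":1,"result":[{"peerId":"peerA"},{"peerId":"peerC"}]}';
const missing = findMissingPeers(['peerA', 'peerB', 'peerC'], parseConnectedPeers(raw));
// missing is ['peerB']
```

Counting distinct reporters rather than raw reports keeps a single peer that signals repeatedly from reaching the threshold on its own.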

link to #241 #260

@branciard
Member Author

A simpler solution would be a libp2p Rust daemon that pings the proxy port and raises an alert.
