Description
Sentinel, and in particular the python-bitcoinrpc library that it relies on, has a hard-coded timeout of 30 seconds for receiving the results of Dash RPC calls. That is probably fine in most cases, and maybe the timeout should be configurable, but regardless, an RPC command may take a long time, or even a very long time, to run. If Sentinel times out waiting for the command to complete, it won't report in, and the masternode will eventually transition into WATCHDOG_EXPIRED even though it is trying to follow the rules.
In my test, dash-cli gobject list took 40 seconds to run. Over several hours, all three of my 12.1 testnet masternodes entered WATCHDOG_EXPIRED.
Increasing the timeout to 50 seconds resolved the problem, but that is a short-term fix. Testnet is tiny compared to mainnet. As governance objects increase in number, so too will the time it takes to run dash-cli gobject list to completion.
I am concerned that we may be looking at a scalability problem. How to fix? There are options. I propose looking at improving the speed of the RPC command and permitting customizable timeouts.
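To make the customizable-timeout idea concrete, here is a minimal sketch of how the value could be read from the environment with a fallback to the current 30 seconds. The setting name `SENTINEL_RPC_TIMEOUT` and the helper `rpc_timeout` are hypothetical names I made up for illustration; they are not existing Sentinel options.

```python
import os

# Seconds; mirrors the hard-coded 30-second value described in this issue.
DEFAULT_RPC_TIMEOUT = 30


def rpc_timeout(env=None, var="SENTINEL_RPC_TIMEOUT"):
    """Return the RPC timeout in seconds, falling back to the default.

    `SENTINEL_RPC_TIMEOUT` is a hypothetical setting name used only for
    this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None or not raw.strip():
        return DEFAULT_RPC_TIMEOUT
    try:
        value = int(raw)
    except ValueError:
        # Unparseable input: keep the known-good default.
        return DEFAULT_RPC_TIMEOUT
    # Reject zero and negative values instead of disabling the timeout.
    return value if value > 0 else DEFAULT_RPC_TIMEOUT


# Assumed usage (not executed here): pass the value to the RPC client,
# e.g. AuthServiceProxy(service_url, timeout=rpc_timeout())
```

With `SENTINEL_RPC_TIMEOUT=50` set, this would reproduce the 50-second workaround from my test without editing the source. It only addresses the timeout half of the proposal; the speed of the RPC command itself still needs attention.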