Describe the bug
The consensus has 2 built-in timers for automatic restart in case of disconnection from the majority of nodes.
STUCK_RESTART_INTERVAL_MS - triggers after 3 hours from the last mined block.
HEALTHCHECK_ON_START_RETRY_TIME_SEC - starts after the Skaled launch and lasts for 1500 seconds.
If the consensus loses the majority and restarts after 3 hours, the next Skaled restart is triggered by HEALTHCHECK_ON_START_RETRY_TIME_SEC rather than by STUCK_RESTART_INTERVAL_MS. This complicates chain recovery after a crash: downloading a large snapshot may be physically impossible within 25 minutes (1500 seconds), yet feasible within 3 hours.
This is a result of https://github.com/skalenetwork/internal-support/issues/51
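The interaction between the two timers can be sketched roughly as follows. This is only an illustrative model, not skaled's actual code: the two constant values come from this report, while the function `seconds_until_restart`, its parameters, and the idea that the earlier deadline wins are assumptions made for the example.

```python
# Values taken from this issue; everything else below is hypothetical.
STUCK_RESTART_INTERVAL_MS = 3 * 60 * 60 * 1000  # 3 hours since the last mined block
HEALTHCHECK_ON_START_RETRY_TIME_SEC = 1500       # 25 minutes after skaled launch


def seconds_until_restart(sec_since_start, sec_since_last_block, healthcheck_passed):
    """Return (remaining_seconds, timer_name) for the timer that fires first.

    Assumption: while the start-up healthcheck has not passed (no majority
    reached since launch), both timers are armed and the earlier one wins.
    """
    stuck_remaining = max(0.0, STUCK_RESTART_INTERVAL_MS / 1000 - sec_since_last_block)
    if healthcheck_passed:
        return stuck_remaining, "STUCK_RESTART_INTERVAL_MS"

    health_remaining = max(0.0, HEALTHCHECK_ON_START_RETRY_TIME_SEC - sec_since_start)
    if health_remaining < stuck_remaining:
        return health_remaining, "HEALTHCHECK_ON_START_RETRY_TIME_SEC"
    return stuck_remaining, "STUCK_RESTART_INTERVAL_MS"


# Right after a restart without majority, the 25-minute timer fires first.
print(seconds_until_restart(0, 0, healthcheck_passed=False))
```

Under this model, a node that restarts without a majority always hits the 1500-second deadline long before the 3-hour one, which matches the behavior reported here.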
Note:
All Skaled instances that were restarted without a majority of nodes will be restarted automatically in 3 hours.
All Skaled instances that were restarted with a majority of nodes after /issues/51 will be restarted every 25 minutes.
Preconditions:
Active schain of medium type (16 nodes)
At least 1 chain on node
Version
skalenetwork/schain:3.17.1
skalenetwork/schain:3.18.0-beta.0
Steps to reproduce
Stop 6 containers on schain
Wait 3 hours, then restart one of the 10 active containers on Node A
Wait 25 minutes, then check the skaled logs of the restarted container on Node A
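To see why stopping 6 of 16 containers removes the majority, here is a minimal sketch of the quorum arithmetic. The helper `has_majority` is hypothetical, and the threshold (strictly more than 2/3 of all nodes) is an assumption based on the "2/3 peers" wording in the issue title, not a confirmed detail of the consensus implementation.

```python
def has_majority(active_nodes: int, total_nodes: int) -> bool:
    # Assumption: a majority means strictly more than 2/3 of all nodes.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 3 * active_nodes > 2 * total_nodes


TOTAL_NODES = 16           # medium schain, per the preconditions
STOPPED_CONTAINERS = 6     # step 1 of the reproduction

active = TOTAL_NODES - STOPPED_CONTAINERS
print(active, has_majority(active, TOTAL_NODES))
```

With these numbers, 10 active nodes fall just short of the assumed threshold, while 11 would meet it.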
Expected behavior
Consensus should wait 3 hours before restarting itself if there is no majority of active nodes.
Actual state:
Consensus restarts after 25 minutes on Node A when there is no majority of nodes.
oleksandrSydorenkoJ
changed the title
Consensus restarts after 25 minutes if it fails to connect 2/3 peers since the last start
Consensus restarts after 25 minutes if it fails to connect 2/3 peers when 11 active nodes
Feb 9, 2024
oleksandrSydorenkoJ
changed the title
Consensus restarts after 25 minutes if it fails to connect 2/3 peers when 11 active nodes
Consensus restarts after 25 minutes instead of the 3-hour interval after start
Feb 9, 2024
message (40).txt