Consensus couldn't catchup blocks after node rotation on staying node #1488

Closed
oleksandrSydorenkoJ opened this issue Feb 17, 2023 · 3 comments · Fixed by #1548

oleksandrSydorenkoJ commented Feb 17, 2023

Version:
skalenetwork/schain:3.16.0-beta.8

Environment
17 active nodes
at least 1 active schain of medium type

Preconditions
All nodes have enough balance for node rotation

Steps to reproduce

  1. Run init exit on node A
  2. Wait for the DKG round and check that skaled received exitTimestamp from skale_admin
  3. Stop the skaled container on nodes B and C
  4. Wait for the rotation delay
  5. Immediately after the rotation delay, restart skaled on node B
  6. Wait 20 minutes after the rotation delay and restart skaled on node C
  7. Check the logs of both containers
  7. Check logs of both containers

Expected behavior:
Skaled on both nodes B and C should catch up blocks up to the rotation delay timestamp, gracefully stop itself, and wait for skale_admin to regenerate the schain config with the new BLS key and recreate the skaled container.

Actual state:
Skaled on node B caught up all blocks before the first block mined with the new BLS keys.
Skaled on node C failed to catch up a batch of blocks that contains both blocks signed with the previous public keys and blocks signed with the new public keys.

Note
Discussed with @kladkogex and @olehnikolaiev; there are two possible solutions:

  1. Consensus should try to catch up the full batch of blocks. If it gets an exception on the same block again, it should shrink the batch range to end at the latest valid block number instead of retrying the download of the same batch.
  2. Consensus should halve the batch range on every retry after skaled receives the exception.
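
To make the two options concrete, here is a rough sketch in Python (for brevity only; skaled itself is C++). Everything named here is hypothetical and not taken from the consensus code: `BatchVerifyError`, `fetch_and_verify`, and `make_fake_peer` are stand-ins, and the batch size of 303 is assumed from the `LIST_SIZE` value in the logs below.

```python
class BatchVerifyError(Exception):
    """Hypothetical error: a block in the downloaded batch failed verification."""

    def __init__(self, failed_block):
        super().__init__(f"verification failed on block {failed_block}")
        self.failed_block = failed_block


def catchup_truncate(last_committed, target, fetch_and_verify):
    """Option 1: on failure, shrink the batch to end at the last valid block."""
    first, last = last_committed + 1, target
    while first <= last:
        try:
            fetch_and_verify(first, last)
            return last  # highest block caught up in this round
        except BatchVerifyError as err:
            new_last = err.failed_block - 1
            if new_last >= last:
                raise  # no progress possible, avoid retrying the same batch
            last = new_last
    return last_committed  # not even the first block verified


def catchup_halving(last_committed, target, fetch_and_verify):
    """Option 2: halve the batch size on every failed attempt."""
    first = last_committed + 1
    size = target - last_committed
    while size >= 1:
        try:
            fetch_and_verify(first, first + size - 1)
            return first + size - 1
        except BatchVerifyError:
            size //= 2
    return last_committed


def make_fake_peer(first_bad):
    """Test double: any batch reaching block `first_bad` fails verification."""
    calls = []

    def fetch_and_verify(first, last):
        calls.append((first, last))
        if last >= first_bad:
            raise BatchVerifyError(max(first, first_bad))

    return fetch_and_verify, calls


if __name__ == "__main__":
    peer, _ = make_fake_peer(first_bad=475972)
    print(catchup_truncate(475732, 475732 + 303, peer))
    peer, _ = make_fake_peer(first_bad=475972)
    print(catchup_halving(475732, 475732 + 303, peer))
```

With these assumed numbers, option 1 reaches the last block signed with the old key in two attempts, while option 2 succeeds on a smaller batch and needs further catchup rounds to reach the same block. Option 1 relies on the error reporting which block failed; option 2 does not.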

Logs:
timestamp and block number for the rotation delay

2023-02-17 15:08:54.097590   Block stats:BN:475970:BTS:1676646279:TXS:0:HDRS:11:LOGS:40:SENGS:1:TXRS:0:BLCKS:3:ACCS:101:BQS:1:BDS:270:TSS:0:UTX:0:VTX:0:CMM:10068
2023-02-17 15:08:54.097725   Block 475971 DB usage is 1113. Piece DB usage is 246914615 bytes
2023-02-17 15:08:54.098032   Setting ExitTimeReached = true
2023-02-17 15:08:54.098047   Skaled status: setExitState: ExitTimeReached to true

logs from skaled

[2023-02-17 15:25:46.060] [15:main] [error] 475732:Successfully deserialized 239 blocks, got exception on block 475972
[2023-02-17 15:25:46.061] [15:catchup] [error] 475732:Catchupc step 3: can not read missing blocks
[2023-02-17 15:25:46.061] [15:main] [error] 475732:!Exception: CatchupClientAgent:Catchupc step 3: can not read missing blocks
[2023-02-17 15:25:46.061] [15:main] [error] 475732: !Caused by: CatchupClientAgent:Could not process block list
[2023-02-17 15:25:46.061] [15:main] [error] 475732:  !Caused by: CommittedBlockList:Could not create block list. 
LIST_SIZE:303:SERIALIZED_BLOCK_SIZE:1342104:OFFSET:0:COUNTER:239:INDEX:1061764:END_INDEX:1062252
[2023-02-17 15:25:46.061] [15:main] [error] 475732:   !Caused by: CommittedBlock Block threshold signature did not verify in deserialization:deserialize
[2023-02-17 15:25:46.061] [15:main] [error] 475732:    !Caused by: CryptoManager:verifyBlockSig
[2023-02-17 15:25:46.061] [15:main] [error] 475732:     !Caused by: CryptoManager:verifyThresholdSig
[2023-02-17 15:25:46.061] [15:main] [error] 475732:      !Caused by: CryptoManager:Check failed::blsKeys.second->VerifySig( make_shared<array<uint8_t, HASH_LEN> >(_hash.getHash()), libBlsSig) /home/ubuntu/actions-runner1/_work/skaled/skaled/libconsensus/crypto/CryptoManager.cpp:844:BLS sig verification failed using both current and previous key
[2023-02-17 15:25:49.130] [15:main] [info] 475732:CONSENSUS_STARTED:PROPOSING: 0000000000000000
[2023-02-17 15:25:55.089] [15:main] [error] 475732:Could not BLS verify signature:14136490408187771616157173397249572114863920536399819247950720288726999262523:12327222773368221786348312496646904415277892683222395001430927628551340001549:13381812999991995073542236781353627566568166120727470789684700983315080579621:0:KEY:10688806556865876949117049039988288414418956095594422437949267687198989734787:HASH:8487f89b7c3105005983b9941e918d318ae7421d5fd8a42fca98afa0e1db9dbb
[2023-02-17 15:25:55.097] [15:main] [error] 475732:Successfully deserialized 239 blocks, got exception on block 475972
[2023-02-17 15:25:55.097] [15:catchup] [error] 475732:Catchupc step 3: can not read missing blocks
[2023-02-17 15:25:55.098] [15:main] [error] 475732:!Exception: CatchupClientAgent:Catchupc step 3: can not read missing blocks
[2023-02-17 15:25:55.098] [15:main] [error] 475732: !Caused by: CatchupClientAgent:Could not process block list
[2023-02-17 15:25:55.098] [15:main] [error] 475732:  !Caused by: CommittedBlockList:Could not create block list. 
LIST_SIZE:304:SERIALIZED_BLOCK_SIZE:1346550:OFFSET:0:COUNTER:239:INDEX:1061764:END_INDEX:1062252
[2023-02-17 15:25:55.098] [15:main] [error] 475732:   !Caused by: CommittedBlock Block threshold signature did not verify in deserialization:deserialize
[2023-02-17 15:25:55.098] [15:main] [error] 475732:    !Caused by: CryptoManager:verifyBlockSig
[2023-02-17 15:25:55.098] [15:main] [error] 475732:     !Caused by: CryptoManager:verifyThresholdSig
[2023-02-17 15:25:55.098] [15:main] [error] 475732:      !Caused by: CryptoManager:Check failed::blsKeys.second->VerifySig( make_shared<array<uint8_t, HASH_LEN> >(_hash.getHash()), libBlsSig) /home/ubuntu/actions-runner1/_work/skaled/skaled/libconsensus/crypto/CryptoManager.cpp:844:BLS sig verification failed using both current and previous key
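
The numbers in the "Successfully deserialized" line can be cross-checked against the rotation log above. A small sketch, assuming the `475732:` prefix in each log line is the node's last committed block number (this is an interpretation of the log format, not something confirmed in this issue):

```python
import re

LINE = (
    "[2023-02-17 15:25:46.060] [15:main] [error] "
    "475732:Successfully deserialized 239 blocks, got exception on block 475972"
)

# Assumed layout: "<last committed block>:Successfully deserialized <n> blocks, ..."
PATTERN = re.compile(
    r"\] (\d+):Successfully deserialized (\d+) blocks, got exception on block (\d+)"
)


def parse_catchup_failure(line):
    """Return the block numbers found in a catchup failure line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    last_committed, ok_count, failed_block = map(int, match.groups())
    return {
        "last_committed": last_committed,
        "last_valid": last_committed + ok_count,
        "failed_block": failed_block,
    }


if __name__ == "__main__":
    print(parse_catchup_failure(LINE))
```

Under that assumption, 475732 + 239 = 475971, which is the last block logged before `ExitTimeReached` was set, and the first block that fails verification is 475972. So the failing batch straddles the key rotation boundary, as described in the actual state above.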
oleksandrSydorenkoJ added the bug and proposal labels Feb 17, 2023
@kladkogex
Collaborator

Marked to 2.2

kladkogex reopened this Feb 20, 2023
DmytroNazarenko transferred this issue from skalenetwork/skale-consensus Apr 3, 2023
DmytroNazarenko moved this from Ready For Pickup to To Do in SKALE Engineering 🚀 Apr 4, 2023
DmytroNazarenko added the release:2.2 label and removed the proposal label Apr 10, 2023
DmytroNazarenko moved this from To Do to Ready For Pickup in SKALE Engineering 🚀 Apr 10, 2023
DmytroNazarenko moved this from Ready For Pickup to Code Review in SKALE Engineering 🚀 Jun 16, 2023
github-project-automation bot moved this from Code Review to Ready For Release Candidate in SKALE Engineering 🚀 Jun 20, 2023
@DmytroNazarenko
Collaborator

skaled: 3.17.0-beta.1

DmytroNazarenko moved this from Ready For Release Candidate to Merged To Release Candidate in SKALE Engineering 🚀 Jun 21, 2023
EvgeniyZZ moved this from Merged To Release Candidate to QA in SKALE Engineering 🚀 Jun 21, 2023
@oleksandrSydorenkoJ
Author

Verified on regression network
skale_schain_rapping-fum-al-samakah
skalenetwork/schain:3.17.0-beta.7

skaled_catchin_up_after_node_rotation.log

EvgeniyZZ moved this from QA to Done in SKALE Engineering 🚀 Oct 12, 2023