Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Establish threading rules in skaled-consensus interaction #1425

Closed
dimalit opened this issue Feb 17, 2023 · 8 comments · Fixed by #1535 or skalenetwork/skale-consensus#788
Closed

Establish threading rules in skaled-consensus interaction #1425

dimalit opened this issue Feb 17, 2023 · 8 comments · Fixed by #1535 or skalenetwork/skale-consensus#788
Assignees

Comments

@dimalit
Copy link
Contributor

dimalit commented Feb 17, 2023

  1. Current implementation is based on the assumption that consensus always calls pendingTransactions and createBlock from the same thread.This is not the case for very long time already.
  2. Threading rules for startAll/bootstrapAll should be defined.
  3. Threading rules for consensus and skaled termination should be defined. When these functions can be called? From which threads they can be called?
  4. In ConsensusExtFace::terminateApplication() should consensus' exitGracefully() be also called?
  5. Evaluate this statement: Consensus guarantees that it will not do anything for a particular block ID, until pendingTransactions(...) for this block id returns. (ConsensusInterface.h)
  6. Consensus should have 2 states: 'created' and 'started'. After 1st we can allow RPC. See https://github.com/skalenetwork/internal-support/issues/753

Need to decide these rules, document and follow them.

@dimalit
Copy link
Contributor Author

dimalit commented Mar 6, 2023

@dimalit dimalit changed the title Esteblish threading rules in skaled-consensus interaction Establish threading rules in skaled-consensus interaction Mar 6, 2023
@DmytroNazarenko DmytroNazarenko moved this from To Do to Ready For Pickup in SKALE Engineering 🚀 Apr 4, 2023
@dimalit dimalit moved this from Ready For Pickup to In Progress in SKALE Engineering 🚀 Apr 19, 2023
@dimalit
Copy link
Contributor Author

dimalit commented Apr 21, 2023

@dimalit
Copy link
Contributor Author

dimalit commented Apr 21, 2023

Key proposed changes to consensus:

  1. Do not hang, but return from ConsensusInterface::bootstrapAll() (lower priority issue).
  2. As proposed, exitGracefully() would possibly be called at any time moment (not just at time when block is under processing by skaled - as it was before), so maybe it requires some changes in consensus code.
  3. When need to exit, consensus just need to call ConsensusExtFace::terminateApplication() and RELY ON SKALED to terminate everything nicely (check if this is implemented in this way in consensus).
  4. If exitGracefully() call arrived while block N is under processing in skaled, consensus should prevent itself from proposing block N+1.

@dimalit
Copy link
Contributor Author

dimalit commented Apr 28, 2023

Intermediate testing results: consensus doesn't wait for block creation to exit.
Details:

  1. 4-node chain, node 4 already exited, node 3 has consensus "on pause".
  2. Exit was requested on node 3 after creation of block 12.
  3. Soon after exit request, skaled got transaction through JSON-RPC.
  4. Consensus started working on block 13, didn't wait for its completion, and exited in 6 sec
2023-04-28 20:46:41.669364   Block sealed #12 (#bbbcbc35…)
2023-04-28 20:46:41.669468   Block stats:BN:11:BTS:1682714777:TXS:0:HDRS:11:LOGS:40:SENGS:1:TXRS:1:BLCKS:3:ACCS:69:BQS:1:BDS:44:TSS:0:UTX:0:VTX:0:CMM:17846
2023-04-28 20:46:41.669823   Block 12 DB usage is 1104. Piece DB usage is 13957 bytes
2023-04-28 20:46:41.669857      Imported and best 0 (#12). Has 0 siblings.
2023-04-28 20:46:41.669989   Insterted block with 0 transactions
2023-04-28 20:46:41.670327   Post state changed.
2023-04-28 20:46:41.670369   noteChanged: {chain}
2023-04-28 20:46:41.670397   Total unsafe time so far = 0 seconds
2023-04-28 20:46:41.670424   Successfully imported 0 of 0 transactions

2023-04-28 20:46:41.670450    sent_to_consensus = 1 got_from_consensus = 1 m_transaction_cache = 0 m_tq = 0 m_bcast_counter = 1
2023-04-28 20:46:41.672053   TRACEPOINT fetch_transactions 148
2023-04-28 20:46:41.727946   http://127.0.0.1:36710 >>> {"id":34,"jsonrpc":"2.0","method":"eth_blockNumber","params":[]}
2023-04-28 20:46:41.728309   http://127.0.0.1:36710 <<< {"id":34,"jsonrpc":"2.0","result":"0xc"}
2023-04-28 20:46:41.772355   TRACEPOINT drop_bad_transactions 148
[2023-04-28 20:46:41.773] [3:main] [trace] 12:Read JSON header{"blockID":12,"nodeID":2,"schainID":1,"status":5,"substatus":3,"type":"BlckCatchupReq"}
[2023-04-28 20:46:41.773] [3:main] [debug] 12:Catchups: sChain->getCommittedBlockID() <= block_id(blockID)
[2023-04-28 20:46:41.773] [3:catchup] [debug] 12:Server step 2: sent catchup response header
[2023-04-28 20:46:41.773] [3:catchup] [debug] 12:Server step 3: response completed: no blocks sent
2023-04-28 20:46:41.780230   http://127.0.0.1:36710 >>> {"id":35,"jsonrpc":"2.0","method":"eth_blockNumber","params":[]}
2023-04-28 20:46:41.781051   http://127.0.0.1:36710 <<< {"id":35,"jsonrpc":"2.0","result":"0xc"}
2023-04-28 20:46:41.783572   Skaled status: setExitState: StartAgain to false
2023-04-28 20:46:41.783735   THREADS timer started
2023-04-28 20:46:41.784120   Skaled status: setExitState: StartFromSnapshot to false
2023-04-28 20:46:41.784214   SKALE NETWORK BROWSER refreshing(stop) did cleared/forgot dispatch/job
2023-04-28 20:46:41.823834   TRACEPOINT receive_transaction 2
2023-04-28 20:46:41.823877   m_received = 1

2023-04-28 20:46:41.824084   Mined gas: 1

2023-04-28 20:46:41.824305   Queued vaguely legit-looking transaction #6d67ce0a…
2023-04-28 20:46:41.824378   TRACEPOINT receive_transaction_success 2
2023-04-28 20:46:41.824404   Successfully received through broadcast 6d67ce0a70b37357e3af0dcfa2529f01d3996f0178dae1cbc89861252d4e1f16
2023-04-28 20:46:41.824391    sent_to_consensus = 1 got_from_consensus = 1 m_transaction_cache = 0 m_tq = 1 m_bcast_counter = 1
2023-04-28 20:46:41.824455   TRACEPOINT sent_txn_new 2
2023-04-28 20:46:41.824473   TRACEPOINT broadcast_already_have 2
2023-04-28 20:46:41.824500    sent_to_consensus = 1 got_from_consensus = 1 m_transaction_cache = 1 m_tq = 1 m_bcast_counter = 2
2023-04-28 20:46:41.824559   TRACEPOINT sent_txn 2
2023-04-28 20:46:41.824584   Sent txn: 6d67ce0a70b37357e3af0dcfa2529f01d3996f0178dae1cbc89861252d4e1f16

2023-04-28 20:46:41.824597   1 before exitGracefully()
2023-04-28 20:46:41.824607    sent_to_consensus = 2 got_from_consensus = 1 m_transaction_cache = 1 m_tq = 1 m_bcast_counter = 2
2023-04-28 20:46:41.824637   Exit requested with signal 15 and exit code 0
2023-04-28 20:46:41.824653   TRACEPOINT send_to_consensus 2
[2023-04-28 20:46:41.824] [config] [info] Consensus engine exiting: blocking exit exitGracefully called by skaled
2023-04-28 20:46:41.840783   Skaled status: setExitState: ClearDataDir to false
[2023-04-28 20:46:41.844] [3:pending] [trace] 12:Created proposal, transactions:1
[2023-04-28 20:46:41.844] [3:proposal] [trace] 12:addBlockProposal blockID_=13 proposerIndex=3
[2023-04-28 20:46:41.848] [3:main] [debug] 12:PROPOSING BLOCK NUMBER:13
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Adding sigshare
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 0: Starting block proposal
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 0: Starting block proposal
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 1: wrote proposal header
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 1: wrote proposal header
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"}
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 2: read proposal response
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"}
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal step 2: read proposal response
[2023-04-28 20:46:41.850] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10
[2023-04-28 20:46:41.886] [3:main] [trace] 12:Read JSON header{"blockID":13,"hash":"c6b29e06ebc1281109bcb6b852a6d48ba746c7e4e6fabc9dc20440a660a2a9f7","proposerIndex":1,"proposerNodeID":1,"schainID":1,"sig":"c6b29e06ebc1281109bcb6b852a6d48ba746c7e4e6fabc9dc20440a660a2a9f7","sr":"46320509353513273106582423493727320152202237096314791991810382902766530930981","status":5,"substatus":3,"timeStamp":1682714801,"timeStampMs":883,"txCount":1,"type":"BlckPrpslReq"}
[2023-04-28 20:46:41.889] [3:main] [debug] 12:Calculating missing hashes
[2023-04-28 20:46:41.890] [3:main] [debug] 12:Server: No missing partial hashes
[2023-04-28 20:46:41.890] [3:main] [debug] 12:Storing block proposal
[2023-04-28 20:46:41.890] [3:proposal] [trace] 12:addBlockProposal blockID_=13 proposerIndex=1
[2023-04-28 20:46:42.028] [3:main] [trace] 12:Read JSON header{"blockID":12,"nodeID":1,"schainID":1,"status":5,"substatus":3,"type":"BlckCatchupReq"}
[2023-04-28 20:46:42.028] [3:main] [debug] 12:Catchups: sChain->getCommittedBlockID() <= block_id(blockID)
[2023-04-28 20:46:42.028] [3:catchup] [debug] 12:Server step 2: sent catchup response header
[2023-04-28 20:46:42.028] [3:catchup] [debug] 12:Server step 3: response completed: no blocks sent
[2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 0: Starting block proposal
[2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 0: Starting block proposal
[2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 1: wrote proposal header
[2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 1: wrote proposal header
[2023-04-28 20:46:42.351] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"}
[2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal step 2: read proposal response
[2023-04-28 20:46:42.351] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10
[2023-04-28 20:46:42.352] [3:main] [trace] 12:Read JSON header{"status":1,"substatus":0,"type":"BlckPrpslRsp"}
[2023-04-28 20:46:42.352] [3:main] [trace] 12:Proposal step 2: read proposal response
[2023-04-28 20:46:42.352] [3:main] [trace] 12:Proposal step 3: sent partial hashes
[2023-04-28 20:46:42.353] [3:main] [trace] 12:Read JSON header{"count":0,"status":5,"substatus":3,"type":"MsngTxsReq"}
[2023-04-28 20:46:42.353] [3:proposal] [trace] 12:Push agent processed missing transactions header
[2023-04-28 20:46:42.353] [3:main] [trace] 12:Proposal complete::no missing transactions
[2023-04-28 20:46:42.353] [3:main] [trace] 12:Read JSON header{"pk":"","pks":"","sig":"8cc4ad67f57c2cf7ce383926fc0691edccf75cff723d276e5e797b20e85ebefe","sss":"a740d2fa53d554c6929aefe75357c54aba4cef8232f9ede2a18e35b12debb54e","status":0,"substatus":0,"type":"SigShareRsp"}
[2023-04-28 20:46:42.353] [3:main] [trace] 12:Adding sigshare
[2023-04-28 20:46:42.719] [3:consensus] [debug] 12:2ROUND_COMPLETE:BLOCK:11:ROUND:6
[2023-04-28 20:46:42.719] [3:consensus] [debug] 12::PROPOSER:2:DECIDED VALUE1:ROUND:6
[2023-04-28 20:46:42.719] [3:consensus] [debug] 12:Decided value: 1 for blockid:11 proposer:2
[2023-04-28 20:46:42.719] [3:consensus] [debug] 12:Got child message 11:2
[2023-04-28 20:46:42.851] [3:main] [trace] 12:Proposal step 0: Starting block proposal
[2023-04-28 20:46:42.852] [3:main] [trace] 12:Proposal step 1: wrote proposal header
[2023-04-28 20:46:42.852] [3:main] [trace] 12:Read JSON header{"status":3,"substatus":10,"type":"BlckPrpslRsp"}
[2023-04-28 20:46:42.852] [3:main] [trace] 12:Proposal step 2: read proposal response
[2023-04-28 20:46:42.852] [3:main] [trace] 12:Proposal Server terminated proposal push:3:10
2023-04-28 20:46:42.902553   2 after exitGracefully()
[2023-04-28 20:46:42.902] [config] [info] Consensus engine exiting: exitGracefullyAsync called by skaled
[2023-04-28 20:46:42.902] [config] [info] exitGracefullyAsync running
[2023-04-28 20:46:42.902] [config] [info] Node exit called
[2023-04-28 20:46:42.902] [config] [info] Node::exit() requested
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: close all sockets called
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: server conditional vars notified
[2023-04-28 20:46:42.902] [config] [trace] Notifying TCP cond93969017793424
[2023-04-28 20:46:42.902] [config] [trace] Notifying TCP cond93969017584192
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: agent conditional vars notified
[2023-04-28 20:46:42.902] [3:main] [info] 12:Consensus exiting: Node:Exit requested
[2023-04-28 20:46:42.902] [3:main] [info] 12:Consensus exiting: Node:Exit requested
[2023-04-28 20:46:42.902] [3:main] [info] 12:Consensus exiting: Node:Exit requested
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: block proposal socket touched
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: catchup socket touched
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: SGXZMQClient exiting
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: SGXZMQClient exited
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: closeAndCleanupAll consensus zmq sockets
[2023-04-28 20:46:42.902] [config] [info] Cleaning up ZMQ sockets
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: closing ZMQ send sockets
[2023-04-28 20:46:42.902] [config] [debug] main zmq debug in closeSend(): closing 140397045099856
[2023-04-28 20:46:42.902] [config] [debug] main zmq debug in closeSend(): closing 140397045122736
[2023-04-28 20:46:42.902] [config] [debug] main zmq debug in closeSend(): closing 140397045129312
[2023-04-28 20:46:42.902] [config] [info] consensus engine exiting: closing receive sockets
[2023-04-28 20:46:42.902] [config] [info] Closing ZMQ context
[2023-04-28 20:46:42.903] [3:net] [debug] 12:NtwkRdLoop zmq debug: closing = 140397179327456
[2023-04-28 20:46:42.903] [config] [info] Closed ZMQ
[2023-04-28 20:46:42.903] [config] [info] Node exit completed
[2023-04-28 20:46:43.353] [3:main] [info] 12:Consensus exiting: IO:Exit requested
[2023-04-28 20:46:43.610] [3:main] [info] 12:Exited Oracle worker thread 1
2023-04-28 20:46:43.904617   SKALE NETWORK BROWSER refreshing(stop) did cleared/forgot dispatch/job
2023-04-28 20:46:43.909974   Will stop WS server on address 127.0.0.3 and port 7003=0x00001b5b/nfo...
2023-04-28 20:46:43.910106   Will stop on port 7003=0x00001b5b...
2023-04-28 20:46:44.620251   OK, server stopped on port 7003=0x00001b5b
2023-04-28 20:46:44.620458   OK, stopped WS server on address 127.0.0.3 and port 7003=0x00001b5b/nfo
2023-04-28 20:46:44.620639   Will stop proxygen/HTTP server on address 127.0.0.3 and port 1234=0x000004d2/nfo...
2023-04-28 20:46:44.620843   OK, stopped proxygen/HTTP server on address 127.0.0.3 and port 1234=0x000004d2/nfo
2023-04-28 20:46:44.620953   Will stop proxygen/HTTP server on address 127.0.0.3 and port 1240=0x000004d8/std...
2023-04-28 20:46:44.621069   OK, stopped proxygen/HTTP server on address 127.0.0.3 and port 1240=0x000004d8/std
2023-04-28 20:46:44.622179   Skaled status: setSubsystemRunning: Rpc to false

[2023-04-28 20:46:46.373] [3:catchup] [debug] 12:Catchupc step 0: requesting blocks after 12
[2023-04-28 20:46:46.474] [3:main] [error] 12:!Exception: CatchupClientAgent:Connecting to dead node 4:Input/output error
2023-04-28 20:46:46.513963   3 after wait loop
2023-04-28 20:46:46.514116   4 before dtor
2023-04-28 20:46:46.514276   1 before exitGracefully()
2023-04-28 20:46:46.514358   Exit requested with signal 15 and exit code 0
[2023-04-28 20:46:46.514] [3:consensus] [info] 12:Consensus engine exiting: blocking exit exitGracefully called by skaled
2023-04-28 20:46:46.514436   Closing blockchain DB
2023-04-28 20:46:46.514898   Blockchain is closed
2023-04-28 20:46:46.515034   Worker stopping 4690 ms
2023-04-28 20:46:47.602574   2 after exitGracefully()
2023-04-28 20:46:47.602609   3 after wait loop
[2023-04-28 20:46:47.602] [config] [info] Consensus engine exiting: exitGracefullyAsync called by skaled
2023-04-28 20:46:47.602622   4 before dtor
2023-04-28 20:46:47.602656   Closing blockchain DB
2023-04-28 20:46:47.602687   Blockchain is closed
2023-04-28 20:46:47.602702   Skaled status: setSubsystemRunning: Blockchain to false

2023-04-28 20:46:47.602889   Closing blockchain DB
2023-04-28 20:46:47.602914   Blockchain is closed
2023-04-28 20:46:47.603046   Closing blockchain DB

@OlgaSkale OlgaSkale moved this from In Progress to Code Review in SKALE Engineering 🚀 May 4, 2023
@kladkogex kladkogex linked a pull request May 4, 2023 that will close this issue
@DmytroNazarenko DmytroNazarenko moved this from Code Review to In Progress in SKALE Engineering 🚀 May 12, 2023
@dimalit
Copy link
Contributor Author

dimalit commented May 15, 2023

related #1525

@dimalit
Copy link
Contributor Author

dimalit commented May 23, 2023

related #1532

@OlgaSkale OlgaSkale moved this from In Progress to Blocked in SKALE Engineering 🚀 May 25, 2023
@dimalit dimalit linked a pull request Jun 1, 2023 that will close this issue
@DmytroNazarenko DmytroNazarenko moved this from Blocked to In Progress in SKALE Engineering 🚀 Jun 16, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Ready For Release Candidate in SKALE Engineering 🚀 Jun 16, 2023
@DmytroNazarenko
Copy link
Collaborator

skaled: 3.17.0-beta.1

@DmytroNazarenko DmytroNazarenko moved this from Ready For Release Candidate to Merged To Release Candidate in SKALE Engineering 🚀 Jun 21, 2023
@EvgeniyZZ EvgeniyZZ moved this from Merged To Release Candidate to QA in SKALE Engineering 🚀 Jun 21, 2023
@oleksandrSydorenkoJ
Copy link

Part of #1348

@EvgeniyZZ EvgeniyZZ moved this from QA to Done in SKALE Engineering 🚀 Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
4 participants