What is the Problem Being Solved?
Under load, the cosmos and cosmic-swingset layers may block for a significant amount of time when processing blocks, in particular in the EndBlock phase (where swingset actually executes messages) and the Commit phase (when the DB changes are large).
It has been documented that such loads negatively impact the performance of the chain. One of the earliest reports is #5507, but more recently there have been numerous block production slowdowns where the chain appears to stall.
The surprising part is that during these slowdowns, our follower is able to execute and commit relatively quickly compared to the consensus block time: it might take 30s locally, yet the chain won't produce a new block for over 2 minutes. In theory, once 67% of the voting power has processed the block, the chain should be able to make forward progress; such a discrepancy in timing indicates something else may be at play here.
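To make the expectation above concrete, here is a minimal sketch of when the chain should, in theory, be able to move on. It uses the strict "more than 2/3 of voting power" threshold that CometBFT applies (which the 67% figure approximates), and it deliberately ignores network latency and p2p topology. The node names, voting powers, and timings are hypothetical illustration values, not measurements.

```python
def quorum_ready_time(nodes):
    """Earliest time at which nodes holding more than 2/3 of the total
    voting power have finished processing the block.

    `nodes` maps a node name to (voting_power, seconds_to_process).
    Idealized model: no network latency, no p2p topology effects.
    """
    total = sum(power for power, _ in nodes.values())
    if total <= 0:
        raise ValueError("total voting power must be positive")
    done = 0
    # Walk nodes from fastest to slowest, accumulating voting power.
    for power, seconds in sorted(nodes.values(), key=lambda n: n[1]):
        done += power
        if 3 * done > 2 * total:  # strictly more than 2/3
            return seconds
    raise ValueError("unreachable: all voting power was accumulated")


# Hypothetical numbers: two fast nodes holding 80% of the power, one slow node.
nodes = {"A": (40, 30.0), "B": (20, 120.0), "C": (40, 32.0)}
print(quorum_ready_time(nodes))
```

Under this idealized model the slow node never appears in the result as long as the fast nodes alone exceed the threshold, which is why a multi-minute block time next to a 30s local execution time looks suspicious.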
Furthermore, I have observed that in our instagoric networks, the non-primary nodes are not capable of making progress beyond the primary validator. This is possibly because these nodes do not have direct p2p connectivity to the rest of the network, and are instead connected through the primary, which is the only node with public network connectivity.
There is a series of issues linked from cometbft/cometbft#3245 that seems to imply that the p2p layer depends on the layers above not blocking.
Description of the Design
Set up a 3-node chain as follows:
- 2 large / overprovisioned nodes A & C, which together have >= 67% of the voting power
- 1 resource-constrained node B
- A & C do not have direct connectivity, and can only connect to B
Place some load on the chain (real or synthetic).
We should first verify that A & C clearly commit/vote on the block much faster than B. If they do, we want to observe whether the chain makes forward progress as soon as A & C complete their block, or whether B is somehow in the critical path.
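The two outcomes this experiment is meant to distinguish can be framed with a toy model. It is only a sketch: the assumption that B stops relaying votes while it is busy processing a block is the hypothesis under investigation, not confirmed CometBFT behaviour, and the function names and timings below are hypothetical.

```python
def commit_time(done, relay_blocks_while_busy):
    """Time at which A and C can each see the other's vote.

    `done` maps "A", "B", "C" to the time (seconds) each finishes the block.
    A and C have no direct link, so every vote between them passes through B.
    """
    votes_cast = max(done["A"], done["C"])  # both quorum votes exist
    if relay_blocks_while_busy:
        # Hypothesis: B cannot relay until its own processing is finished.
        return max(votes_cast, done["B"])
    # Otherwise B relays immediately, even while still executing the block.
    return votes_cast


def closest_model(observed, done):
    """Report which of the two models is nearer to an observed block time."""
    fast = commit_time(done, relay_blocks_while_busy=False)
    slow = commit_time(done, relay_blocks_while_busy=True)
    if abs(observed - slow) < abs(observed - fast):
        return "B in critical path"
    return "quorum-driven"


# Hypothetical timings, in seconds.
done = {"A": 30.0, "B": 120.0, "C": 32.0}
print(commit_time(done, relay_blocks_while_busy=False))
print(commit_time(done, relay_blocks_while_busy=True))
print(closest_model(125.0, done))
```

If the measured block time tracks B's processing time rather than that of A & C, that would support the idea that the relay node sits in the critical path despite not being needed for the quorum.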
Security Considerations
None, investigation only
Scaling Considerations
The chain should be able to make forward progress without being impeded by slower nodes, as long as nodes holding 67% of the voting power have processed the block, regardless of the p2p topology of the nodes forming that voting power.
Test Plan
See above
Upgrade Considerations
None