Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full rework of the BlockFetch logic for bulk sync mode #1179

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

Niols
Copy link
Contributor

@Niols Niols commented Jul 3, 2024

Integrates a new implementation of the BulkSync mode, where blocks are downloaded from alternative peers as soon as the node has no more blocks to validate while there are longstanding requests in flight.

This PR depends on the new implementation of the BulkSync mode (IntersectMBO/ouroboros-network#4919). cabal.project is made to point to a back-port of the BulkSync implementation on ouroboros-network-0.16.1.1.

CSJ Changes

CSJ is involved because the new BulkSync mode requires to change the dynamo if it is also serving blocks, and it is not sending them promptly enough. The dynamo choice has an influence in the blocks that are chosen to be downloaded by BlockFetch.

For this sake, b93c379 gives the ability to order the ChainSync clients, so the dynamo role can be rotated among them whenever BlockFetch requests it.

b1c0bf8 provides the implementation of the rotation operation.

BlockFetch tests

c4bfa37 allows to specify in tests in which order to start the peers, which has an effect on what peer is chosen as initial dynamo.

c594c09 in turn adds a new BlockFetch test to show that syncing isn't slowed down by peers that don't send blocks.

Integration of BlockFetch changes

The collection of ChainSync client handles now needs to be passed between BlockFetch and ChainSync so dynamo rotations can be requested by BlockFetch.

The parameter bfcMaxConcurrencyBulkSync has been removed since blocks are not coordinated to be downloaded concurrently.

These changes are in 6926278.

ChainSel changes

Now BlockFetch requires the ability to detect if ChainSel has run out of blocks to validate. This motivates 73187ba, which implements a mechanism to measure if ChainSel is waiting for more blocks (starves), and determines for how long.

The above change is not sufficient to measure starvation. The queue to send blocks for validation used to allow only for one block to sit in the queue. This would interfere with the ability to measure starvation since BlockFetch would block waiting for the queue to become empty, and the queue would quickly become empty after taking just 1 block. For download delays to be amortized, a larger queue capacity was needed. This is the reason why a fix similar to IntersectMBO/ouroboros-network#2721 is part of 0d3fc28.

Miscellaneous fixes

CSJ jump size adjustment

When syncing from mainnet, we discovered that CSJ wouldn't sync the blocks from the Byron era. This was because the jump size was set to the length of the genesis window of the Shelley era, which is much larger than Byron's. When the jump size is larger than the genesis window, the dynamo will block on the forecast horizon before offering a jump that allows the chain selection to advance. In this case, CSJ and chain selection will deadlock.

For this reason we set the default jump size to the size of Byron's genesis window in 028883a. This didn't show an impact on syncing time in our measures. Future work (as part of deploying Genesis) might involve allowing the jump size to vary between different eras.

GDD rate limit

GDD evaluation showed an overhead of 10% if run after every header arrives via ChainSync. Therefore, in b7fa122 we limited how often it could run, so multiple header arrivals could be handled by a single GDD evaluation.

Candidate fragment comparison in the ChainSync client

We stumbled upon a test case where the candidate fragments of the dynamo and an objector were no longer than the current selection (both peers were adversarial). This was problematic because BlockFetch would refuse to download blocks from these candidates, and ChainSync in turn would wait for the selection to advance in order to download more headers.

The fix in e27a73c is to have the ChainSync client disconnect a peer which is about to block on the forecast horizon if its candidate isn't better than the selection.

Candidate fragment truncations

At the moment, it is possible for a candidate fragment to be truncated by CSJ when a jumper jumps to a point that is not younger than the tip of its current candidate fragment. We encountered tests where the jump point could be so old that it would fall behind the immutable tip, and GDD would ignore the peer when computing the Limit on Eagerness. This in turn would cause the selection to advance into potentially adversarial chains.

The fix in dc5f6f7 is to have GDD never drop candidates. When the candidate does not intersect the current selection, the LoE is not advanced. This is a situation guaranteed to be unblocked by the ChainSync client since it will either disconnect the peer or bring the candidate to intersect with the current selection.

@Niols Niols added the Genesis PRs related to Genesis testing and implementation label Jul 3, 2024
@amesgen amesgen force-pushed the blockfetch/milestone-1 branch from bab4a6a to 670ae32 Compare August 5, 2024 12:34
@amesgen amesgen changed the base branch from niols/blockfetch-leashing to main August 5, 2024 12:34
@facundominguez facundominguez force-pushed the blockfetch/milestone-1 branch 2 times, most recently from da90985 to b3434e8 Compare August 5, 2024 21:05
Copy link
Member

@dnadales dnadales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: please remove any pending REVIEW comments.

@amesgen amesgen force-pushed the blockfetch/milestone-1 branch from b3aa800 to 7d82cf1 Compare December 7, 2024 19:16
@amesgen amesgen requested a review from geo2a as a code owner December 7, 2024 19:16
@jasagredo
Copy link
Contributor

The releasing branch has been rebased on top of this one and there is now a test failure there. Is this expected? @Niols https://github.com/IntersectMBO/ouroboros-consensus/pull/1314/checks?check_run_id=34400813212

@Niols
Copy link
Contributor Author

Niols commented Dec 16, 2024

Looking at the failure case:

      time limited leashing attack:                              FAIL (6.48s)
        *** Failed! Falsified (after 203 tests and 12 shrinks):
        GenesisTest:
          gtSecurityParam: 1
          gtGenesisWindow: 8
          gtForecastRange: 8
          gtDelay: 0
          gtSlotLength: SlotLength 20s
          gtCSJParams: CSJParams {csjpJumpSize = SlotNo 8}
          gtChainSyncTimeouts: 
            canAwait = Nothing
            intersect = Just 10s
            mustReply = Nothing
            idle = Nothing
          gtBlockFetchTimeouts: 
            busy = Just 60s
            streaming = Just 60s
          gtLoPBucketParams: 
            lbpCapacity = 50 tokens
            lbpRate = 10 % 1 ≅ 10.00 tokens per second
          gtBlockTree:
            G | 1-3 2-7 3-11 4-15
            G | 1-3 2-5[4] 3-12 4-15 5-16 6-18
            G | 1-3 2-8[3] 3-18
            G | 1-0[1] 2-8 3-9 4-10
            slots:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
            trunk:  ───────────1───────────2───────────3───────────4
                               ╰─────2────────────────────3────────4──5─────6
                               ╰──────────────2─────────────────────────────3
                   ╰──1───────────────────────2──3──4
          gtSchedule:
            psSchedule =
               0: honest     : TP 4-15          | HP G           | BP G    @ 0.000000
               1: adversary  : TP 6-18[0,4,4x0] | HP G           | BP G    @ 0.000000
               2: adversary 2: TP 3-18[0,3,0]   | HP G           | BP G    @ 0.000000
               3: honest     : TP 4-15          | HP 3-11        | BP G    @ 0.019353
               4: honest     : TP 4-15          | HP 3-11        | BP 1-3  @ 0.032457
               5: honest     : TP 4-15          | HP 4-15        | BP 1-3  @ 0.040614
               6: adversary 2: TP 3-18[0,3,0]   | HP 2-8[0,3]    | BP G    @ 0.057771
               7: adversary  : TP 6-18[0,4,4x0] | HP 3-12[0,4,0] | BP G    @ 0.080588
               8: honest     : TP 4-15          | HP 4-15        | BP 2-7  @ 0.083783
               9: adversary 4: TP G             | HP 4-10[1,3x0] | BP G    @ 0.129071
              10: honest     : TP 4-15          | HP 4-15        | BP 4-15 @ 0.135964
            psStartOrder = []
            psMinEndTime = Time 77.1s
        SelectedChain: 1-3 | 2-7
        TipBlock: 2-7
        PeerSimulatorResults:
          - AdversarialPeer 1 (ChainSyncClient  - Interrupted) : DensityTooLow                                                                                                                    
          - AdversarialPeer 1 (ChainSyncServer  - Interrupted) : AsyncCancelled                                                                                                                   
          - AdversarialPeer 1 (BlockFetchClient - Interrupted) : AsyncCancelled                                                                                                                   
          - AdversarialPeer 1 (BlockFetchServer - Interrupted) : AsyncCancelled                                                                                                                   
          - AdversarialPeer 4 (ChainSyncClient  - Interrupted) : CandidateTooSparse Origin (Tip (SlotNo 3) (testHashFromList [0]) (BlockNo 1)) (Tip (SlotNo 0) (testHashFromList [1]) (BlockNo 1))
          - AdversarialPeer 4 (ChainSyncServer  - Interrupted) : AsyncCancelled                                                                                                                   
          - AdversarialPeer 4 (BlockFetchClient - Interrupted) : AsyncCancelled                                                                                                                   
          - AdversarialPeer 4 (BlockFetchServer - Interrupted) : AsyncCancelled                                                                                                                   
        
        
        Some peers were disconnected: adversary 4, adversary
        The immutable tip is too old: 12
        12 > 9
        Use --quickcheck-replay="(SMGen 17435934804358042701 8268706673786196345,2)" to reproduce.
        Use -p '/time limited leashing attack/' to rerun this test only.

The test looks reasonable, except k = 1? Not sure if that could lead to problems, or why such a case was generated. Somehow, despite k = 1, the client seems blocked on the fork between the honest chain and the chain served by adversary 2, unable to pass the intersection 1-3.

The genesis window is of size 8, the honest chain has 2 blocks in there while adversary 2 has only one. However, adversary 2 never serves its block at 3-18, so maybe that doesn't trigger the GDD? This should be covered by the LoP, in that case, because the adversary shouldn't be allowed to disarm it, and there is more than enough time. I would need to look at the full logs to see why that is not happening.

So either the test is flakey and we didn't catch that before and we're unlucky, or it's actually showing a subtle problem somewhere that we are not aware of, or maybe in the release branch some things changed in a way that impacts this. Will have a closer look.

@Niols
Copy link
Contributor Author

Niols commented Dec 16, 2024

Checked out this repo on branch neilmayhew/release-srp; got to a commit ef51726, then ran:

cabal run ouroboros-consensus-diffusion:consensus-test -- --quickcheck-replay="(SMGen 17435934804358042701 8268706673786196345,2)"

in the development environment and got:

Resolving dependencies...
Error: [Cabal-7107]
Could not resolve dependencies:
[__0] next goal: ouroboros-network (user goal)
[__0] rejecting: ouroboros-network-0.18.0.0 (constraint from user target requires ==0.16.1.1)
[__0] rejecting: ouroboros-network; 0.17.1.2, 0.17.1.1, 0.17.1.0, 0.17.0.0, 0.16.1.1, 0.16.1.0, 0.16.0.0, 0.15.0.0, 0.14.0.0, 0.13.1.0, 0.13.0.0, 0.12.0.0, 0.11.0.0, 0.10.2.2, 0.10.2.1, 0.10.1.0, 0.10.0.1, 0.10.0.0, 0.9.2.0, 0.9.1.0, 0.9.0.0, 0.8.2.0, 0.8.1.1, 0.8.1.0, 0.8.0.1, 0.8.0.0, 0.7.0.1, 0.7.0.0, 0.6.0.0, 0.5.0.0, 0.4.0.1, 0.4.0.0, 0.3.0.2, 0.3.0.1, 0.3.0.0, 0.2.0.0, 0.1.0.1, 0.1.0.0, 0.10.2.0 (constraint from user target requires ==0.18.0.0)
[__0] fail (backjumping, conflict set: ouroboros-network)
After searching the rest of the dependency tree exhaustively, these were the goals I've had most trouble fulfilling: ouroboros-network

I'm not super up-to-date on how to run this test suite, is there anything I'm missing here?

@Niols
Copy link
Contributor Author

Niols commented Dec 18, 2024

Since the previous messages, I have been in contact with @jasagredo on the side. We have tracked down this test failure and concluded the following two things:

  1. It is a bug in the peer simulator itself, and therefore a false positive — network safety is not threatened by this.
  2. It is just an extremely rare occurrence, which was present already in blockfetch/milestone-1 as of 7d82cf1.

For posterity, I will add some details in the following messages, but that was the gist of it. In the coming days, I will push a small fix for this peer simulator bug, probably just by re-enabling timeouts for the canAwait state of the ChainSync client. I will also try to measure the probability for this test failure to appear, so as to gain confidence that it is indeed fixed.

@Niols
Copy link
Contributor Author

Niols commented Dec 18, 2024

Now, on neilmayhew/release-srp as of ef51726, running

cabal run ouroboros-consensus-diffusion:consensus-test -- -p '/time limited leashing attack/' --quickcheck-replay="(SMGen 14174175684299620761 11115350297627667705,4)"

yields:

ouroboros-consensus
  Genesis tests
    uniform
      time limited leashing attack: FAIL (1.55s)
        *** Failed! Falsified (after 1 test and 21 shrinks):
        GenesisTest:
          gtSecurityParam: 2
          gtGenesisWindow: 5
          gtForecastRange: 5
          gtDelay: 1
          gtSlotLength: SlotLength 20s
          gtCSJParams: CSJParams {csjpJumpSize = SlotNo 5}
          gtChainSyncTimeouts: 
            canAwait = Nothing
            intersect = Just 10s
            mustReply = Nothing
            idle = Nothing
          gtBlockFetchTimeouts: 
            busy = Just 60s
            streaming = Just 60s
          gtLoPBucketParams: 
            lbpCapacity = 50 tokens
            lbpRate = 10 % 1 ≅ 10.00 tokens per second
          gtBlockTree:
            G | 1-0 2-1 3-4 4-5 5-6 6-7 7-9
            G | 1-0 2-2[1] 3-8 4-9
            slots:    0  1  2  3  4  5  6  7  8  9
            trunk:  ──1──2────────3──4──5──6─────7
                      ╰─────2─────────────────3──4
          gtSchedule:
            psSchedule =
              0: honest     : TP 7-9          | HP G          | BP G   @ 0.000000
              1: adversary 3: TP 4-9[0,1,2x0] | HP G          | BP G   @ 0.000000
              2: honest     : TP 7-9          | HP G          | BP 3-4 @ 0.029276
              3: honest     : TP 7-9          | HP 7-9        | BP 3-4 @ 0.040478
              4: adversary 3: TP 4-9[0,1,2x0] | HP 3-8[0,1,0] | BP G   @ 0.070241
              5: honest     : TP 7-9          | HP 7-9        | BP 7-9 @ 0.081851
            psStartOrder = []
            psMinEndTime = Time 61.7s
        SelectedChain: 1-0 | 2-1 3-4
        TipBlock: 3-4
        PeerSimulatorResults:
        
        
        No peers were disconnected
        The immutable tip is too old: 9
        9 > 7
        Use --quickcheck-replay="(SMGen 14174175684299620761 11115350297627667705,4)" to reproduce.
...and here are the full logs, obtained by enabling tracing.
GenesisTest:
    gtSecurityParam: 2
    gtGenesisWindow: 5
    gtForecastRange: 5
    gtDelay: 1
    gtSlotLength: SlotLength 20s
    gtCSJParams: CSJParams {csjpJumpSize = SlotNo 5}
    gtChainSyncTimeouts:
    canAwait = Nothing
    intersect = Just 10s
    mustReply = Nothing
    idle = Nothing
    gtBlockFetchTimeouts:
    busy = Just 60s
    streaming = Just 60s
    gtLoPBucketParams:
    lbpCapacity = 50 tokens
    lbpRate = 10 % 1 ≅ 10.00 tokens per second
    gtBlockTree:
    G | 1-0 2-1 3-4 4-5 5-6 6-7 7-9
    G | 1-0 2-2[1] 3-8 4-9
    slots:    0  1  2  3  4  5  6  7  8  9
    trunk:  ──1──2────────3──4──5──6─────7
                ╰─────2─────────────────3──4
    gtSchedule:
    psSchedule =
        0: honest     : TP 7-9          | HP G          | BP G   @ 0.000000
        1: adversary 3: TP 4-9[0,1,2x0] | HP G          | BP G   @ 0.000000
        2: honest     : TP 7-9          | HP G          | BP 3-4 @ 0.029276
        3: honest     : TP 7-9          | HP 7-9        | BP 3-4 @ 0.040478
        4: adversary 3: TP 4-9[0,1,2x0] | HP 3-8[0,1,0] | BP G   @ 0.070241
        5: honest     : TP 7-9          | HP 7-9        | BP 7-9 @ 0.081851
    psStartOrder = []
    psMinEndTime = Time 61.7s
00:00.000   Initiating node startup
00:00.000 BlockFetchLogic | PeersFetch []
00:00.000 GDD | Window: 0 -> 4
00:00.000       Selection: G
00:00.000       Candidates:
00:00.000       Candidate suffixes (bounds):
00:00.000       Density bounds:
00:00.000       New candidate tips:
00:00.000       Losing peers: []
00:00.000       Setting loeFrag: G
00:00.000   Node startup complete with selection G
Running point schedule ...
┌──────────────────────────────────────────────────────────────────────────────┐
└─ 00:00.000
Tick:
    number: 0
    duration: 0s
    peer: honest
    state: TP 7-9 | HP G | BP G
    current chain: G
    candidate fragment: Nothing
    jumping states:

            ChainSyncServer honest       | handling MsgFindIntersect
            ChainSyncServer honest       |   state is TP 7-9 | HP G | BP G
            ChainSyncServer honest       | done handling MsgFindIntersect
            ChainSyncServer honest       |   intersection found: G
            ChainSyncClient honest       | Found intersection at: G
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            ChainSyncClient honest       | Received instruction: RunNormally
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP G | BP G
            ChainSyncServer honest       |   last intersection is G
            ChainSyncServer honest       |   intersection is exactly our header point
            ChainSyncServer honest       |   cannot serve at this point; waiting for node state and starting again
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP G | BP G
            ChainSyncServer honest       |   last intersection is G
            ChainSyncServer honest       |   intersection is exactly our header point
            ChainSyncServer honest       |   cannot serve at this point; waiting for node state and starting again
            GDD | Window: 0 -> 4
                Selection: G
                Candidates:
                    honest: G
                    adversary 3: G
                Candidate suffixes (bounds):
                    honest: G
                Density bounds:
                    honest: 0/5[ ], point: G, latest: unknown
                New candidate tips:
                    honest: G
                    adversary 3: G
                Losing peers: []
                Setting loeFrag: G
            BlockFetchLogic | PeersFetch [TraceLabelPeer (HonestPeer 1) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible)]
┌──────────────────────────────────────────────────────────────────────────────┐
└─ 00:00.000
Tick:
    number: 1
    duration: 0.029276489858s
    peer: adversary 3
    state: TP 4-9[0,1,2x0] | HP G | BP G
    current chain: G
    candidate fragment: G
    jumping states:
    honest: Dynamo DynamoStarted G
    adversary 3: Jumper _ Happy FreshJumper Nothing

            ChainSyncServer adversary 3  | handling MsgFindIntersect
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP G | BP G
            ChainSyncServer adversary 3  | done handling MsgFindIntersect
            ChainSyncServer adversary 3  |   intersection found: G
            ChainSyncClient adversary 3  | Found intersection at: G
            ChainSyncClient adversary 3  | Waiting for next instruction from the jumping governor
            GDD | Window: 0 -> 4
                Selection: G
                Candidates:
                    honest: G
                    adversary 3: G
                Candidate suffixes (bounds):
                    honest: G
                    adversary 3: G
                Density bounds:
                    honest: 0/5[ ], point: G, latest: unknown
                    adversary 3: 0/5[ ], point: G, latest: unknown
                New candidate tips:
                    honest: G
                    adversary 3: G
                Losing peers: []
                Setting loeFrag: G
┌──────────────────────────────────────────────────────────────────────────────┐
└─ 00:00.029
Tick:
    number: 2
    duration: 0.011201112595s
    peer: honest
    state: TP 7-9 | HP G | BP 3-4
    current chain: G
    candidate fragment: G
    jumping states:
    honest: Dynamo DynamoStarted G
    adversary 3: Jumper _ Happy FreshJumper Nothing

            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP G | BP 3-4
            ChainSyncServer honest       |   last intersection is G
            ChainSyncServer honest       |   intersection is exactly our header point
            ChainSyncServer honest       |   cannot serve at this point; waiting for node state and starting again
┌──────────────────────────────────────────────────────────────────────────────┐
└─ 00:00.040
Tick:
    number: 3
    duration: 0.029763889298s
    peer: honest
    state: TP 7-9 | HP 7-9 | BP 3-4
    current chain: G
    candidate fragment: G
    jumping states:
    honest: Dynamo DynamoStarted G
    adversary 3: Jumper _ Happy FreshJumper Nothing

            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncServer honest       |   last intersection is G
            ChainSyncServer honest       |   intersection is before our header point
            ChainSyncServer honest       |   fragment ahead: G | 1-0 2-1 3-4 4-5 5-6 6-7 7-9
            ChainSyncServer honest       | done handling MsgRequestNext
            ChainSyncServer honest       |   gotta serve 1-0
            ChainSyncServer honest       |   tip is      7-9
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncServer honest       |   last intersection is 1-0
            ChainSyncClient honest       | Downloaded header: 1-0
            ChainSyncServer honest       |   intersection is before our header point
            ChainSyncServer honest       |   fragment ahead: 1-0 | 2-1 3-4 4-5 5-6 6-7 7-9
            ChainSyncServer honest       | done handling MsgRequestNext
            ChainSyncServer honest       |   gotta serve 2-1
            ChainSyncServer honest       |   tip is      7-9
            GDD | Window: 0 -> 4
                Selection: G
                Candidates:
                    honest: G
                    adversary 3: G
                Candidate suffixes (bounds):
                    honest: G
                    adversary 3: G
                Density bounds:
                    honest: 0/5[ ], point: G, latest: 0
                    adversary 3: 0/5[ ], point: G, latest: unknown
                New candidate tips:
                    honest: G
                    adversary 3: G
                Losing peers: []
                Setting loeFrag: G
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncClient honest       | Validated header: 1-0
            ChainSyncServer honest       |   last intersection is 2-1
            ChainSyncServer honest       |   intersection is before our header point
            ChainSyncServer honest       |   fragment ahead: 2-1 | 3-4 4-5 5-6 6-7 7-9
            ChainSyncClient honest       | Gave LoP token to 1-0 compared to BlockNo 0
            ChainSyncServer honest       | done handling MsgRequestNext
            ChainSyncServer honest       |   gotta serve 3-4
            ChainSyncServer honest       |   tip is      7-9
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            BlockFetchLogic | PeersFetch [TraceLabelPeer (HonestPeer 1) (Right [At (Block {blockPointSlot = SlotNo 0, blockPointHash = (testHashFromList [0])})]),TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible)]
            ChainSyncClient honest       | Received instruction: RunNormally
            BlockFetchServer honest      | handling BlockFetch
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | Starting batch for slice G | 1-0
            ChainSyncClient honest       | Downloaded header: 2-1
            BlockFetchServer honest      | done handling BlockFetch
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | Sending 1-0
            BlockFetchServer honest      | done handling SendBlocks
            ChainSyncClient honest       | Validated header: 2-1
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncClient honest       | Gave LoP token to 2-1 compared to BlockNo 1
            BlockFetchServer honest      | Batch is done
            BlockFetchServer honest      | done handling SendBlocks
            ChainDB                      | ChainSel starvation ended at 00:00.040 thanks to 1-0
            BlockFetchLogic | PeersFetch [TraceLabelPeer (HonestPeer 1) (Right [At (Block {blockPointSlot = SlotNo 1, blockPointHash = (testHashFromList [0,0])})]),TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible)]
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            ChainSyncClient honest       | Received instruction: RunNormally
            BlockFetchServer honest      | handling BlockFetch
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | Starting batch for slice 1-0 | 2-1
            BlockFetchServer honest      | done handling BlockFetch
            ChainSyncClient honest       | Downloaded header: 3-4
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | Sending 2-1
            BlockFetchServer honest      | done handling SendBlocks
            ChainDB                      | Current chain: G
            ChainSyncClient honest       | Validated header: 3-4
            ChainDB                      | LoE fragment: G
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | Batch is done
            ChainSyncClient honest       | Gave LoP token to 3-4 compared to BlockNo 2
            BlockFetchServer honest      | done handling SendBlocks
            ChainDB                      | Added to current chain; now: G | 1-0
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            ChainSyncClient honest       | Received instruction: RunNormally
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncServer honest       |   last intersection is 3-4
            ChainSyncServer honest       |   intersection is before our header point
            ChainSyncServer honest       |   fragment ahead: 3-4 | 4-5 5-6 6-7 7-9
            ChainSyncServer honest       | done handling MsgRequestNext
            ChainSyncServer honest       |   gotta serve 4-5
            ChainSyncServer honest       |   tip is      7-9
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncServer honest       |   last intersection is 4-5
            ChainSyncClient honest       | Downloaded header: 4-5
            ChainSyncServer honest       |   intersection is before our header point
            ChainSyncServer honest       |   fragment ahead: 4-5 | 5-6 6-7 7-9
            ChainDB                      | Current chain: G | 1-0
            ChainSyncServer honest       | done handling MsgRequestNext
            ChainDB                      | LoE fragment: G
            ChainSyncServer honest       |   gotta serve 5-6
            ChainSyncServer honest       |   tip is      7-9
            ChainSyncClient adversary 3  | Received instruction: JumpTo 3-4
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncClient adversary 3  | Offering jump to 3-4
            ChainSyncClient honest       | Validated header: 4-5
            ChainSyncServer honest       |   last intersection is 5-6
            ChainDB                      | Added to current chain; now: G | 1-0 2-1
            ChainSyncServer honest       |   intersection is before our header point
            ChainSyncServer honest       |   fragment ahead: 5-6 | 6-7 7-9
            ChainSyncServer honest       | done handling MsgRequestNext
            ChainSyncClient honest       | Gave LoP token to 4-5 compared to BlockNo 3
            ChainSyncServer honest       |   gotta serve 6-7
            ChainSyncServer honest       |   tip is      7-9
            ChainSyncServer adversary 3  | handling MsgFindIntersect
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP G | BP G
            ChainSyncServer adversary 3  | done handling MsgFindIntersect
            ChainSyncServer adversary 3  |   no intersection found
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            ChainDB                      | ChainSel starvation started at 00:00.040
            ChainSyncClient adversary 3  | Rejected jump to 3-4
            ChainSyncClient honest       | Received instruction: RunNormally
            ChainSyncClient adversary 3  | Waiting for next instruction from the jumping governor
            ChainSyncClient honest       | Downloaded header: 5-6
            ChainSyncClient adversary 3  | Received instruction: JumpTo 2-1
            ChainSyncClient adversary 3  | Offering jump to 2-1
            ChainSyncClient honest       | Validated header: 5-6
            ChainSyncServer adversary 3  | handling MsgFindIntersect
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP G | BP G
            ChainSyncServer adversary 3  | done handling MsgFindIntersect
            ChainSyncClient honest       | Gave LoP token to 5-6 compared to BlockNo 4
            ChainSyncServer adversary 3  |   no intersection found
            ChainSyncClient adversary 3  | Rejected jump to 2-1
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            ChainSyncClient adversary 3  | Waiting for next instruction from the jumping governor
            ChainSyncClient honest       | Received instruction: RunNormally
            ChainSyncClient adversary 3  | Received instruction: JumpTo 1-0
            ChainSyncClient adversary 3  | Offering jump to 1-0
            ChainSyncClient honest       | Downloaded header: 6-7
            ChainSyncServer adversary 3  | handling MsgFindIntersect
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP G | BP G
            ChainSyncServer adversary 3  | done handling MsgFindIntersect
            ChainSyncClient honest       | Waiting for SlotNo 7 beyond forecast horizon
            ChainSyncServer adversary 3  |   intersection found: 1-0
            ChainSyncClient adversary 3  | Accepted jump to 1-0
            ChainSyncClient adversary 3  | Waiting for next instruction from the jumping governor
            ChainSyncClient adversary 3  | Received instruction: JumpToGoodPoint 1-0
            ChainSyncClient adversary 3  | Offering jump to 1-0
            ChainSyncServer adversary 3  | handling MsgFindIntersect
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP G | BP G
            ChainSyncServer adversary 3  | done handling MsgFindIntersect
            ChainSyncServer adversary 3  |   intersection found: 1-0
            ChainSyncClient adversary 3  | Accepted jump to good point: 1-0
            ChainSyncClient adversary 3  | Waiting for next instruction from the jumping governor
            ChainSyncClient adversary 3  | Received instruction: RunNormally
            ChainSyncServer adversary 3  | handling MsgRequestNext
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP G | BP G
            ChainSyncServer adversary 3  |   last intersection is 1-0
            ChainSyncServer adversary 3  |   intersection is further than our header point
            ChainSyncServer adversary 3  |   cannot serve at this point; waiting for node state and starting again
            GDD | Window: 1 -> 5
                Selection: G | 1-0 2-1
                Candidates:
                    honest: 5-6
                    adversary 3: 1-0
                Candidate suffixes (bounds):
                    honest: 1-0 | 2-1 3-4 4-5
                    adversary 3: 1-0
                Density bounds:
                    honest: 3/3[+], point: 2-1, latest: 7, has header after sgen
                    adversary 3: 0/5[ ], point: 1-0, latest: 0
                New candidate tips:
                    honest: 5-6
                    adversary 3: 1-0
                Losing peers: []
                Setting loeFrag: 1-0
            BlockFetchLogic | PeersFetch [TraceLabelPeer (HonestPeer 1) (Right [At (Block {blockPointSlot = SlotNo 4, blockPointHash = (testHashFromList [0,0,0])}),At (Block {blockPointSlot = SlotNo 5, blockPointHash = (testHashFromList [0,0,0,0])}),At (Block {blockPointSlot = SlotNo 6, blockPointHash = (testHashFromList [0,0,0,0,0])})]),TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible)]
            ChainDB                      | Requested ChainSel run
            BlockFetchServer honest      | handling BlockFetch
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainDB                      | Current chain: G | 1-0 2-1
            BlockFetchServer honest      | Starting batch for slice 2-1 | 3-4 4-5 5-6
            ChainDB                      | LoE fragment: G | 1-0
            BlockFetchServer honest      | done handling BlockFetch
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | Sending 3-4
            BlockFetchServer honest      | done handling SendBlocks
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | BP is behind
            BlockFetchServer honest      |   cannot serve at this point; waiting for node state and starting again
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | BP is behind
            BlockFetchServer honest      |   cannot serve at this point; waiting for node state and starting again
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | BP is behind
            BlockFetchServer honest      |   cannot serve at this point; waiting for node state and starting again
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 3-4
            BlockFetchServer honest      | BP is behind
            BlockFetchServer honest      |   cannot serve at this point; waiting for node state and starting again
            ChainDB                      | Current chain: G | 1-0 2-1
            ChainDB                      | LoE fragment: G | 1-0
            ChainDB                      | Added to current chain; now: G | 1-0 2-1 3-4
            ChainSyncClient honest       | Accessing SlotNo 7, previously beyond forecast horizon
            ChainSyncClient honest       | Validated header: 6-7
            ChainSyncClient honest       | Gave LoP token to 6-7 compared to BlockNo 5
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            ChainSyncClient honest       | Received instruction: RunNormally
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncServer honest       |   last intersection is 6-7
            ChainSyncServer honest       |   intersection is before our header point
            ChainSyncServer honest       |   fragment ahead: 6-7 | 7-9
            ChainSyncServer honest       | done handling MsgRequestNext
            ChainSyncServer honest       |   gotta serve 7-9
            ChainSyncServer honest       |   tip is      7-9
            ChainSyncClient honest       | Downloaded header: 7-9
            ChainSyncClient honest       | Validated header: 7-9
            ChainSyncClient honest       | Gave LoP token to 7-9 compared to BlockNo 6
            ChainSyncClient honest       | Waiting for next instruction from the jumping governor
            ChainSyncClient honest       | Received instruction: RunNormally
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
            ChainSyncServer honest       |   last intersection is 7-9
            ChainSyncServer honest       |   chain has been fully served
            ChainSyncServer honest       | done handling MsgRequestNext
            GDD | Window: 1 -> 5
                Selection: 1-0 | 2-1 3-4
                Candidates:
                    honest: 7-9
                    adversary 3: 1-0
                Candidate suffixes (bounds):
                    honest: 1-0 | 2-1 3-4 4-5
                    adversary 3: 1-0
                Density bounds:
                    honest: 3/3[+], point: 2-1, latest: 9, has header after sgen, idling
                    adversary 3: 0/5[ ], point: 1-0, latest: 0
                New candidate tips:
                    honest: 7-9
                    adversary 3: 1-0
                Losing peers: []
                Setting loeFrag: 1-0
            BlockFetchLogic | PeersFetch [TraceLabelPeer (HonestPeer 1) (Right [At (Block {blockPointSlot = SlotNo 7, blockPointHash = (testHashFromList [0,0,0,0,0,0])}),At (Block {blockPointSlot = SlotNo 9, blockPointHash = (testHashFromList [0,0,0,0,0,0,0])})]),TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible)]
┌──────────────────────────────────────────────────────────────────────────────┐
└─ 00:00.070
Tick:
    number: 4
    duration: 0.011609304825s
    peer: adversary 3
    state: TP 4-9[0,1,2x0] | HP 3-8[0,1,0] | BP G
    current chain: 1-0 | 2-1 3-4
    candidate fragment: G | 1-0
    jumping states:
    honest: Disengaged DisengagedDone
    adversary 3: Dynamo 1-0 SlotNo 0

            ChainSyncServer adversary 3  | handling MsgRequestNext
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP 3-8[0,1,0] | BP G
            ChainSyncServer adversary 3  |   last intersection is 1-0
            ChainSyncServer adversary 3  |   intersection is before our header point
            ChainSyncServer adversary 3  |   fragment ahead: 1-0 | 2-2[1] 3-8
            ChainSyncServer adversary 3  | done handling MsgRequestNext
            ChainSyncServer adversary 3  |   gotta serve 2-2[0,1]
            ChainSyncServer adversary 3  |   tip is      4-9[0,1,2x0]
            ChainSyncServer adversary 3  | handling MsgRequestNext
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP 3-8[0,1,0] | BP G
            ChainSyncServer adversary 3  |   last intersection is 2-2[0,1]
            ChainSyncClient adversary 3  | Downloaded header: 2-2[0,1]
            ChainSyncServer adversary 3  |   intersection is before our header point
            ChainSyncServer adversary 3  |   fragment ahead: 2-2[0,1] | 3-8
            ChainSyncServer adversary 3  | done handling MsgRequestNext
            ChainSyncServer adversary 3  |   gotta serve 3-8[0,1,0]
            ChainSyncServer adversary 3  |   tip is      4-9[0,1,2x0]
            GDD | Window: 1 -> 5
                Selection: 1-0 | 2-1 3-4
                Candidates:
                    honest: 7-9
                    adversary 3: 1-0
                Candidate suffixes (bounds):
                    honest: 1-0 | 2-1 3-4 4-5
                    adversary 3: 1-0
                Density bounds:
                    honest: 3/3[+], point: 2-1, latest: 9, has header after sgen, idling
                    adversary 3: 0/5[ ], point: 1-0, latest: 2
                New candidate tips:
                    honest: 7-9
                    adversary 3: 1-0
                Losing peers: []
                Setting loeFrag: 1-0
            ChainSyncServer adversary 3  | handling MsgRequestNext
            ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP 3-8[0,1,0] | BP G
            ChainSyncClient adversary 3  | Validated header: 2-2[0,1]
            ChainSyncServer adversary 3  |   last intersection is 3-8[0,1,0]
            ChainSyncServer adversary 3  |   intersection is exactly our header point
            ChainSyncServer adversary 3  |   cannot serve at this point; waiting for node state and starting again
            ChainSyncClient adversary 3  | Gave LoP token to 2-2[0,1] compared to BlockNo 1
            BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineInFlightThisPeer)]
            ChainSyncClient adversary 3  | Waiting for next instruction from the jumping governor
            ChainSyncClient adversary 3  | Received instruction: JumpToGoodPoint 1-0
┌──────────────────────────────────────────────────────────────────────────────┐
└─ 00:00.081
Tick:
    number: 5
    duration: 0.1s
    peer: honest
    state: TP 7-9 | HP 7-9 | BP 7-9
    current chain: 1-0 | 2-1 3-4
    candidate fragment: 1-0 | 2-1 3-4 4-5 5-6 6-7 7-9
    jumping states:
    honest: Disengaged DisengagedDone
    adversary 3: Dynamo DynamoStarted SlotNo 0

            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 7-9
            ChainSyncServer honest       | handling MsgRequestNext
            ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 7-9
            BlockFetchServer honest      | Sending 4-5
            ChainSyncServer honest       |   last intersection is 7-9
            BlockFetchServer honest      | done handling SendBlocks
            ChainSyncServer honest       |   chain has been fully served
            ChainSyncServer honest       | done handling MsgRequestNext
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 7-9
            BlockFetchServer honest      | Sending 5-6
            BlockFetchServer honest      | done handling SendBlocks
            ChainDB                      | ChainSel starvation ended at 00:00.081 thanks to 4-5
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 7-9
            BlockFetchServer honest      | Batch is done
            BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineInFlightThisPeer)]
            BlockFetchServer honest      | done handling SendBlocks
            BlockFetchServer honest      | handling BlockFetch
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 7-9
            BlockFetchServer honest      | Starting batch for slice 5-6 | 6-7 7-9
            BlockFetchServer honest      | done handling BlockFetch
            BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineInFlightThisPeer)]
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 7-9
            BlockFetchServer honest      | Sending 6-7
            BlockFetchServer honest      | done handling SendBlocks
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 7-9
            BlockFetchServer honest      | Sending 7-9
            BlockFetchServer honest      | done handling SendBlocks
            ChainDB                      | Current chain: 1-0 | 2-1 3-4
            BlockFetchServer honest      | handling SendBlocks
            BlockFetchServer honest      |   state is TP 7-9 | HP 7-9 | BP 7-9
            ChainDB                      | LoE fragment: 1-0
            BlockFetchServer honest      | Batch is done
            BlockFetchServer honest      | done handling SendBlocks
            ChainDB                      | Current chain: 1-0 | 2-1 3-4
            ChainDB                      | LoE fragment: 1-0
            ChainDB                      | Current chain: 1-0 | 2-1 3-4
            ChainDB                      | LoE fragment: 1-0
            ChainDB                      | Current chain: 1-0 | 2-1 3-4
            ChainDB                      | LoE fragment: 1-0
            ChainDB                      | ChainSel starvation started at 00:00.081
            BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineAlreadyFetched)]
00:10.081 BlockFetchLogic | PeerStarvedUs (HonestPeer 1)
00:10.081 BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineAlreadyFetched)]
00:20.081 BlockFetchLogic | PeerStarvedUs (HonestPeer 1)
00:20.081 BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineAlreadyFetched)]
00:30.081 BlockFetchLogic | PeerStarvedUs (HonestPeer 1)
00:30.081 BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineAlreadyFetched)]
00:40.081 BlockFetchLogic | PeerStarvedUs (HonestPeer 1)
00:40.081 BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineAlreadyFetched)]
00:50.081 BlockFetchLogic | PeerStarvedUs (HonestPeer 1)
00:50.081 BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineAlreadyFetched)]
01:00.081 BlockFetchLogic | PeerStarvedUs (HonestPeer 1)
01:00.081 BlockFetchLogic | PeersFetch [TraceLabelPeer (AdversarialPeer 3) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (HonestPeer 1) (Left FetchDeclineAlreadyFetched)]
╶──────────────────────────────────────────────────────────────────────────────╴
Finished running point schedule
SelectedChain: 1-0 | 2-1 3-4
TipBlock: 3-4
PeerSimulatorResults:


No peers were disconnected
The immutable tip is too old: 9
9 > 7
Use --quickcheck-replay="(SMGen 14174175684299620761 11115350297627667705,4)" to reproduce.

The story is the following:

  • Two peers, honest and adversary.
  • Honest serves its chain entirely, while the adversary only serves some of its headers.
  • The honest peer is chosen as the dynamo for CSJ, which we can see in tick 1:
    jumping states:
        honest: Dynamo DynamoStarted G
        adversary 3: Jumper _ Happy FreshJumper Nothing
    
  • The honest peer serves all of its chain, and therefore gets disengaged from CSJ. We can see that in tick 3:
    ChainSyncServer honest       | handling MsgRequestNext
    ChainSyncServer honest       |   state is TP 7-9 | HP 7-9 | BP 3-4
    ChainSyncServer honest       |   last intersection is 7-9
    ChainSyncServer honest       |   chain has been fully served
    ChainSyncServer honest       | done handling MsgRequestNext
    
    and in the begining of tick 4:
    jumping states:
      honest: Disengaged DisengagedDone
      adversary 3: Dynamo 1-0 SlotNo 0
    
  • In fact, at this point, the adversary is just starting as a dynamo, and we first have to reset the state of its ChainSync client to the known intersection. This happens in the same tick (4):
    ChainSyncClient adversary 3  | Waiting for next instruction from the jumping governor
    ChainSyncClient adversary 3  | Received instruction: JumpToGoodPoint 1-0
    
  • And then... nothing, the ChainSync client goes quiet.

@jasagredo and I tracked it down, and it turns out that the ChainSyncClient for the adversary is stuck in the drainThePipe function. This is because its corresponding ScheduledChainSyncServer is just not answering the MsgRequestNext, by design: indeed, the next header to be served is ahead of the header point, so it cannot be served, but it is not the tip point either, so we cannot say MsgAwaitReply. We could actually see that in tick 4, slightly before the instruction to JumpToGoodPoint 1-0:

ChainSyncServer adversary 3  | handling MsgRequestNext
ChainSyncServer adversary 3  |   state is TP 4-9[0,1,2x0] | HP 3-8[0,1,0] | BP G
ChainSyncServer adversary 3  |   last intersection is 3-8[0,1,0]
ChainSyncServer adversary 3  |   intersection is exactly our header point
ChainSyncServer adversary 3  |   cannot serve at this point; waiting for node state and starting again

So far so good. The test fails because nothing unblocks the situation, so we stay stuck at this intersection forever. In fact, the ChainSync mini-protocol of the adversary stays in StCanAwait for ever. It should be dislodged from it by timeouts, but those have been disabled! (See the very beginning of the first output I pasted.)

Reminder of the ChainSync mini-protocol state machine

Now at this point I am not entirely sure whether there was a deep reason to disable those timeouts or if it was a bit of an oversight. Here is the corresponding test case in Uniform.hs:

-- | Test that the leashing attacks do not delay the immutable tip after. The
-- immutable tip needs to be advanced enough when the honest peer has offered
-- all of its ticks.
--
-- See Note [Leashing attacks]
prop_leashingAttackTimeLimited :: Property
prop_leashingAttackTimeLimited =
  forAllGenesisTest

    (disableCanAwaitTimeout . disableBoringTimeouts <$>
      genChains (QC.choose (1, 4)) `enrichedWith` genTimeLimitedSchedule
    )

    defaultSchedulerConfig
      { scTrace = True
      , scEnableLoE = True
      , scEnableLoP = True
      , scEnableCSJ = True
      , scEnableBlockFetchTimeouts = False
      }

    shrinkPeerSchedules

    theProperty

Note in particular the disableBoringTimeouts and disableCanAwaitTimeout. The latter is probably self-explanatory. The former is:

disableBoringTimeouts :: GenesisTest blk schedule -> GenesisTest blk schedule
disableBoringTimeouts gt =
    gt
      { gtChainSyncTimeouts =
          (gtChainSyncTimeouts gt)
            { mustReplyTimeout = Nothing
            , idleTimeout = Nothing
            }
      }

The conclusion is double:

  • I think there is a mix-up between canAwait and mustReply; at least, there was one in my head and it is consistent with this piece of code: it is important to disable the mustReply timeout because, in the context of the peer simulator tests, the chains are finite, so an honest node will serve all of it, then send MsgAwaitReply, and then will not be able to do anything else, and it should not get disconnected for this, of course. I think that's the sense of this disableCanAwaitTimeout function, but really it should be disableMustReplyTimeout (look at the state machine again if need be!). I suppose the mix-up comes from the fact that the important message is MsgAwaitReply. I think I might be behind this mistake. It is an easy fix; I will add some documentation in the process.

  • We cannot just disable the timeouts like this. In particular, the canAwait timeout must be enabled to a value that has an observable effect in the duration of the test (so maybe something like 10s is enough, maybe even much shorter than that). This part of the problem is slightly more delicate to fix, because there might be a reason why they were disabled in the first place, and the people that wrote this test are on PTO for the winter, and because choosing a good value for the timeouts, that is not too high but also doesn't just break good tests, is non-trivial. Still, I will whip something up!

@Niols
Copy link
Contributor Author

Niols commented Dec 19, 2024

Come to think of it, I think the mix-up is that “can await” sounds much nicer and less urgent than “must reply”, at least in my head, even though those terms don't per se carry any time meaning. In practice, in StCanAwait, we expect a very prompt message, but it can be a message that say that we will take more time (MsgAwaitReply), and in StMustReply, the message has to be a proper reply (MsgRoll*), but it can take much longer to come back.

@Niols
Copy link
Contributor Author

Niols commented Dec 19, 2024

For the probability of the test failure, I now measured over night 22 test failures over 2,000,000 test cases. This means that it is very unlikely to happen, but that if my fix does pass the same amount of test cases over the next night, then we can be pretty confident that it was a good fix.

Niols added a commit that referenced this pull request Dec 19, 2024
- Always disable `mustReplyTimeout`; explain why
- Always disable `idleTimeout`; explain why
- Keep the others by default in all the tests

This should fix the bug discussed in
#1179
Niols added a commit that referenced this pull request Dec 19, 2024
- Always disable `mustReplyTimeout`; explain why
- Always disable `idleTimeout`; explain why
- Keep the others by default in all the tests

This should fix the bug discussed in #1179
- Always disable `mustReplyTimeout`; explain why
- Always disable `idleTimeout`; explain why
- Keep the others by default in all the tests

This should fix the bug discussed in #1179
@Niols
Copy link
Contributor Author

Niols commented Dec 19, 2024

Approximate probability that the bug appears at least once after a certain amount of tests:

image

This is just f(x) = 100 * (1 - (1 - 22 / 2_000_000) ** x).

jasagredo pushed a commit that referenced this pull request Dec 26, 2024
- Always disable `mustReplyTimeout`; explain why
- Always disable `idleTimeout`; explain why
- Keep the others by default in all the tests

This should fix the bug discussed in #1179
lehins added a commit that referenced this pull request Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Genesis PRs related to Genesis testing and implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants