Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sim tests failures #4777

Closed
nazarhussain opened this issue Nov 17, 2022 · 8 comments
Closed

Sim tests failures #4777

nazarhussain opened this issue Nov 17, 2022 · 8 comments

Comments

@nazarhussain
Copy link
Contributor

Describe the bug

Observed a few failures in the sim tests. This issue is to keep track of and collect all such edge cases.

Expected behavior

Sim tests should be very stable.

Steps to Reproduce

Run sim tests and be lucky ;)

@nazarhussain
Copy link
Contributor Author

#4774 (comment)

file:///home/runner/work/lodestar/lodestar/packages/utils/src/sleep.ts:24
      reject(new ErrorAborted());
             ^
ErrorAborted: Aborted 
    at AbortSignal.onAbort (file:///home/runner/work/lodestar/lodestar/packages/utils/src/sleep.ts:24:14)
    at AbortSignal.[nodejs.internal.kHybridDispatch] (node:internal/event_target:736:20)
    at AbortSignal.dispatchEvent (node:internal/event_target:678:26)
    at abortSignal (node:internal/abort_controller:292:10)
    at AbortController.abort (node:internal/abort_controller:323:5)
    at SimulationEnvironment.stop (file:///home/runner/work/lodestar/lodestar/packages/cli/test/utils/simulation/SimulationEnvironment.ts:210:29)
    at Timeout._onTimeout (file:///home/runner/work/lodestar/lodestar/packages/cli/test/utils/simulation/SimulationEnvironment.ts:140:20)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)

@nazarhussain
Copy link
Contributor Author

nazarhussain commented Nov 17, 2022

Could not complete the sync within expected time.

Simulation environment "multi-fork" is stopping: On timeout

https://github.com/ChainSafe/lodestar/actions/runs/3489274146/jobs/5839160874
debug-test-logs-cli (9).zip

Debugging
There was the following error when node tries to sync.

Eph 6/0 0.687[checkpoint-sync-node-cl-lodestar/network] �[36mverbose�[39m: peer connected peer=16...qSDmVg, direction=inbound, status=OPEN
...
Eph 6/0 0.896[checkpoint-sync-node-cl-lodestar/sync] �[34mdebug�[39m: Sync peer joined peer=16Uiu2HAmKN9tPKTJiK5fjdNfZwGXEXRCKhMoATDDnBg9z6qSDmVg, syncType=Finalized, startEpoch=0, targetSlot=32, targetRoot=0x339c82c03b151cf62089ee79d76b453dcea2ab275ffb9e3887c6c04de5bf4c54
Eph 6/0 0.897[checkpoint-sync-node-cl-lodestar/sync] �[34mdebug�[39m: SyncChain added syncType=Finalized, firstEpoch=0, targetSlot=32, targetRoot=0x339c82c03b151cf62089ee79d76b453dcea2ab275ffb9e3887c6c04de5bf4c54
Eph 6/0 0.897[checkpoint-sync-node-cl-lodestar/sync] �[34mdebug�[39m: SyncChain startSyncing localFinalizedEpoch=0, lastEpochWithProcessBlocks=0, targetSlot=32
Eph 6/0 0.900[checkpoint-sync-node-cl-lodestar/network] �[34mdebug�[39m: Req  dialing peer method=beacon_blocks_by_range, encoding=ssz_snappy, client=Unknown, peer=16...qSDmVg, requestId=1
...

2022-11-17T14:50:27.725Z libp2p:upgrader:error could not create new stream Error: stream ended before 1 bytes became available

Eph 6/0 1.060[range-sync-node-cl-lodestar/sync] �[36mverbose�[39m: Batch process error id=Finalized, startEpoch=0, status=Processing Cannot read properties of undefined (reading 'block')
TypeError: Cannot read properties of undefined (reading 'block')
    at assertLinearChainSegment (file:///home/runner/work/lodestar/lodestar/packages/beacon-node/src/chain/blocks/utils/chainSegment.ts:11:33)
    at BeaconChain.processBlocks (file:///home/runner/work/lodestar/lodestar/packages/beacon-node/src/chain/blocks/index.ts:57:5)
    at JobItemQueue.BlockProcessor.jobQueue.JobItemQueue.maxLength [as itemProcessor] (file:///home/runner/work/lodestar/lodestar/packages/beacon-node/src/chain/blocks/index.ts:27:30)
    at Timeout.JobItemQueue.runJob [as _onTimeout] (file:///home/runner/work/lodestar/lodestar/packages/beacon-node/src/util/queue/itemQueue.ts:92:33)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)
...
2022-11-17T14:50:28.095Z libp2p:tcp:listener socket error Error: write EPIPE
    at afterWriteDispatched (node:internal/stream_base_commons:160:15)
    at writeGeneric (node:internal/stream_base_commons:151:3)
    at Socket._writeGeneric (node:net:905:11)
    at Socket._write (node:net:917:8)
    at writeOrBuffer (node:internal/streams/writable:391:12)
    at _write (node:internal/streams/writable:332:10)
    at Socket.Writable.write (node:internal/streams/writable:336:10)
    at /home/runner/work/lodestar/lodestar/node_modules/stream-to-it/sink.js:71:20
    at sink (file:///home/runner/work/lodestar/lodestar/node_modules/@libp2p/tcp/src/socket-to-conn.ts:100:9) {
  errno: -32,
  code: 'EPIPE',
  syscall: 'write'
}

@dapplion
Copy link
Contributor

@nazarhussain The error message is extremely unhelpful here. The error should clearly say something like:

unknown-block sync test timeout after 3600 seconds, node head: 0 0x0000

@nazarhussain
Copy link
Contributor Author

nazarhussain commented Nov 17, 2022

@dapplion Yes that error message can be improved. We don't have a timeout on individual sync case, rather timeout on the whole tests suite. See the findings above. I never seen this error before Batch process error. Is this code related to the PR you are working?

@dapplion
Copy link
Contributor

@dapplion Yes that error message can be improved. We don't have a timeout on individual sync case, rather timeout on the whole tests suite. See the findings above. I never seen this error before Batch process error. Is this code related to the PR you are working?

Yes related, will investigate

@dapplion
Copy link
Contributor

dapplion commented Nov 17, 2022

Failed sim test on un-related PR

├────────── Errors (8) ──────────┤
├─ Slot: 55
├── Assertion: syncCommitteeParticipation
├─ Slot: 56
├── Assertion: attestationParticipation
├──── node has low avg sync committee participation for epoch. {"id":"node-1-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.84375,"expectedMinSyncParticipationRate":0.9}
├──── node has low avg sync committee participation for epoch. {"id":"node-2-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.84375,"expectedMinSyncParticipationRate":0.9}
├──── node has low avg sync committee participation for epoch. {"id":"node-3-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.84375,"expectedMinSyncParticipationRate":0.9}
├──── node has low avg sync committee participation for epoch. {"id":"node-4-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.84375,"expectedMinSyncParticipationRate":0.9}
├──── node has low participation rate on head. {"id":"node-1-cl-lodestar","epoch":6,"participation":0.78125,"expectedMinParticipationRate":0.8}
├──── node has low participation rate on head. {"id":"node-2-cl-lodestar","epoch":6,"participation":0.78125,"expectedMinParticipationRate":0.8}
├──── node has low participation rate on head. {"id":"node-3-cl-lodestar","epoch":6,"participation":0.78125,"expectedMinParticipationRate":0.8}
├──── node has low participation rate on head. {"id":"node-4-cl-lodestar","epoch":6,"participation":0.78125,"expectedMinParticipationRate":0.8}
error Command failed with exit code 1.

@nazarhussain
Copy link
Contributor Author

nazarhussain commented Nov 29, 2022

https://github.com/ChainSafe/lodestar/actions/runs/3572977414/jobs/6006501097

├────────── Errors (4) ──────────┤
├─ Slot: 55
├── Assertion: syncCommitteeParticipation
├──── node has low avg sync committee participation for epoch. {"id":"node-1-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.82421875,"expectedMinSyncParticipationRate":0.9}
├──── node has low avg sync committee participation for epoch. {"id":"node-2-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.82421875,"expectedMinSyncParticipationRate":0.9}
├──── node has low avg sync committee participation for epoch. {"id":"node-3-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.82421875,"expectedMinSyncParticipationRate":0.9}
├──── node has low avg sync committee participation for epoch. {"id":"node-4-cl-lodestar","epoch":6,"syncCommitteeParticipationAvg":0.82421875,"expectedMinSyncParticipationRate":0.9}


@dapplion
Copy link
Contributor

Closing issues for specific incidents on old versions of Lodestar, please re-open if it happens with latest stable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants