Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node notifier error headState does not exist #4523

Closed
philknows opened this issue Sep 8, 2022 · 6 comments
Closed

Node notifier error headState does not exist #4523

philknows opened this issue Sep 8, 2022 · 6 comments
Assignees

Comments

@philknows
Copy link
Member

philknows commented Sep 8, 2022

Is your feature request related to a problem? Please describe.

When the execution layer client is not synced up yet and running Lodestar, the user experiences a Node notifier error: headState does not exist and output journal entries stop. Which is not useful to the end user.

Describe the solution you'd like

Is there a better log output that is more relevant to a user of Lodestar? Can we output something that tells the user to wait until their EL client is finished syncing? Output resumes once the EL is synced, but during this time there is no context given to the user.

Additional context

This was noted when Somer was developing a Lodestar guide and provided this UX improvement suggestion for user friendliness.

UPDATE:
could be related to

@philknows philknows added the scope-ux Issues for CLI UX or general consumer UX. label Sep 8, 2022
@SomerEsat
Copy link

SomerEsat commented Sep 9, 2022

Follow up question - other Consensus clients will continue to sync against the beacon chain even if the Execution Client is not synced. Is this also the behavior of Lodestar? If not, it would be helpful (if possible) to have the beacon node sync against the Beacon Chain while waiting for the Execution Client to reach a synced state. If that is possible, the log output should indicate that beacon sync is proceeding, and the Execution Client is found, but not yet synced. Something like that...

@g11tech
Copy link
Contributor

g11tech commented Sep 9, 2022

@SomerEsat Lodestar should continue syncing even if your EC/EL is still syncing.
The error reported is not directly related with EC/EL sync. Our logging module: notifier every 12 seconds (slot duration) gets the head state of the chain to print out the nice logs, seems like that headState get is erroring.

This shouldn't happen and am curious as to how I can reproduce this.

@g11tech g11tech self-assigned this Sep 9, 2022
@g11tech
Copy link
Contributor

g11tech commented Sep 9, 2022

@SomerEsat by any luck were you running this with --logFile --logFileLevel debug, in that case it might give us a change to figure out what went wrong here though we might have a theory

Or if you are able to repro it, do let know the steps

@philknows philknows removed the scope-ux Issues for CLI UX or general consumer UX. label Sep 9, 2022
@philknows philknows changed the title Friendlier UX error log for: Node notifier error headState does not exist Node notifier error headState does not exist Sep 9, 2022
@philknows
Copy link
Member Author

@SomerEsat Lodestar should continue syncing even if your EC/EL is still syncing. The error reported is not directly related with EC/EL sync. Our logging module: notifier every 12 seconds (slot duration) gets the head state of the chain to print out the nice logs, seems like that headState get is erroring.

This shouldn't happen and am curious as to how I can reproduce this.

I confirmed that when you get a checkpoint state and Lodestar cannot peer for a sync, on the 96th skipped block, the notifier error of headState not existing stalls Lodestar.

Sep-09 19:14:23.029[]                 info: Lodestar network=goerli, version=v1.0.0/65b38ee, commit=65b38eeb2c523a32c396c6d636ea07e855241b92
Sep-09 19:14:23.077[db]               info: Connected to LevelDB database name=/data/lodestar/chain-db
Sep-09 19:14:24.088[]                 info: Fetching weak subjectivity state weakSubjectivityServerUrl=https://lodestar-goerli.chainsafe.io
Sep-09 19:14:30.310[]                 info: Download completed
Sep-09 19:14:39.227[]                 info: Initializing beacon state from anchor state slot=3853504, epoch=120422, stateRoot=0xb481bc9de94a7c408451ab7dd18c9203dd7b9d832776562a91d1aa5489b39a6f
Sep-09 19:14:51.955[metrics]          info: Starting metrics HTTP server port=8008, address=127.0.0.1
Sep-09 19:14:52.045[rest]             info: Started REST api server address=http://0.0.0.0:9596
Sep-09 19:14:52.559[network]          info: PeerId 16Uiu2HAmTk7d2aXA84iYXndPYHAcdnR3oktZioSfgRNADquag7Tu, Multiaddrs /ip4/127.0.0.1/tcp/9001,/ip4/192.168.68.103/tcp/9001
Sep-09 19:14:52.560[]                 warn: Low peer count peers=0
Sep-09 19:14:52.560[]                 info: Searching peers - peers: 0 - slot: 3853574 (skipped 70) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:14:53.999[]                 info: Searching peers - peers: 0 - slot: 3853574 (skipped 70) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:14:54.000[]                 info: Searching peers - peers: 0 - slot: 3853574 (skipped 70) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:15:00.011[chain]            info: Validated transition configuration with execution client terminalTotalDifficulty=0xa4a470, terminalBlockHash=0x0000000000000000000000000000000000000000000000000000000000000000, terminalBlockNumber=0x0
Sep-09 19:15:06.001[]                 info: Searching peers - peers: 0 - slot: 3853575 (skipped 71) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:15:18.000[]                 info: Searching peers - peers: 0 - slot: 3853576 (skipped 72) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:15:28.954[chain]           error: Block error slot=3853568 code=BLOCK_ERROR_PARENT_UNKNOWN, parentRoot=0x87e1f1b53f3c9ba51680919909d6d68a7c2a718afff66de9f6e9db2496af7d6e
Error: BLOCK_ERROR_PARENT_UNKNOWN
    at verifyBlocksSanityChecks (file:///usr/app/packages/beacon-node/src/chain/blocks/verifyBlocksSanityChecks.ts:68:15)
    at BeaconChain.processBlocks (file:///usr/app/packages/beacon-node/src/chain/blocks/index.ts:61:56)
    at JobItemQueue.BlockProcessor.jobQueue.JobItemQueue.maxLength [as itemProcessor] (file:///usr/app/packages/beacon-node/src/chain/blocks/index.ts:27:30)
    at Timeout.JobItemQueue.runJob [as _onTimeout] (file:///usr/app/packages/beacon-node/src/util/queue/itemQueue.ts:92:33)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)
Sep-09 19:15:30.001[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853577 (skipped 73) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 6
Sep-09 19:15:42.331[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853578 (skipped 74) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 2
Sep-09 19:15:54.173[]                 warn: Low peer count peers=1
Sep-09 19:15:54.173[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853579 (skipped 75) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 1
Sep-09 19:16:06.001[]                 info: Searching peers - peers: 5 - slot: 3853580 (skipped 76) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:16:18.000[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853581 (skipped 77) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 6
Sep-09 19:16:30.001[]                 info: Searching peers - peers: 6 - slot: 3853582 (skipped 78) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:16:42.000[]                 info: Searching peers - peers: 3 - slot: 3853583 (skipped 79) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:16:54.001[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853584 (skipped 80) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 5
Sep-09 19:17:06.283[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853585 (skipped 81) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 8
Sep-09 19:17:18.000[]                 info: Searching peers - peers: 10 - slot: 3853586 (skipped 82) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:17:30.020[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853587 (skipped 83) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 8
Sep-09 19:17:42.000[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853588 (skipped 84) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 8
Sep-09 19:17:54.000[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853589 (skipped 85) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 7
Sep-09 19:18:06.000[]                 info: Searching peers - peers: 5 - slot: 3853590 (skipped 86) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:18:18.000[]                 info: Searching peers - peers: 5 - slot: 3853591 (skipped 87) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:18:30.001[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853592 (skipped 88) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 11
Sep-09 19:18:42.000[]                 info: Searching peers - peers: 11 - slot: 3853593 (skipped 89) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:18:54.000[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853594 (skipped 90) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 8
Sep-09 19:19:06.001[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853595 (skipped 91) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 12
Sep-09 19:19:18.000[]                 info: Searching peers - peers: 11 - slot: 3853596 (skipped 92) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420
Sep-09 19:19:29.999[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853597 (skipped 93) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 14
Sep-09 19:19:42.377[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853598 (skipped 94) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 15
Sep-09 19:19:54.000[]                 info: Syncing - ? left - 0.00 slots/s - slot: 3853599 (skipped 95) - head: 3853504 0x8a6c…3b40 - execution: syncing(0xadc1…141b) - finalized: 0x2563…a2da:120420 - peers: 15
Sep-09 19:20:06.001[]                error: Node notifier error  headState does not exist
Error: headState does not exist
    at BeaconChain.getHeadState (file:///usr/app/packages/beacon-node/src/chain/chain.ts:319:27)
    at runNodeNotifier (file:///usr/app/packages/beacon-node/src/node/notifier.ts:55:31)
Sep-09 19:20:11.338[network]          info: Subscribed gossip core topics
Sep-09 19:20:11.342[sync]             info: Subscribed gossip core topics

As you may hypothesized @g11tech , regen module doesn't generate state till a new blocks come in, so somehow we didn't have head state on 96th slot and then notifier crashed.

@philknows
Copy link
Member Author

Also note @g11tech this does not seem to be an issue in stable branch v1.0.0 commit 4fb7950

@twoeths
Copy link
Contributor

twoeths commented Sep 21, 2022

#4562 fixes the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants