Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lighthouse VC to Lodestar BN aggregated attestation errors #5553

Closed
nflaig opened this issue May 26, 2023 · 3 comments
Closed

Lighthouse VC to Lodestar BN aggregated attestation errors #5553

nflaig opened this issue May 26, 2023 · 3 comments
Labels
prio-high Resolve issues as soon as possible. scope-interop Issues that fix interop issues between Lodestar and CL, EL or tooling.
Milestone

Comments

@nflaig
Copy link
Member

nflaig commented May 26, 2023

Describe the bug

Lighthouse VC is not able to produce aggregate attestations

May 26 10:45:43.120 CRIT Error during attestation routine        slot: 6522826, committee_index: 19, error: "Some endpoints failed, num_failed: 1 http://consensus:5052/ => RequestFailed(\"Failed to produce an aggregate attestation: StatusCode(500)\")", service: attestation

On Lodestar BN there is the following error

May-26 10:45:43.119[rest]            error: Req req-1rji getAggregatedAttestation error  No attestation for slot=6522826 dataRoot=0xa609bb3a48bb75756afbd12c925314c8c274ecaa0ebd2ee5dfdee7ae8807306a
Error: No attestation for slot=6522826 dataRoot=0xa609bb3a48bb75756afbd12c925314c8c274ecaa0ebd2ee5dfdee7ae8807306a
at AttestationPool.getAggregate (file:///usr/app/packages/beacon-node/src/chain/opPools/attestationPool.ts:136:13)
at Object.getAggregatedAttestation (file:///usr/app/packages/beacon-node/src/api/impl/validator/index.ts:538:37)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at Object.handler (file:///usr/app/packages/api/src/utils/server/genericJsonServer.ts:45:23)

Expected behavior

Aggregated attestations should work when running Lighthouse VC with Lodestar BN. No errors should be logged on both services.

Steps to reproduce

  1. Run Lodestar BN
  2. Attach Lighthouse VC with active validators
  3. Observe logs on both services

Versions

  • Lodestar: v1.8.0 / v1.9.0-rc.0
  • Lighthouse: v4.1.0 / v4.2.0
@nflaig nflaig added the scope-interop Issues that fix interop issues between Lodestar and CL, EL or tooling. label May 26, 2023
@philknows philknows added the prio-high Resolve issues as soon as possible. label Aug 8, 2023
@philknows philknows added this to the v1.11.0 milestone Aug 8, 2023
@nflaig
Copy link
Member Author

nflaig commented Aug 9, 2023

Based on the logs it is not clear to me why getAggregatedAttestation fails, everything in the slot besides getting the aggregate attestation looks pretty healthy.

Possibly related to different attestation_data_root stored in cache compared to what Lighthouse VC passes as query parameter (Edit: After a bit more debugging this seems to be the issue), or something else we are handling different. As this happens consistently (100% of the time), it must be something deterministic and is likely unrelated to any timing / processing / lag issues.

getAggregate(slot: Slot, dataRoot: Root): phase0.Attestation {
const dataRootHex = toHexString(dataRoot);
const aggregate = this.attestationByRootBySlot.get(slot)?.get(dataRootHex);

Below are the logs from both Lodestar BN and Lighthouse VC for slot 7062817 (09/08/2023 10:43:47 UTC)

Lodestar BN

Import and processing of block

Aug-09 10:43:43.099[chain]         verbose: Running prepareForNextSlot nextEpoch=220714, prepareSlot=7062817, headSlot=7062816, headRoot=0x76296e1027bfc345be2e485863a440c67cb1c8eabc8c46103ac1e8f143ab550e, isEpochTransition=false
Aug-09 10:43:47.018[chain]         verbose: Clock slot slot=7062817
Aug-09 10:43:48.973[network]       verbose: Received gossip block slot=7062817, root=0x7d34…a9d0, curentSlot=7062817, peerId=16Uiu2HAmPkRtMYiQfDj2NHvX3sa1ZfNcGRjKehXmLBW9os1ZeTig, delaySec=1.88100004196167, recvToVal=0.09200000762939453
Aug-09 10:43:49.099[chain]         verbose: Transitioned gossip block slot=7062817, recvToTransition=0.21799993515014648
Aug-09 10:43:49.104[chain]         verbose: Verified block signatures slot=7062817, recvToSigVer=0.22300004959106445
Aug-09 10:43:49.104[chain]           debug: Persist block to hot DB slot=7062817, root=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0
Aug-09 10:43:49.165[sync]          verbose: Added unknown block to pendingBlocks root=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0
Aug-09 10:43:49.165[sync]          verbose: Downloading unknown block root=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0, pendingBlocks=1
Aug-09 10:43:49.285[chain]         verbose: Verified execution payload slot=7062817, recvToVerifiedExecPayload=0.40400004386901855
Aug-09 10:43:49.286[chain]         verbose: Added block to forkchoice and state cache slot=7062817, root=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0
Aug-09 10:43:49.399[chain]         verbose: New chain head slot=7062817, root=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0, delaySec=2.3989999294281006
Aug-09 10:43:49.401[chain]         verbose: Imported block slot=7062817, recvToImportedBlock=0.5199999809265137
Aug-09 10:43:49.401[chain]         verbose: Block processed slot=7062817, root=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0, delaySec=2.4010000228881836
Aug-09 10:43:49.452[rest]            debug: Req req-fii0 172.18.0.7 getBlockV2
Aug-09 10:43:49.471[rest]            debug: Res req-fii0 getBlockV2 - 200
Aug-09 10:43:50.041[sync]          verbose: Downloaded unknown block root=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0, pendingBlocks=1, parentInForkchoice=true
Aug-09 10:43:50.041[sync]          verbose: Avoid proposer boost for this block of known proposer blockSlot=7062817, blockRoot=0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0, proposerIndex=370689

Produce and submit attestation

Aug-09 10:43:51.010[rest]            debug: Req req-fii1 172.18.0.16 produceAttestationData
Aug-09 10:43:51.011[rest]            debug: Res req-fii1 produceAttestationData - 200
Aug-09 10:43:51.033[rest]            debug: Req req-fii2 172.18.0.16 submitPoolAttestations
Aug-09 10:43:51.160[vmon]            debug: Local validator published unaggregated attestation validatorIndex=xxx, slot=7062817, committeeIndex=xx, subnet=xx, sentPeers=5, delaySec=0.03399991989135742
Aug-09 10:43:51.162[rest]            debug: Res req-fii2 submitPoolAttestations - 200

Sync log

Aug-09 10:43:53.000[]                 info: Synced - slot: 7062817 - head: 0x7d34…a9d0 - exec-block: valid(17876827 0x7f96) - finalized: 0x351d…cf71:220711 - peers: 49

Get aggregate attestation (500 error)

Aug-09 10:43:55.001[rest]            debug: Req req-fii6 172.18.0.16 getAggregatedAttestation
Aug-09 10:43:55.001[rest]            error: Req req-fii6 getAggregatedAttestation error  No attestation for slot=7062817 dataRoot=0x4dbec5c57d8319d8f1bedf36418c851e261810f69eff757dee1b66c5a287125d
Error: No attestation for slot=7062817 dataRoot=0x4dbec5c57d8319d8f1bedf36418c851e261810f69eff757dee1b66c5a287125d
    at AttestationPool.getAggregate (file:///usr/app/packages/beacon-node/src/chain/opPools/attestationPool.ts:136:13)
    at Object.getAggregatedAttestation (file:///usr/app/packages/beacon-node/src/api/impl/validator/index.ts:602:47)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at Object.handler (file:///usr/app/packages/api/src/utils/server/genericJsonServer.ts:45:23)
Aug-09 10:43:55.002[rest]            debug: Res req-fii6 getAggregatedAttestation - 500

Based on validator monitor the submitted attestation was included on chain

Aug-09 10:43:55.321[vmon]            debug: Local validator attestation is included in AggregatedAndProof validatorIndex=xxx, slot=7062817, committeeIndex=xx
Aug-09 10:44:02.043[vmon]            debug: Local validator attestation is included in block validatorIndex=xxx, slot=7062817, committeeIndex=xx, inclusionDistance=1, correctHead=true, participants=347

Lighthouse VC

Publish attestation

Aug 09 10:43:51.161 INFO Successfully published attestations     type: unaggregated, slot: 7062817, committee_index: xx, head_block: 0x7d34f71fd09eb268d68e2735692026046328a19feb84096070d8f175b514a9d0, validator_indices: [xxx], count: 1, service: attestation

Failed to produce aggregate attestation

Aug 09 10:43:55.002 CRIT Error during attestation routine        slot: 7062817, committee_index: xx, error: "Some endpoints failed, num_failed: 1 http://consensus:5052/ => RequestFailed(\"Failed to produce an aggregate attestation: StatusCode(500)\")", service: attestation

Validator status log

Aug 09 10:43:53.001 INFO All validators active                   slot: 7062817, epoch: 220713, total_validators: x, active_validators: x, current_epoch_proposers: x, service: notifier

@nflaig
Copy link
Member Author

nflaig commented Sep 7, 2023

After further debugging this and looking in to LH code base, it looks like this is not an actual problem which would prevent Lodestar BN <> Lighthouse VC combinations from producing aggregates. Mixed client sim tests are passing #5940.

What's happening is that Lighthouse VC calls getAggregatedAttestation even if there is no aggregator for the slot.
https://github.com/sigp/lighthouse/blob/2841f60686d642fcc0785c884d43e34e47a800dc/validator_client/src/attestation_service.rs#L494-L518

This deviates from what Lodestar VC (and also other VCs) do, those exit the aggregation flow early if there is no aggregator

// No validator is aggregator, skip
if (duties.every(({selectionProof}) => selectionProof === null)) {
return;
}

The Lodestar BN caches previously submitted attestations (submitPoolAttestations) but only if there is an aggregator for the slot which is determined by subnet subscriptions.

if (network.shouldAggregate(subnet, slot)) {
const insertOutcome = chain.attestationPool.add(attestation, attDataRootHex);

if (aggregatorTracker.shouldAggregate(subnet, indexedAttestation.data.slot)) {
const insertOutcome = chain.attestationPool.add(attestation, attDataRootHex);

This results in an error when calling getAggregatedAttestation as Lodestar is not able to fulfill the request

throw Error(`No attestation for slot=${slot} dataRoot=${dataRootHex}`);

We could consider to throw an 404 ApiError instead of a 500 Error but during normal operation this error should never happen and we usually do not throw ApiError at a lower level in the stack.

To summarize:

  • not an actual problem, attestations and aggregates are produced correctly
  • error only happens with Lighthouse VC, all other VCs work as expected
  • ideally should be fixed on Lighthouse Validator client shouldn't request aggregate with no aggregators sigp/lighthouse#4712
  • there is no fix required on Lodestar side, it is expected that we can't fulfill the getAggregatedAttestation request
  • error verbosity could potentially be reduced on Lodestar

@nflaig
Copy link
Member Author

nflaig commented Sep 29, 2023

A fix will be released with Lighthouse v4.6.0 (next release)

bors bot pushed a commit to sigp/lighthouse that referenced this issue Oct 5, 2023
## Issue Addressed

Closes #4712

## Proposed Changes

Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm.

## Additional Info

Related issue ChainSafe/lodestar#5553
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
## Issue Addressed

Closes sigp#4712

## Proposed Changes

Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm.

## Additional Info

Related issue ChainSafe/lodestar#5553
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
## Issue Addressed

Closes sigp#4712

## Proposed Changes

Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm.

## Additional Info

Related issue ChainSafe/lodestar#5553
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this issue Jan 6, 2024
## Issue Addressed

Closes sigp#4712

## Proposed Changes

Exit aggregation step early if no validator is aggregator. This avoids an unnecessary request to the beacon node and more importantly fixes noisy errors if Lighthouse VC is used with other clients such as Lodestar and Prysm.

## Additional Info

Related issue ChainSafe/lodestar#5553
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prio-high Resolve issues as soon as possible. scope-interop Issues that fix interop issues between Lodestar and CL, EL or tooling.
Projects
None yet
Development

No branches or pull requests

2 participants