feat: track syncing status and fetch duties on resynced #6995

nflaig · 2024-08-01T12:10:44Z

Motivation

The connected beacon node might be syncing or offline on validator client startup or at the start of a new epoch. In both cases, we wait until the next epoch to fetch validator duties again. This is not ideal, as we can miss attestation and sync committee duties for a whole epoch (this does not apply to block duties, as we fetch those every slot).

Since all duty APIs return a 503 if the node is syncing, we can keep track of the syncing status and, when it changes from syncing to synced, fetch validator duties again. The request is very likely to succeed if the previous syncing status check had no issues.

Tracking the syncing status of the connected beacon node via GET /eth/v1/node/syncing is practically free and can be done once per slot, similar to tracking beacon health (see #4939).
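The transition check described above can be sketched roughly as follows. This is a minimal illustration, not the tracker added in this PR: `ResyncDetector`, `onResynced` and `update` are hypothetical names, the status type lists only a subset of fields, and treating a failed status request like a syncing node is an assumption of the sketch.

```typescript
// Subset of the syncing status fields, camelCase as in the thread's `isSyncing`.
type SyncingStatus = {
  headSlot: number;
  syncDistance: number;
  isSyncing: boolean;
};

type ResyncedHandler = (slot: number) => void;

// Hypothetical helper: remembers the previous status and fires handlers
// only on a syncing -> synced transition.
class ResyncDetector {
  // null until the first status check has completed
  private prevIsSyncing: boolean | null = null;
  private readonly handlers: ResyncedHandler[] = [];

  onResynced(handler: ResyncedHandler): void {
    this.handlers.push(handler);
  }

  // Call once per slot with the latest status, or null if the request failed.
  update(slot: number, status: SyncingStatus | null): void {
    // Assumption: a failed check (e.g. node offline) counts as "not synced",
    // so recovery from an outage also triggers a duties refetch.
    const isSyncing = status === null ? true : status.isSyncing;
    const resynced = this.prevIsSyncing === true && !isSyncing;
    this.prevIsSyncing = isSyncing;
    if (resynced) {
      for (const handler of this.handlers) handler(slot);
    }
  }
}

const detector = new ResyncDetector();
const refetchSlots: number[] = [];
// In a validator client this handler would re-poll attestation and sync committee duties.
detector.onResynced((slot) => refetchSlots.push(slot));

const synced: SyncingStatus = {headSlot: 100, syncDistance: 0, isSyncing: false};
const syncing: SyncingStatus = {headSlot: 90, syncDistance: 10, isSyncing: true};

detector.update(1, synced); // first observation, nothing to compare against
detector.update(2, syncing);
detector.update(3, synced); // syncing -> synced, handler fires
detector.update(4, synced); // still synced, no refetch
detector.update(5, null); // request failed
detector.update(6, synced); // recovered, handler fires again
```

The first observation never fires the handler, since duties are polled on startup anyway; only a later change from syncing (or unreachable) to synced does.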

Description

Track syncing status of connected beacon node(s) and fetch duties on resynced
Add ssz support to getSyncingStatus api
Move beacon health metric to status tracker (from #4939, "Track beacon health from vc")
Add beacon health panel to validator client dashboard

The same reasoning as in #4939 applies:

Adds metric vc_beacon_health to track the health of the "current" connected beacon node. Note that the retry logic of the multi-endpoint REST client applies, but the metric should represent the current URL that the validator client is attempting to pull duties from.

In a multi-node setup, we track the syncing status of the first node that returns a successful response, which for getSyncingStatus should succeed most of the time, as the only error condition is a network error, e.g. the node being offline. In my opinion, this behavior is good enough considering how our fallback logic works. In a multi-node setup, it is also much less likely that all connected nodes are syncing, so obtaining duties is much more reliable.
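As a rough illustration of "first node that returns a successful response", the sketch below tries the configured URLs in order and returns the first status it gets. The function name, the `StatusFetcher` type and the strictly sequential order are assumptions made for this example; the actual multi-endpoint client has its own retry and fallback logic.

```typescript
type SyncingStatus = {
  headSlot: number;
  syncDistance: number;
  isSyncing: boolean;
};

// Hypothetical fetcher: resolves with the status or rejects on a network error.
type StatusFetcher = (baseUrl: string) => Promise<SyncingStatus>;

// Returns the status of the first node that responds, together with its URL.
async function getStatusFromFirstResponsiveNode(
  urls: string[],
  fetchStatus: StatusFetcher
): Promise<{url: string; status: SyncingStatus}> {
  const errors: string[] = [];
  for (const url of urls) {
    try {
      return {url, status: await fetchStatus(url)};
    } catch (e) {
      // Network errors are the only expected failure; fall through to the next node.
      errors.push(`${url}: ${(e as Error).message}`);
    }
  }
  throw new Error(`All beacon nodes failed: ${errors.join("; ")}`);
}

// Fake fetcher for demonstration: one offline node, one syncing, one synced.
const fakeFetch: StatusFetcher = async (url) => {
  if (url === "http://offline") throw new Error("ECONNREFUSED");
  const isSyncing = url === "http://syncing";
  return {headSlot: 100, syncDistance: isSyncing ? 10 : 0, isSyncing};
};
```

Note the consequence: with the order `offline, syncing, synced`, the tracked status is that of the syncing node, even though a synced node is also configured. That is the trade-off described above as "good enough".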

Closes #6962

codecov · 2024-08-01T12:28:16Z

Codecov Report

Attention: Patch coverage is 89.81481% with 11 lines in your changes missing coverage. Please review.

Project coverage is 49.23%. Comparing base (be03ef1) to head (5e21e6a).
Report is 3 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #6995      +/-   ##
============================================
+ Coverage     49.10%   49.23%   +0.12%     
============================================
  Files           577      578       +1     
  Lines         37336    37426      +90     
  Branches       2139     2162      +23     
============================================
+ Hits          18334    18425      +91     
+ Misses        18963    18961       -2     
- Partials         39       40       +1

github-actions · 2024-08-01T14:42:19Z

Performance Report

✔️ no performance regression detected

Full benchmark results

Benchmark suite	Current: `4965e21`	Previous: `cf00c3f`	Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc	1.8987 ms/op	1.7436 ms/op	1.09
getPubkeys - validatorsArr - req 1000 vs - 250000 vc	42.177 us/op	39.748 us/op	1.06
BLS verify - blst	921.70 us/op	860.98 us/op	1.07
BLS verifyMultipleSignatures 3 - blst	1.3454 ms/op	1.2737 ms/op	1.06
BLS verifyMultipleSignatures 8 - blst	2.0715 ms/op	1.9901 ms/op	1.04
BLS verifyMultipleSignatures 32 - blst	4.7314 ms/op	4.4934 ms/op	1.05
BLS verifyMultipleSignatures 64 - blst	8.4482 ms/op	8.2804 ms/op	1.02
BLS verifyMultipleSignatures 128 - blst	16.003 ms/op	15.730 ms/op	1.02
BLS deserializing 10000 signatures	624.34 ms/op	572.28 ms/op	1.09
BLS deserializing 100000 signatures	6.2780 s/op	5.8852 s/op	1.07
BLS verifyMultipleSignatures - same message - 3 - blst	936.08 us/op	938.54 us/op	1.00
BLS verifyMultipleSignatures - same message - 8 - blst	1.1277 ms/op	1.0909 ms/op	1.03
BLS verifyMultipleSignatures - same message - 32 - blst	1.6537 ms/op	1.6668 ms/op	0.99
BLS verifyMultipleSignatures - same message - 64 - blst	2.4305 ms/op	2.4708 ms/op	0.98
BLS verifyMultipleSignatures - same message - 128 - blst	3.9819 ms/op	3.9666 ms/op	1.00
BLS aggregatePubkeys 32 - blst	17.182 us/op	17.346 us/op	0.99
BLS aggregatePubkeys 128 - blst	60.640 us/op	60.146 us/op	1.01
notSeenSlots=1 numMissedVotes=1 numBadVotes=10	69.664 ms/op	59.293 ms/op	1.17
notSeenSlots=1 numMissedVotes=0 numBadVotes=4	41.385 ms/op	60.301 ms/op	0.69
notSeenSlots=2 numMissedVotes=1 numBadVotes=10	30.668 ms/op	28.238 ms/op	1.09
getSlashingsAndExits - default max	75.384 us/op	62.650 us/op	1.20
getSlashingsAndExits - 2k	342.02 us/op	264.77 us/op	1.29
proposeBlockBody type=full, size=empty	4.9514 ms/op	5.2195 ms/op	0.95
isKnown best case - 1 super set check	827.00 ns/op	412.00 ns/op	2.01
isKnown normal case - 2 super set checks	790.00 ns/op	429.00 ns/op	1.84
isKnown worse case - 16 super set checks	783.00 ns/op	428.00 ns/op	1.83
InMemoryCheckpointStateCache - add get delete	6.0600 us/op	5.0300 us/op	1.20
validate api signedAggregateAndProof - struct	1.9941 ms/op	1.9233 ms/op	1.04
validate gossip signedAggregateAndProof - struct	1.5325 ms/op	1.7305 ms/op	0.89
validate gossip attestation - vc 640000	967.99 us/op	984.75 us/op	0.98
batch validate gossip attestation - vc 640000 - chunk 32	133.95 us/op	124.18 us/op	1.08
batch validate gossip attestation - vc 640000 - chunk 64	119.97 us/op	104.12 us/op	1.15
batch validate gossip attestation - vc 640000 - chunk 128	107.31 us/op	96.597 us/op	1.11
batch validate gossip attestation - vc 640000 - chunk 256	109.17 us/op	98.474 us/op	1.11
pickEth1Vote - no votes	981.01 us/op	785.36 us/op	1.25
pickEth1Vote - max votes	6.4770 ms/op	7.3717 ms/op	0.88
pickEth1Vote - Eth1Data hashTreeRoot value x2048	14.720 ms/op	12.029 ms/op	1.22
pickEth1Vote - Eth1Data hashTreeRoot tree x2048	24.909 ms/op	19.990 ms/op	1.25
pickEth1Vote - Eth1Data fastSerialize value x2048	390.44 us/op	391.45 us/op	1.00
pickEth1Vote - Eth1Data fastSerialize tree x2048	6.7435 ms/op	3.3766 ms/op	2.00
bytes32 toHexString	881.00 ns/op	654.00 ns/op	1.35
bytes32 Buffer.toString(hex)	473.00 ns/op	482.00 ns/op	0.98
bytes32 Buffer.toString(hex) from Uint8Array	653.00 ns/op	582.00 ns/op	1.12
bytes32 Buffer.toString(hex) + 0x	477.00 ns/op	465.00 ns/op	1.03
Object access 1 prop	0.38200 ns/op	0.32600 ns/op	1.17
Map access 1 prop	0.32200 ns/op	0.32700 ns/op	0.98
Object get x1000	4.9970 ns/op	4.9560 ns/op	1.01
Map get x1000	5.7160 ns/op	5.5980 ns/op	1.02
Object set x1000	40.554 ns/op	26.957 ns/op	1.50
Map set x1000	31.398 ns/op	19.279 ns/op	1.63
Return object 10000 times	0.30320 ns/op	0.28470 ns/op	1.06
Throw Error 10000 times	2.7763 us/op	2.6586 us/op	1.04
fastMsgIdFn sha256 / 200 bytes	2.0930 us/op	2.0630 us/op	1.01
fastMsgIdFn h32 xxhash / 200 bytes	500.00 ns/op	428.00 ns/op	1.17
fastMsgIdFn h64 xxhash / 200 bytes	479.00 ns/op	440.00 ns/op	1.09
fastMsgIdFn sha256 / 1000 bytes	6.2610 us/op	5.8250 us/op	1.07
fastMsgIdFn h32 xxhash / 1000 bytes	634.00 ns/op	515.00 ns/op	1.23
fastMsgIdFn h64 xxhash / 1000 bytes	536.00 ns/op	500.00 ns/op	1.07
fastMsgIdFn sha256 / 10000 bytes	52.210 us/op	50.146 us/op	1.04
fastMsgIdFn h32 xxhash / 10000 bytes	2.0400 us/op	1.8970 us/op	1.08
fastMsgIdFn h64 xxhash / 10000 bytes	1.3770 us/op	1.3160 us/op	1.05
send data - 1000 256B messages	14.060 ms/op	9.9564 ms/op	1.41
send data - 1000 512B messages	17.519 ms/op	12.961 ms/op	1.35
send data - 1000 1024B messages	28.265 ms/op	20.716 ms/op	1.36
send data - 1000 1200B messages	28.698 ms/op	23.749 ms/op	1.21
send data - 1000 2048B messages	32.877 ms/op	28.915 ms/op	1.14
send data - 1000 4096B messages	35.993 ms/op	26.574 ms/op	1.35
send data - 1000 16384B messages	71.163 ms/op	66.929 ms/op	1.06
send data - 1000 65536B messages	307.67 ms/op	255.81 ms/op	1.20
enrSubnets - fastDeserialize 64 bits	1.8230 us/op	1.1670 us/op	1.56
enrSubnets - ssz BitVector 64 bits	676.00 ns/op	522.00 ns/op	1.30
enrSubnets - fastDeserialize 4 bits	423.00 ns/op	347.00 ns/op	1.22
enrSubnets - ssz BitVector 4 bits	687.00 ns/op	560.00 ns/op	1.23
prioritizePeers score -10:0 att 32-0.1 sync 2-0	225.78 us/op	110.08 us/op	2.05
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25	146.09 us/op	153.41 us/op	0.95
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5	216.85 us/op	202.26 us/op	1.07
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75	471.24 us/op	336.79 us/op	1.40
prioritizePeers score 0:0 att 64-1 sync 4-1	929.68 us/op	428.10 us/op	2.17
array of 16000 items push then shift	1.3071 us/op	1.2217 us/op	1.07
LinkedList of 16000 items push then shift	12.387 ns/op	7.1930 ns/op	1.72
array of 16000 items push then pop	139.27 ns/op	94.571 ns/op	1.47
LinkedList of 16000 items push then pop	10.514 ns/op	6.2960 ns/op	1.67
array of 24000 items push then shift	2.2073 us/op	1.7983 us/op	1.23
LinkedList of 24000 items push then shift	7.2630 ns/op	6.6530 ns/op	1.09
array of 24000 items push then pop	116.66 ns/op	105.72 ns/op	1.10
LinkedList of 24000 items push then pop	6.2970 ns/op	6.6030 ns/op	0.95
intersect bitArray bitLen 8	5.6000 ns/op	5.5850 ns/op	1.00
intersect array and set length 8	41.207 ns/op	38.531 ns/op	1.07
intersect bitArray bitLen 128	26.876 ns/op	27.653 ns/op	0.97
intersect array and set length 128	600.28 ns/op	573.92 ns/op	1.05
bitArray.getTrueBitIndexes() bitLen 128	2.2240 us/op	1.2670 us/op	1.76
bitArray.getTrueBitIndexes() bitLen 248	3.0230 us/op	2.0280 us/op	1.49
bitArray.getTrueBitIndexes() bitLen 512	5.9680 us/op	3.7720 us/op	1.58
Buffer.concat 32 items	998.00 ns/op	978.00 ns/op	1.02
Uint8Array.set 32 items	1.2980 us/op	1.3480 us/op	0.96
Buffer.copy	1.5310 us/op	1.3870 us/op	1.10
Uint8Array.set - with subarray	1.9210 us/op	1.8820 us/op	1.02
Uint8Array.set - without subarray	1.4330 us/op	1.3200 us/op	1.09
getUint32 - dataview	424.00 ns/op	386.00 ns/op	1.10
getUint32 - manual	362.00 ns/op	316.00 ns/op	1.15
Set add up to 64 items then delete first	1.8025 us/op	1.7584 us/op	1.03
OrderedSet add up to 64 items then delete first	2.7994 us/op	2.7326 us/op	1.02
Set add up to 64 items then delete last	2.0326 us/op	2.0197 us/op	1.01
OrderedSet add up to 64 items then delete last	3.0486 us/op	3.1391 us/op	0.97
Set add up to 64 items then delete middle	2.0388 us/op	2.0463 us/op	1.00
OrderedSet add up to 64 items then delete middle	4.4405 us/op	4.5294 us/op	0.98
Set add up to 128 items then delete first	4.0220 us/op	3.9323 us/op	1.02
OrderedSet add up to 128 items then delete first	6.2126 us/op	5.8991 us/op	1.05
Set add up to 128 items then delete last	3.9165 us/op	3.9520 us/op	0.99
OrderedSet add up to 128 items then delete last	6.1842 us/op	6.0619 us/op	1.02
Set add up to 128 items then delete middle	3.8920 us/op	3.9273 us/op	0.99
OrderedSet add up to 128 items then delete middle	11.999 us/op	12.015 us/op	1.00
Set add up to 256 items then delete first	7.8773 us/op	7.6805 us/op	1.03
OrderedSet add up to 256 items then delete first	12.552 us/op	11.570 us/op	1.08
Set add up to 256 items then delete last	7.6197 us/op	7.6656 us/op	0.99
OrderedSet add up to 256 items then delete last	11.767 us/op	11.963 us/op	0.98
Set add up to 256 items then delete middle	7.6636 us/op	7.5982 us/op	1.01
OrderedSet add up to 256 items then delete middle	34.760 us/op	34.888 us/op	1.00
transfer serialized Status (84 B)	1.3650 us/op	1.4700 us/op	0.93
copy serialized Status (84 B)	1.1490 us/op	1.2800 us/op	0.90
transfer serialized SignedVoluntaryExit (112 B)	1.4330 us/op	1.4740 us/op	0.97
copy serialized SignedVoluntaryExit (112 B)	1.2150 us/op	1.2110 us/op	1.00
transfer serialized ProposerSlashing (416 B)	1.5450 us/op	2.0830 us/op	0.74
copy serialized ProposerSlashing (416 B)	1.4390 us/op	1.5450 us/op	0.93
transfer serialized Attestation (485 B)	1.5950 us/op	1.5900 us/op	1.00
copy serialized Attestation (485 B)	1.7820 us/op	2.0070 us/op	0.89
transfer serialized AttesterSlashing (33232 B)	1.9460 us/op	2.3020 us/op	0.85
copy serialized AttesterSlashing (33232 B)	4.4150 us/op	4.2880 us/op	1.03
transfer serialized Small SignedBeaconBlock (128000 B)	2.4990 us/op	2.7030 us/op	0.92
copy serialized Small SignedBeaconBlock (128000 B)	10.539 us/op	8.9260 us/op	1.18
transfer serialized Avg SignedBeaconBlock (200000 B)	3.3580 us/op	2.8990 us/op	1.16
copy serialized Avg SignedBeaconBlock (200000 B)	14.467 us/op	12.883 us/op	1.12
transfer serialized BlobsSidecar (524380 B)	3.5510 us/op	3.1640 us/op	1.12
copy serialized BlobsSidecar (524380 B)	76.830 us/op	111.86 us/op	0.69
transfer serialized Big SignedBeaconBlock (1000000 B)	3.2530 us/op	3.4050 us/op	0.96
copy serialized Big SignedBeaconBlock (1000000 B)	373.39 us/op	213.92 us/op	1.75
pass gossip attestations to forkchoice per slot	2.8215 ms/op	2.7562 ms/op	1.02
forkChoice updateHead vc 100000 bc 64 eq 0	415.27 us/op	373.41 us/op	1.11
forkChoice updateHead vc 600000 bc 64 eq 0	2.5460 ms/op	2.4783 ms/op	1.03
forkChoice updateHead vc 1000000 bc 64 eq 0	4.1336 ms/op	3.9976 ms/op	1.03
forkChoice updateHead vc 600000 bc 320 eq 0	2.4774 ms/op	2.6184 ms/op	0.95
forkChoice updateHead vc 600000 bc 1200 eq 0	2.5376 ms/op	2.6909 ms/op	0.94
forkChoice updateHead vc 600000 bc 7200 eq 0	3.0441 ms/op	2.7858 ms/op	1.09
forkChoice updateHead vc 600000 bc 64 eq 1000	9.8628 ms/op	9.5983 ms/op	1.03
forkChoice updateHead vc 600000 bc 64 eq 10000	9.7729 ms/op	9.5791 ms/op	1.02
forkChoice updateHead vc 600000 bc 64 eq 300000	11.621 ms/op	11.650 ms/op	1.00
computeDeltas 500000 validators 300 proto nodes	2.9023 ms/op	2.8879 ms/op	1.00
computeDeltas 500000 validators 1200 proto nodes	2.9413 ms/op	2.8895 ms/op	1.02
computeDeltas 500000 validators 7200 proto nodes	3.0262 ms/op	2.9670 ms/op	1.02
computeDeltas 750000 validators 300 proto nodes	4.3401 ms/op	4.4186 ms/op	0.98
computeDeltas 750000 validators 1200 proto nodes	4.2620 ms/op	4.4231 ms/op	0.96
computeDeltas 750000 validators 7200 proto nodes	4.4587 ms/op	4.3298 ms/op	1.03
computeDeltas 1400000 validators 300 proto nodes	8.0967 ms/op	8.1621 ms/op	0.99
computeDeltas 1400000 validators 1200 proto nodes	7.9743 ms/op	8.2419 ms/op	0.97
computeDeltas 1400000 validators 7200 proto nodes	7.7580 ms/op	8.2172 ms/op	0.94
computeDeltas 2100000 validators 300 proto nodes	12.239 ms/op	11.800 ms/op	1.04
computeDeltas 2100000 validators 1200 proto nodes	12.083 ms/op	12.169 ms/op	0.99
computeDeltas 2100000 validators 7200 proto nodes	11.782 ms/op	12.107 ms/op	0.97
altair processAttestation - 250000 vs - 7PWei normalcase	1.3263 ms/op	1.3827 ms/op	0.96
altair processAttestation - 250000 vs - 7PWei worstcase	2.0260 ms/op	2.1125 ms/op	0.96
altair processAttestation - setStatus - 1/6 committees join	63.388 us/op	62.779 us/op	1.01
altair processAttestation - setStatus - 1/3 committees join	120.66 us/op	125.61 us/op	0.96
altair processAttestation - setStatus - 1/2 committees join	184.80 us/op	185.85 us/op	0.99
altair processAttestation - setStatus - 2/3 committees join	244.10 us/op	240.21 us/op	1.02
altair processAttestation - setStatus - 4/5 committees join	357.10 us/op	362.56 us/op	0.98
altair processAttestation - setStatus - 100% committees join	423.79 us/op	462.03 us/op	0.92
altair processBlock - 250000 vs - 7PWei normalcase	3.2298 ms/op	4.1738 ms/op	0.77
altair processBlock - 250000 vs - 7PWei normalcase hashState	29.443 ms/op	26.617 ms/op	1.11
altair processBlock - 250000 vs - 7PWei worstcase	37.369 ms/op	29.701 ms/op	1.26
altair processBlock - 250000 vs - 7PWei worstcase hashState	66.746 ms/op	56.113 ms/op	1.19
phase0 processBlock - 250000 vs - 7PWei normalcase	1.4522 ms/op	1.3598 ms/op	1.07
phase0 processBlock - 250000 vs - 7PWei worstcase	22.223 ms/op	21.983 ms/op	1.01
altair processEth1Data - 250000 vs - 7PWei normalcase	245.63 us/op	250.28 us/op	0.98
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15	4.7270 us/op	5.1360 us/op	0.92
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219	18.389 us/op	18.976 us/op	0.97
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42	7.3840 us/op	6.8390 us/op	1.08
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18	2.9670 us/op	6.1510 us/op	0.48
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020	81.155 us/op	82.634 us/op	0.98
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777	502.60 us/op	755.61 us/op	0.67
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384	1.2072 ms/op	1.1475 ms/op	1.05
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384	691.48 us/op	690.80 us/op	1.00
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384	2.0551 ms/op	2.0423 ms/op	1.01
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384	1.1655 ms/op	1.2929 ms/op	0.90
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384	3.0172 ms/op	2.9412 ms/op	1.03
Tree 40 250000 create	182.68 ms/op	181.58 ms/op	1.01
Tree 40 250000 get(125000)	115.03 ns/op	117.83 ns/op	0.98
Tree 40 250000 set(125000)	544.35 ns/op	566.44 ns/op	0.96
Tree 40 250000 toArray()	10.344 ms/op	10.764 ms/op	0.96
Tree 40 250000 iterate all - toArray() + loop	10.142 ms/op	9.9010 ms/op	1.02
Tree 40 250000 iterate all - get(i)	43.490 ms/op	40.617 ms/op	1.07
MutableVector 250000 create	9.2372 ms/op	8.3181 ms/op	1.11
MutableVector 250000 get(125000)	5.6910 ns/op	5.7150 ns/op	1.00
MutableVector 250000 set(125000)	175.95 ns/op	150.62 ns/op	1.17
MutableVector 250000 toArray()	4.5161 ms/op	3.1379 ms/op	1.44
MutableVector 250000 iterate all - toArray() + loop	3.5159 ms/op	3.2909 ms/op	1.07
MutableVector 250000 iterate all - get(i)	1.5984 ms/op	1.3173 ms/op	1.21
Array 250000 create	3.3313 ms/op	2.4534 ms/op	1.36
Array 250000 clone - spread	1.4846 ms/op	1.2787 ms/op	1.16
Array 250000 get(125000)	0.59200 ns/op	0.57700 ns/op	1.03
Array 250000 set(125000)	0.60400 ns/op	0.58600 ns/op	1.03
Array 250000 iterate all - loop	78.997 us/op	76.539 us/op	1.03
effectiveBalanceIncrements clone Uint8Array 300000	17.040 us/op	14.037 us/op	1.21
effectiveBalanceIncrements clone MutableVector 300000	322.00 ns/op	309.00 ns/op	1.04
effectiveBalanceIncrements rw all Uint8Array 300000	170.90 us/op	166.02 us/op	1.03
effectiveBalanceIncrements rw all MutableVector 300000	60.340 ms/op	57.505 ms/op	1.05
phase0 afterProcessEpoch - 250000 vs - 7PWei	78.498 ms/op	75.612 ms/op	1.04
Array.fill - length 1000000	2.7812 ms/op	2.6632 ms/op	1.04
Array push - length 1000000	16.099 ms/op	14.748 ms/op	1.09
Array.get	0.26450 ns/op	0.26246 ns/op	1.01
Uint8Array.get	0.35147 ns/op	0.34034 ns/op	1.03
phase0 beforeProcessEpoch - 250000 vs - 7PWei	15.527 ms/op	14.922 ms/op	1.04
altair processEpoch - mainnet_e81889	341.22 ms/op	283.33 ms/op	1.20
mainnet_e81889 - altair beforeProcessEpoch	21.786 ms/op	18.691 ms/op	1.17
mainnet_e81889 - altair processJustificationAndFinalization	10.779 us/op	10.851 us/op	0.99
mainnet_e81889 - altair processInactivityUpdates	5.9546 ms/op	4.2819 ms/op	1.39
mainnet_e81889 - altair processRewardsAndPenalties	50.187 ms/op	55.831 ms/op	0.90
mainnet_e81889 - altair processRegistryUpdates	2.0740 us/op	2.1690 us/op	0.96
mainnet_e81889 - altair processSlashings	898.00 ns/op	712.00 ns/op	1.26
mainnet_e81889 - altair processEth1DataReset	769.00 ns/op	767.00 ns/op	1.00
mainnet_e81889 - altair processEffectiveBalanceUpdates	803.38 us/op	1.4627 ms/op	0.55
mainnet_e81889 - altair processSlashingsReset	1.3490 us/op	2.5690 us/op	0.53
mainnet_e81889 - altair processRandaoMixesReset	2.4240 us/op	3.3360 us/op	0.73
mainnet_e81889 - altair processHistoricalRootsUpdate	1.7670 us/op	874.00 ns/op	2.02
mainnet_e81889 - altair processParticipationFlagUpdates	2.6020 us/op	1.7670 us/op	1.47
mainnet_e81889 - altair processSyncCommitteeUpdates	770.00 ns/op	562.00 ns/op	1.37
mainnet_e81889 - altair afterProcessEpoch	78.355 ms/op	75.859 ms/op	1.03
capella processEpoch - mainnet_e217614	946.49 ms/op	1.2686 s/op	0.75
mainnet_e217614 - capella beforeProcessEpoch	72.943 ms/op	67.173 ms/op	1.09
mainnet_e217614 - capella processJustificationAndFinalization	20.829 us/op	12.228 us/op	1.70
mainnet_e217614 - capella processInactivityUpdates	14.422 ms/op	14.721 ms/op	0.98
mainnet_e217614 - capella processRewardsAndPenalties	243.07 ms/op	230.64 ms/op	1.05
mainnet_e217614 - capella processRegistryUpdates	14.294 us/op	11.349 us/op	1.26
mainnet_e217614 - capella processSlashings	909.00 ns/op	753.00 ns/op	1.21
mainnet_e217614 - capella processEth1DataReset	766.00 ns/op	735.00 ns/op	1.04
mainnet_e217614 - capella processEffectiveBalanceUpdates	16.087 ms/op	4.9544 ms/op	3.25
mainnet_e217614 - capella processSlashingsReset	4.8200 us/op	1.8150 us/op	2.66
mainnet_e217614 - capella processRandaoMixesReset	6.0550 us/op	3.1580 us/op	1.92
mainnet_e217614 - capella processHistoricalRootsUpdate	835.00 ns/op	798.00 ns/op	1.05
mainnet_e217614 - capella processParticipationFlagUpdates	1.6090 us/op	4.6230 us/op	0.35
mainnet_e217614 - capella afterProcessEpoch	249.89 ms/op	222.96 ms/op	1.12
phase0 processEpoch - mainnet_e58758	379.02 ms/op	381.52 ms/op	0.99
mainnet_e58758 - phase0 beforeProcessEpoch	98.788 ms/op	86.314 ms/op	1.14
mainnet_e58758 - phase0 processJustificationAndFinalization	25.507 us/op	12.350 us/op	2.07
mainnet_e58758 - phase0 processRewardsAndPenalties	21.713 ms/op	30.672 ms/op	0.71
mainnet_e58758 - phase0 processRegistryUpdates	8.8590 us/op	6.7100 us/op	1.32
mainnet_e58758 - phase0 processSlashings	1.0570 us/op	738.00 ns/op	1.43
mainnet_e58758 - phase0 processEth1DataReset	758.00 ns/op	675.00 ns/op	1.12
mainnet_e58758 - phase0 processEffectiveBalanceUpdates	733.81 us/op	1.1030 ms/op	0.67
mainnet_e58758 - phase0 processSlashingsReset	2.9540 us/op	2.2970 us/op	1.29
mainnet_e58758 - phase0 processRandaoMixesReset	6.0260 us/op	3.4210 us/op	1.76
mainnet_e58758 - phase0 processHistoricalRootsUpdate	885.00 ns/op	663.00 ns/op	1.33
mainnet_e58758 - phase0 processParticipationRecordUpdates	4.2090 us/op	2.5140 us/op	1.67
mainnet_e58758 - phase0 afterProcessEpoch	69.625 ms/op	64.703 ms/op	1.08
phase0 processEffectiveBalanceUpdates - 250000 normalcase	791.31 us/op	765.39 us/op	1.03
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5	1.3919 ms/op	1.1902 ms/op	1.17
altair processInactivityUpdates - 250000 normalcase	15.623 ms/op	16.962 ms/op	0.92
altair processInactivityUpdates - 250000 worstcase	15.989 ms/op	17.982 ms/op	0.89
phase0 processRegistryUpdates - 250000 normalcase	6.8720 us/op	5.1210 us/op	1.34
phase0 processRegistryUpdates - 250000 badcase_full_deposits	301.55 us/op	297.65 us/op	1.01
phase0 processRegistryUpdates - 250000 worstcase 0.5	109.39 ms/op	112.09 ms/op	0.98
altair processRewardsAndPenalties - 250000 normalcase	40.380 ms/op	31.379 ms/op	1.29
altair processRewardsAndPenalties - 250000 worstcase	41.583 ms/op	32.889 ms/op	1.26
phase0 getAttestationDeltas - 250000 normalcase	7.8181 ms/op	5.9886 ms/op	1.31
phase0 getAttestationDeltas - 250000 worstcase	8.4994 ms/op	5.7657 ms/op	1.47
phase0 processSlashings - 250000 worstcase	92.173 us/op	92.960 us/op	0.99
altair processSyncCommitteeUpdates - 250000	95.987 ms/op	104.59 ms/op	0.92
BeaconState.hashTreeRoot - No change	503.00 ns/op	506.00 ns/op	0.99
BeaconState.hashTreeRoot - 1 full validator	82.992 us/op	81.577 us/op	1.02
BeaconState.hashTreeRoot - 32 full validator	750.35 us/op	1.0764 ms/op	0.70
BeaconState.hashTreeRoot - 512 full validator	8.5602 ms/op	11.748 ms/op	0.73
BeaconState.hashTreeRoot - 1 validator.effectiveBalance	102.52 us/op	151.23 us/op	0.68
BeaconState.hashTreeRoot - 32 validator.effectiveBalance	1.6919 ms/op	2.0283 ms/op	0.83
BeaconState.hashTreeRoot - 512 validator.effectiveBalance	15.450 ms/op	24.255 ms/op	0.64
BeaconState.hashTreeRoot - 1 balances	72.886 us/op	115.50 us/op	0.63
BeaconState.hashTreeRoot - 32 balances	608.23 us/op	788.79 us/op	0.77
BeaconState.hashTreeRoot - 512 balances	5.8359 ms/op	6.4934 ms/op	0.90
BeaconState.hashTreeRoot - 250000 balances	119.70 ms/op	161.31 ms/op	0.74
aggregationBits - 2048 els - zipIndexesInBitList	23.874 us/op	22.017 us/op	1.08
byteArrayEquals 32	45.954 ns/op	46.234 ns/op	0.99
Buffer.compare 32	15.093 ns/op	15.582 ns/op	0.97
byteArrayEquals 1024	1.2462 us/op	1.2611 us/op	0.99
Buffer.compare 1024	23.215 ns/op	24.430 ns/op	0.95
byteArrayEquals 16384	19.630 us/op	19.971 us/op	0.98
Buffer.compare 16384	179.93 ns/op	202.45 ns/op	0.89
byteArrayEquals 123687377	150.75 ms/op	144.63 ms/op	1.04
Buffer.compare 123687377	6.8525 ms/op	5.3296 ms/op	1.29
byteArrayEquals 32 - diff last byte	48.941 ns/op	45.049 ns/op	1.09
Buffer.compare 32 - diff last byte	17.451 ns/op	14.810 ns/op	1.18
byteArrayEquals 1024 - diff last byte	1.3381 us/op	1.1928 us/op	1.12
Buffer.compare 1024 - diff last byte	22.902 ns/op	22.644 ns/op	1.01
byteArrayEquals 16384 - diff last byte	20.597 us/op	18.962 us/op	1.09
Buffer.compare 16384 - diff last byte	212.01 ns/op	158.34 ns/op	1.34
byteArrayEquals 123687377 - diff last byte	153.86 ms/op	150.15 ms/op	1.02
Buffer.compare 123687377 - diff last byte	7.0141 ms/op	3.8713 ms/op	1.81
byteArrayEquals 32 - random bytes	5.1030 ns/op	4.8720 ns/op	1.05
Buffer.compare 32 - random bytes	15.709 ns/op	16.258 ns/op	0.97
byteArrayEquals 1024 - random bytes	5.7370 ns/op	5.1000 ns/op	1.12
Buffer.compare 1024 - random bytes	17.493 ns/op	16.210 ns/op	1.08
byteArrayEquals 16384 - random bytes	5.1920 ns/op	4.8800 ns/op	1.06
Buffer.compare 16384 - random bytes	15.788 ns/op	16.665 ns/op	0.95
byteArrayEquals 123687377 - random bytes	8.0700 ns/op	7.7200 ns/op	1.05
Buffer.compare 123687377 - random bytes	20.710 ns/op	20.010 ns/op	1.03
regular array get 100000 times	32.489 us/op	30.694 us/op	1.06
wrappedArray get 100000 times	31.246 us/op	30.678 us/op	1.02
arrayWithProxy get 100000 times	9.5213 ms/op	9.7377 ms/op	0.98
ssz.Root.equals	43.352 ns/op	39.194 ns/op	1.11
byteArrayEquals	41.494 ns/op	42.095 ns/op	0.99
Buffer.compare	9.6730 ns/op	9.6560 ns/op	1.00
shuffle list - 16384 els	5.8532 ms/op	5.4517 ms/op	1.07
shuffle list - 250000 els	88.684 ms/op	78.043 ms/op	1.14
processSlot - 1 slots	19.383 us/op	11.670 us/op	1.66
processSlot - 32 slots	2.4418 ms/op	2.7631 ms/op	0.88
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei	39.352 ms/op	35.956 ms/op	1.09
getCommitteeAssignments - req 1 vs - 250000 vc	1.8453 ms/op	1.7160 ms/op	1.08
getCommitteeAssignments - req 100 vs - 250000 vc	3.5834 ms/op	3.3566 ms/op	1.07
getCommitteeAssignments - req 1000 vs - 250000 vc	3.8188 ms/op	3.6364 ms/op	1.05
findModifiedValidators - 10000 modified validators	244.84 ms/op	200.41 ms/op	1.22
findModifiedValidators - 1000 modified validators	188.34 ms/op	137.79 ms/op	1.37
findModifiedValidators - 100 modified validators	186.27 ms/op	140.48 ms/op	1.33
findModifiedValidators - 10 modified validators	160.42 ms/op	140.23 ms/op	1.14
findModifiedValidators - 1 modified validators	165.42 ms/op	126.39 ms/op	1.31
findModifiedValidators - no difference	178.35 ms/op	131.85 ms/op	1.35
compare ViewDUs	3.7375 s/op	2.9247 s/op	1.28
compare each validator Uint8Array	1.4058 s/op	1.8023 s/op	0.78
compare ViewDU to Uint8Array	844.49 ms/op	668.35 ms/op	1.26
migrate state 1000000 validators, 24 modified, 0 new	572.20 ms/op	508.84 ms/op	1.12
migrate state 1000000 validators, 1700 modified, 1000 new	791.01 ms/op	783.50 ms/op	1.01
migrate state 1000000 validators, 3400 modified, 2000 new	1.0368 s/op	931.89 ms/op	1.11
migrate state 1500000 validators, 24 modified, 0 new	604.33 ms/op	598.57 ms/op	1.01
migrate state 1500000 validators, 1700 modified, 1000 new	907.02 ms/op	803.58 ms/op	1.13
migrate state 1500000 validators, 3400 modified, 2000 new	1.0288 s/op	962.03 ms/op	1.07
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei	6.8000 ns/op	6.1200 ns/op	1.11
state getBlockRootAtSlot - 250000 vs - 7PWei	925.63 ns/op	807.16 ns/op	1.15
computeProposers - vc 250000	7.9787 ms/op	6.1965 ms/op	1.29
computeEpochShuffling - vc 250000	90.126 ms/op	83.202 ms/op	1.08
getNextSyncCommittee - vc 250000	102.70 ms/op	102.60 ms/op	1.00
computeSigningRoot for AttestationData	22.026 us/op	20.535 us/op	1.07
hash AttestationData serialized data then Buffer.toString(base64)	1.2168 us/op	1.1673 us/op	1.04
toHexString serialized data	986.85 ns/op	744.36 ns/op	1.33
Buffer.toString(base64)	163.21 ns/op	136.32 ns/op	1.20

by benchmarkbot/action

wemeetagain

The approach here looks good.
I wonder if it can or should be tied to beacon health call in some way.

packages/validator/src/services/syncingStatusTracker.ts

wemeetagain · 2024-08-01T14:50:46Z

+        this.logger.debug("Connected beacon node is synced", {slot, ...syncingStatus});
+      }
+    } catch (e) {
+      this.logger.error("Failed to check syncing status", {}, e as Error);


what's the UX of this (and can it be improved)? Most of the time we'll get two logs here, right? both health log and this one?

So we shouldn't have two logs, that seems wrong, need to review

I think a good UX would be

verbose: once a slot to print sync status if isSyncing=false

warn: once a slot if isSyncing=true

error: once a slot if failed to check status (e.g. due to node offline)

but in any case, just one of those, never multiple logs per slot

both health log and this one?

ah you mean a log related to polling health status api? There is no log like this as far as I know

The remaining UX question for me right now is how to deal with the existing "Node is syncing" logs, which are printed due to other API failures, such as duty polling, and are already throttled to once per slot.

I aligned the logs to be somewhat similar, but it might look a bit redundant because getProposerDuties is called every slot right now. I am still not sure whether removing those altogether is good, because they still provide information about polling failures and include the error message of the beacon node.

I think the UX is clean now

info log to inform user about sync status just once on startup or on resynced

warn every slot if not synced with details about head slot / sync distance

verbose dump node syncing status every slot, including all data retrieved from api
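The three rules just listed can be sketched as a pure function that decides which log entries to emit for a slot. The function name and the messages are illustrative (only "Connected beacon node is synced" appears in the diff excerpt), and emitting the info log only when the node is synced is an assumption of this sketch.

```typescript
type SyncingStatus = {
  headSlot: number;
  syncDistance: number;
  isSyncing: boolean;
};

type LogLevel = "info" | "warn" | "verbose";
type LogEntry = {level: LogLevel; message: string; context: SyncingStatus};

// prev is null on the first check after startup.
function syncStatusLogsForSlot(prev: SyncingStatus | null, current: SyncingStatus): LogEntry[] {
  // verbose: dump the full status every slot
  const entries: LogEntry[] = [{level: "verbose", message: "Node syncing status", context: current}];
  if (current.isSyncing) {
    // warn: every slot while not synced, with head slot / sync distance in the context
    entries.push({level: "warn", message: "Connected beacon node is syncing", context: current});
  } else if (prev === null || prev.isSyncing) {
    // info: only once, on startup or when the node has resynced
    entries.push({level: "info", message: "Connected beacon node is synced", context: current});
  }
  return entries;
}

const syncedStatus: SyncingStatus = {headSlot: 100, syncDistance: 0, isSyncing: false};
const syncingStatus: SyncingStatus = {headSlot: 90, syncDistance: 10, isSyncing: true};

// Levels emitted for a few representative transitions
const onStartupSynced = syncStatusLogsForSlot(null, syncedStatus).map((e) => e.level);
const whileSyncing = syncStatusLogsForSlot(syncingStatus, syncingStatus).map((e) => e.level);
const onResynced = syncStatusLogsForSlot(syncingStatus, syncedStatus).map((e) => e.level);
const steadySynced = syncStatusLogsForSlot(syncedStatus, syncedStatus).map((e) => e.level);
```

At the default log level this yields at most one visible line per slot, and a node that stays synced produces none after the initial info log.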

There might be additional "Node is syncing" warnings due to duty polling attempts running into 503 errors, but those are slightly different in nature: (1) they include error details (hence important) and (2) they depend on the lifecycle of imported keys and might not even fire. I think it's good to inform the user about the node sync status in any case (think pending validator).

We could further consider the case where the beacon node is synced but the connected EL client is syncing or offline, and emit a warning every slot. That might be out of scope though, since the status tracker service is primarily designed to trigger duties polling.

nflaig · 2024-08-01T15:09:35Z

I wonder if it can or should be tied to beacon health call in some way.

The syncing status call is more useful than the health status call, as it provides more detailed info instead of just the status code. Honestly we could consider removing the health status calls completely: the metric is unused, and to observe the health status of all nodes, the approach of using the URL score seems better to me. We already display this information on the validator client dashboard (#6415).

The health status api is mostly meant to be used by tooling like liveness / readiness probes in k8s or other health monitoring systems.

wemeetagain · 2024-08-01T15:53:09Z

Honestly we could consider removing the health status calls completely

would be in favor of that. it would also resolve the UX concern mentioned ^

nflaig · 2024-08-01T15:57:07Z

Honestly we could consider removing the health status calls completely

would be in favor of that. it would also resolve the UX concern mentioned ^

Do we care about removing the metric, is anybody using it? I can't tell, but likely not. I can migrate the logic that sets the metric to the syncing status tracker; it has all the information needed to set it as before.

packages/validator/src/services/attestationDuties.ts

wemeetagain · 2024-08-01T16:14:57Z

Do we care about removing the metric, is anybody using it?

Afaik it's only used on the beacon side, not on the validator side (because of complications wrt maintaining state of backup beacon nodes). Would be in favor of removal in that case.

nflaig · 2024-08-01T16:22:11Z

Do we care about removing the metric, is anybody using it?

Afaik it's only used on the beacon side, not on the validator side (because of complications wrt maintaining state of backup beacon nodes). Would be in favor of removal in that case.

I wonder if the infra team uses it for alerts @Faithtosin? It's mentioned here https://github.com/ChainSafe/lodestar-ansible-development/issues/57, and Lion tagged this as high prio issue #4637 but I fail to see how it was ever used, other than manually via exploring metrics via grafana or /metrics.

Faithtosin · 2024-08-01T20:59:39Z

Do we care about removing the metric, is anybody using it?

Afaik it's only used on the beacon side, not on the validator side (because of complications wrt maintaining state of backup beacon nodes). Would be in favor of removal in that case.

I wonder if the infra team uses it for alerts @Faithtosin? It's mentioned here ChainSafe/lodestar-ansible-development#57, and Lion tagged this as high prio issue #4637 but I fail to see how it was ever used, other than manually via exploring metrics via grafana or /metrics.

@nflaig vc_beacon_health metric is not used in any of our internal alerting systems.

wemeetagain

Looks good. Will you remove the health check + metric?

nflaig · 2024-08-02T14:59:56Z

Looks good. Will you remove the health check + metric?

I have some stuff locally that I still need to push (mostly tests). I was thinking about moving the metric to the syncing status tracker, as we already have the data there anyway, and we could add it to the validator client dashboard. If we remove the metric, there is no way to see from metrics whether the node is syncing.

nflaig · 2024-08-02T21:36:57Z

Looks good. Will you remove the health check + metric?

Added this panel to the validator client dashboard now. I guess this was the original idea behind #4939, and we could use it for alerting if the infra team is interested (cc @Faithtosin). It might be useful to detect if a validator client is only connected to a beacon node that is not ready to fulfill duties. But it is only helpful if there are no active keys yet on the validator client; otherwise we should probably prefer failed duties as the alert trigger.
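As a rough illustration of how the health metric can be derived from the syncing status check instead of a separate health call, the sketch below maps the outcome of one check to a gauge value. The `BeaconHealth` values, the `StatusCheckResult` type, and `toBeaconHealth` are hypothetical names and numbers chosen for this example; they are not taken from the PR.

```typescript
// Assumed gauge values for the beacon health metric (illustrative only).
const BeaconHealth = {READY: 0, SYNCING: 1, ERROR: 2} as const;
type BeaconHealthValue = (typeof BeaconHealth)[keyof typeof BeaconHealth];

// Outcome of a single syncing status request: either a response or a failure
// (for example a network error because the node is offline).
type StatusCheckResult = {ok: true; isSyncing: boolean} | {ok: false; error: Error};

// Hypothetical helper: converts a status check outcome into the gauge value.
function toBeaconHealth(result: StatusCheckResult): BeaconHealthValue {
  if (!result.ok) {
    return BeaconHealth.ERROR;
  }
  return result.isSyncing ? BeaconHealth.SYNCING : BeaconHealth.READY;
}
```

With a mapping like this, an alert on a non-zero value would fire both when the node is syncing and when it is unreachable, which matches the "not ready to fulfill duties" case described above.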

wemeetagain · 2024-08-08T21:22:47Z

🎉 This PR is included in v1.21.0 🎉

* feat: track syncing status and fetch duties on synced
* Rename scheduling function to runOnResynced
* Consider prev offline and syncing to trigger resynced event handlers
* Add comment to error handler
* Add note about el offline and sycning not considered
* Align syncing status logs with existing node is syncing logs
* Cleanup
* Add ssz support to syncing status api
* Align beacon node code to return proper types
* Keep track of error in prev syncing status
* Print slot in error log
* Skip on first slot of epoch since tasks are already scheduled
* Update api test data
* Fix endpoint tests
* await scheduled tasks, mostly relevant for testing
* Add unit tests
* Move beacon heath metric to syncing status tracker
* Add beacon health panel to validator client dashboard
* Formatting
* Improve info called once assertion
* Reset mocks after each test
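Several of the commits above describe the resynced trigger: a previous status of offline or syncing counts as "not ready", and the first slot of an epoch is skipped because duty tasks are already scheduled there. A minimal sketch of that decision follows, assuming 32 slots per epoch (mainnet preset); `shouldRunOnResynced` and the `PrevStatus` type are hypothetical names, and treating a missing previous status as "no trigger" is an assumption of this sketch rather than confirmed behavior.

```typescript
// Assumption: mainnet preset with 32 slots per epoch.
const SLOTS_PER_EPOCH = 32;

// Result of the previous slot's status check; null means no check has run yet.
type PrevStatus = "synced" | "syncing" | "offline" | null;

// Hypothetical helper: should resynced handlers (e.g. duty polling) run at this slot?
function shouldRunOnResynced(prev: PrevStatus, currentIsSyncing: boolean, slot: number): boolean {
  // Both an offline node and a syncing node count as "not ready" beforehand.
  const wasNotReady = prev === "syncing" || prev === "offline";
  // Duties are already fetched at the start of each epoch, so skip that slot.
  const isFirstSlotOfEpoch = slot % SLOTS_PER_EPOCH === 0;
  return wasNotReady && !currentIsSyncing && !isFirstSlotOfEpoch;
}
```

For example, a node that was syncing at slot 36 and reports synced at slot 37 would trigger the handlers, while the same transition observed at slot 64 would not, since slot 64 is the first slot of an epoch.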

feat: track syncing status and fetch duties on synced

b90a6ce

nflaig requested a review from a team as a code owner August 1, 2024 12:10

nflaig marked this pull request as draft August 1, 2024 12:10

wemeetagain reviewed Aug 1, 2024

nflaig changed the title ~~feat: track syncing status and fetch duties on synced~~ feat: track syncing status and fetch duties on resynced Aug 1, 2024

Rename scheduling function to runOnResynced

3969c92

Consider prev offline and syncing to trigger resynced event handlers

f6bff9f

nflaig added 2 commits August 1, 2024 17:04

Add comment to error handler

fde12ca

Merge branch 'unstable' into nflaig/duties-on-synced

1e0b9ea

nflaig commented Aug 1, 2024

packages/validator/src/services/attestationDuties.ts

nflaig added 11 commits August 1, 2024 22:43

Add note about el offline and sycning not considered

af82dc0

Align syncing status logs with existing node is syncing logs

2dd62ca

Cleanup

6bc2044

Add ssz support to syncing status api

c7ee7a3

Align beacon node code to return proper types

3febdeb

Keep track of error in prev syncing status

ce63c36

Print slot in error log

e544fd2

Skip on first slot of epoch since tasks are already scheduled

eb92501

Update api test data

56bf716

Fix endpoint tests

84be269

Merge remote-tracking branch 'origin/unstable' into nflaig/duties-on-…

72e20f7

…synced

wemeetagain reviewed Aug 2, 2024

nflaig added 7 commits August 2, 2024 19:29

await scheduled tasks, mostly relevant for testing

42bc513

Add unit tests

9368d85

Move beacon heath metric to syncing status tracker

0e4d1bd

Add beacon health panel to validator client dashboard

15b9bde

Formatting

abe5a07

Improve info called once assertion

cdc5a61

Reset mocks after each test

5e21e6a

nflaig marked this pull request as ready for review August 2, 2024 21:31

wemeetagain approved these changes Aug 2, 2024

wemeetagain merged commit 3d2c2b3 into unstable Aug 2, 2024
20 checks passed

wemeetagain deleted the nflaig/duties-on-synced branch August 2, 2024 22:02
