feat: verify gossip attestation messages in batch #5896

Merged
merged 21 commits into from
Aug 22, 2023


@twoeths twoeths commented Aug 20, 2023

Motivation

  • Improve the node's performance, mainly in subscribeAllSubnets mode

Description

  • Consume the IndexedGossipQueue (feat: implement IndexedGossipQueue #5803)
    • Enforce a 50ms wait time for each key if the key size is <= 32, to allow batching more attestations
  • Consume the verifySignatureSetsSameMessage BLS API (feat: add verifySignatureSetsSameMessage BLS api #5747)
  • This work is designed to run with the network thread (useWorker=true) because it blocks the main thread to aggregate signatures, so a network.beaconAttestationBatchValidation flag is implemented
    • By default this flag is undefined|false, for backward compatibility. In that case the node uses LinearGossipQueue and DefaultGossipHandlers
    • When this flag is turned on, the node uses IndexedGossipQueueMinSize and BatchGossipHandlers
  • This work reduces CPU time a lot because far fewer BLS signature sets are passed across the thread boundary, and it likely resolves "Network worker extremely busy / high event loop lag / high I/O lag" #5604 because I/O lag is reduced significantly there (see test result below)
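The per-key wait rule described above can be sketched roughly as follows. This is a minimal illustration, not the IndexedGossipQueueMinSize implementation: the class name `IndexedBatchQueue`, its fields, and the explicit `nowMs` parameter are all hypothetical, and the 32 / 50ms values are simply the ones quoted in the description.

```typescript
// Hypothetical sketch: group items by key and release a key's items either
// when the group is large enough or when its oldest item has waited long enough.
type QueueItem = {key: string; payload: string; addedMs: number};

class IndexedBatchQueue {
  private readonly itemsByKey = new Map<string, QueueItem[]>();
  private readonly minChunkSize: number;
  private readonly maxWaitMs: number;

  constructor(minChunkSize: number, maxWaitMs: number) {
    this.minChunkSize = minChunkSize; // e.g. 32, per the description
    this.maxWaitMs = maxWaitMs; // e.g. 50, per the description
  }

  add(key: string, payload: string, nowMs: number): void {
    const items = this.itemsByKey.get(key) ?? [];
    items.push({key, payload, addedMs: nowMs});
    this.itemsByKey.set(key, items);
  }

  /** Return the next ready batch, or null if every key should keep waiting. */
  next(nowMs: number): QueueItem[] | null {
    for (const [key, items] of this.itemsByKey) {
      // A key with more than minChunkSize items does not need to wait.
      const bigEnough = items.length > this.minChunkSize;
      // A smaller key is released once its oldest item has waited maxWaitMs.
      const waitedEnough = nowMs - items[0].addedMs >= this.maxWaitMs;
      if (bigEnough || waitedEnough) {
        this.itemsByKey.delete(key);
        return items;
      }
    }
    return null;
  }
}
```

All items returned by one `next()` call share the same key, which is what makes a same-message signature verification possible for the whole batch.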

Closes #5416

Test result on test mainnet node

  • Same mesh peers as the feat mainnet node (unstable + use_worker=true), so the gossipsub bandwidth is the same
Screenshot 2023-08-20 at 14 44 42
  • More attestations are forwarded, and to more peers (feat1 forwards slightly fewer attestations and has <0.5 sent peers, so this is >3x better)
Screenshot 2023-08-20 at 14 45 56
  • This is because of much better gossip job time and job wait time for the beacon_attestation topic
Screenshot 2023-08-20 at 14 50 13
  • Attestation batch percentage
Screenshot 2023-08-20 at 14 52 37
  • Attestation batch histogram
Screenshot 2023-08-20 at 14 53 29
  • Gossip queue key size
Screenshot 2023-08-20 at 14 53 02
  • CPU usage is ~150%, which is less than half of feat1 (~330%)
Screenshot 2023-08-20 at 14 56 00
  • Lastly, prom-client event loop lag is much better than on feat1
Screenshot 2023-08-20 at 14 59 13
  • vs feat1 mainnet node (useWorker=true there)
Screenshot 2023-08-20 at 14 59 56

@github-actions

github-actions bot commented Aug 20, 2023

Performance Report

✔️ no performance regression detected

🚀🚀 Significant benchmark improvement detected

Benchmark suite Current: bd21c59 Previous: 8c6ad38 Ratio
forkChoice updateHead vc 600000 bc 64 eq 300000 15.540 ms/op 88.923 ms/op 0.17
Full benchmark results
Benchmark suite Current: bd21c59 Previous: 8c6ad38 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 650.25 us/op 583.11 us/op 1.12
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 93.750 us/op 118.29 us/op 0.79
BLS verify - blst-native 1.3058 ms/op 1.3179 ms/op 0.99
BLS verifyMultipleSignatures 3 - blst-native 2.7418 ms/op 2.6660 ms/op 1.03
BLS verifyMultipleSignatures 8 - blst-native 6.0050 ms/op 5.6243 ms/op 1.07
BLS verifyMultipleSignatures 32 - blst-native 21.955 ms/op 20.362 ms/op 1.08
BLS verifyMultipleSignatures 64 - blst-native 43.203 ms/op
BLS verifyMultipleSignatures 128 - blst-native 85.783 ms/op
BLS verifyMultipleSignatures - same message - 3 - blst-native 1.2879 ms/op
BLS verifyMultipleSignatures - same message - 8 - blst-native 1.5560 ms/op
BLS verifyMultipleSignatures - same message - 32 - blst-native 2.3037 ms/op
BLS verifyMultipleSignatures - same message - 64 - blst-native 3.3715 ms/op
BLS verifyMultipleSignatures - same message - 128 - blst-native 5.4355 ms/op
BLS aggregatePubkeys 32 - blst-native 24.768 us/op 27.050 us/op 0.92
BLS aggregatePubkeys 128 - blst-native 97.424 us/op 106.80 us/op 0.91
getAttestationsForBlock 52.019 ms/op 97.151 ms/op 0.54
isKnown best case - 1 super set check 289.00 ns/op 633.00 ns/op 0.46
isKnown normal case - 2 super set checks 281.00 ns/op 607.00 ns/op 0.46
isKnown worse case - 16 super set checks 271.00 ns/op 615.00 ns/op 0.44
CheckpointStateCache - add get delete 4.9360 us/op 7.2140 us/op 0.68
validate api signedAggregateAndProof - struct 2.7462 ms/op 3.1097 ms/op 0.88
validate gossip signedAggregateAndProof - struct 2.7531 ms/op 3.0851 ms/op 0.89
validate gossip attestation - vc 640000 1.3034 ms/op
batch validate gossip attestation - vc 640000 - chunk 32 142.38 us/op
batch validate gossip attestation - vc 640000 - chunk 64 126.53 us/op
batch validate gossip attestation - vc 640000 - chunk 128 115.97 us/op
batch validate gossip attestation - vc 640000 - chunk 256 109.40 us/op
pickEth1Vote - no votes 1.0949 ms/op 2.2464 ms/op 0.49
pickEth1Vote - max votes 9.5591 ms/op 19.880 ms/op 0.48
pickEth1Vote - Eth1Data hashTreeRoot value x2048 8.6985 ms/op 11.426 ms/op 0.76
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 13.583 ms/op 24.787 ms/op 0.55
pickEth1Vote - Eth1Data fastSerialize value x2048 571.51 us/op 883.25 us/op 0.65
pickEth1Vote - Eth1Data fastSerialize tree x2048 7.2380 ms/op 12.534 ms/op 0.58
bytes32 toHexString 455.00 ns/op 902.00 ns/op 0.50
bytes32 Buffer.toString(hex) 280.00 ns/op 375.00 ns/op 0.75
bytes32 Buffer.toString(hex) from Uint8Array 418.00 ns/op 772.00 ns/op 0.54
bytes32 Buffer.toString(hex) + 0x 280.00 ns/op 424.00 ns/op 0.66
Object access 1 prop 0.14300 ns/op 0.28100 ns/op 0.51
Map access 1 prop 0.14500 ns/op 0.18500 ns/op 0.78
Object get x1000 7.4270 ns/op 10.240 ns/op 0.73
Map get x1000 0.60400 ns/op 0.80800 ns/op 0.75
Object set x1000 46.505 ns/op 88.724 ns/op 0.52
Map set x1000 37.832 ns/op 64.521 ns/op 0.59
Return object 10000 times 0.23160 ns/op 0.30570 ns/op 0.76
Throw Error 10000 times 3.7640 us/op 4.4103 us/op 0.85
fastMsgIdFn sha256 / 200 bytes 3.1850 us/op 4.7120 us/op 0.68
fastMsgIdFn h32 xxhash / 200 bytes 267.00 ns/op 464.00 ns/op 0.58
fastMsgIdFn h64 xxhash / 200 bytes 333.00 ns/op 659.00 ns/op 0.51
fastMsgIdFn sha256 / 1000 bytes 11.126 us/op 12.731 us/op 0.87
fastMsgIdFn h32 xxhash / 1000 bytes 388.00 ns/op 651.00 ns/op 0.60
fastMsgIdFn h64 xxhash / 1000 bytes 398.00 ns/op 740.00 ns/op 0.54
fastMsgIdFn sha256 / 10000 bytes 102.66 us/op 122.45 us/op 0.84
fastMsgIdFn h32 xxhash / 10000 bytes 1.8620 us/op 2.1400 us/op 0.87
fastMsgIdFn h64 xxhash / 10000 bytes 1.2730 us/op 1.6330 us/op 0.78
enrSubnets - fastDeserialize 64 bits 1.1720 us/op 2.2500 us/op 0.52
enrSubnets - ssz BitVector 64 bits 411.00 ns/op 837.00 ns/op 0.49
enrSubnets - fastDeserialize 4 bits 158.00 ns/op 309.00 ns/op 0.51
enrSubnets - ssz BitVector 4 bits 403.00 ns/op 936.00 ns/op 0.43
prioritizePeers score -10:0 att 32-0.1 sync 2-0 100.05 us/op 165.98 us/op 0.60
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 121.07 us/op 204.55 us/op 0.59
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 157.94 us/op 293.21 us/op 0.54
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 280.99 us/op 598.55 us/op 0.47
prioritizePeers score 0:0 att 64-1 sync 4-1 333.13 us/op 557.27 us/op 0.60
array of 16000 items push then shift 1.5873 us/op 2.1480 us/op 0.74
LinkedList of 16000 items push then shift 8.8520 ns/op 15.605 ns/op 0.57
array of 16000 items push then pop 70.599 ns/op 98.577 ns/op 0.72
LinkedList of 16000 items push then pop 8.6370 ns/op 12.996 ns/op 0.66
array of 24000 items push then shift 2.3498 us/op 3.3258 us/op 0.71
LinkedList of 24000 items push then shift 8.5460 ns/op 17.562 ns/op 0.49
array of 24000 items push then pop 86.028 ns/op 198.26 ns/op 0.43
LinkedList of 24000 items push then pop 8.6060 ns/op 12.932 ns/op 0.67
intersect bitArray bitLen 8 6.7350 ns/op 8.8160 ns/op 0.76
intersect array and set length 8 53.897 ns/op 99.065 ns/op 0.54
intersect bitArray bitLen 128 31.624 ns/op 42.202 ns/op 0.75
intersect array and set length 128 741.54 ns/op 1.2440 us/op 0.60
bitArray.getTrueBitIndexes() bitLen 128 1.3750 us/op 2.7640 us/op 0.50
bitArray.getTrueBitIndexes() bitLen 248 2.3370 us/op 4.6910 us/op 0.50
bitArray.getTrueBitIndexes() bitLen 512 4.5270 us/op 8.3870 us/op 0.54
Buffer.concat 32 items 901.00 ns/op 1.5760 us/op 0.57
Uint8Array.set 32 items 1.5940 us/op 3.2600 us/op 0.49
Set add up to 64 items then delete first 4.2512 us/op 5.3837 us/op 0.79
OrderedSet add up to 64 items then delete first 5.3042 us/op 7.1666 us/op 0.74
Set add up to 64 items then delete last 4.5342 us/op 5.8912 us/op 0.77
OrderedSet add up to 64 items then delete last 5.6435 us/op 8.4772 us/op 0.67
Set add up to 64 items then delete middle 4.5266 us/op 5.5665 us/op 0.81
OrderedSet add up to 64 items then delete middle 6.9160 us/op 9.3635 us/op 0.74
Set add up to 128 items then delete first 9.3302 us/op 12.042 us/op 0.77
OrderedSet add up to 128 items then delete first 12.172 us/op 16.527 us/op 0.74
Set add up to 128 items then delete last 9.0123 us/op 12.905 us/op 0.70
OrderedSet add up to 128 items then delete last 11.341 us/op 16.961 us/op 0.67
Set add up to 128 items then delete middle 8.8846 us/op 11.264 us/op 0.79
OrderedSet add up to 128 items then delete middle 16.566 us/op 22.948 us/op 0.72
Set add up to 256 items then delete first 18.736 us/op 23.258 us/op 0.81
OrderedSet add up to 256 items then delete first 24.583 us/op 32.332 us/op 0.76
Set add up to 256 items then delete last 18.045 us/op 23.386 us/op 0.77
OrderedSet add up to 256 items then delete last 23.178 us/op 37.348 us/op 0.62
Set add up to 256 items then delete middle 17.830 us/op 24.414 us/op 0.73
OrderedSet add up to 256 items then delete middle 44.745 us/op 57.099 us/op 0.78
transfer serialized Status (84 B) 1.7980 us/op 3.3990 us/op 0.53
copy serialized Status (84 B) 1.5290 us/op 2.5660 us/op 0.60
transfer serialized SignedVoluntaryExit (112 B) 1.9120 us/op 3.1400 us/op 0.61
copy serialized SignedVoluntaryExit (112 B) 1.6860 us/op 2.4860 us/op 0.68
transfer serialized ProposerSlashing (416 B) 2.5660 us/op 3.2500 us/op 0.79
copy serialized ProposerSlashing (416 B) 2.6820 us/op 3.4880 us/op 0.77
transfer serialized Attestation (485 B) 2.5030 us/op 3.4950 us/op 0.72
copy serialized Attestation (485 B) 2.9090 us/op 3.6480 us/op 0.80
transfer serialized AttesterSlashing (33232 B) 3.1470 us/op 4.2810 us/op 0.74
copy serialized AttesterSlashing (33232 B) 5.8990 us/op 15.038 us/op 0.39
transfer serialized Small SignedBeaconBlock (128000 B) 3.0850 us/op 5.4080 us/op 0.57
copy serialized Small SignedBeaconBlock (128000 B) 12.647 us/op 45.354 us/op 0.28
transfer serialized Avg SignedBeaconBlock (200000 B) 3.2910 us/op 6.3970 us/op 0.51
copy serialized Avg SignedBeaconBlock (200000 B) 17.883 us/op 63.969 us/op 0.28
transfer serialized BlobsSidecar (524380 B) 3.0440 us/op 7.1090 us/op 0.43
copy serialized BlobsSidecar (524380 B) 103.33 us/op 199.05 us/op 0.52
transfer serialized Big SignedBeaconBlock (1000000 B) 3.2730 us/op 9.0040 us/op 0.36
copy serialized Big SignedBeaconBlock (1000000 B) 151.87 us/op 408.84 us/op 0.37
pass gossip attestations to forkchoice per slot 3.7002 ms/op 5.5674 ms/op 0.66
forkChoice updateHead vc 100000 bc 64 eq 0 712.56 us/op 759.34 us/op 0.94
forkChoice updateHead vc 600000 bc 64 eq 0 4.5349 ms/op 6.9538 ms/op 0.65
forkChoice updateHead vc 1000000 bc 64 eq 0 6.9796 ms/op 11.220 ms/op 0.62
forkChoice updateHead vc 600000 bc 320 eq 0 4.1492 ms/op 7.6581 ms/op 0.54
forkChoice updateHead vc 600000 bc 1200 eq 0 4.2519 ms/op 7.3551 ms/op 0.58
forkChoice updateHead vc 600000 bc 7200 eq 0 5.3916 ms/op 10.010 ms/op 0.54
forkChoice updateHead vc 600000 bc 64 eq 1000 10.986 ms/op 14.429 ms/op 0.76
forkChoice updateHead vc 600000 bc 64 eq 10000 11.792 ms/op 16.235 ms/op 0.73
forkChoice updateHead vc 600000 bc 64 eq 300000 15.540 ms/op 88.923 ms/op 0.17
computeDeltas 500000 validators 300 proto nodes 6.2227 ms/op 8.3456 ms/op 0.75
computeDeltas 500000 validators 1200 proto nodes 6.1935 ms/op 7.7754 ms/op 0.80
computeDeltas 500000 validators 7200 proto nodes 6.2293 ms/op 7.5948 ms/op 0.82
computeDeltas 750000 validators 300 proto nodes 9.2191 ms/op 10.087 ms/op 0.91
computeDeltas 750000 validators 1200 proto nodes 9.3378 ms/op 10.342 ms/op 0.90
computeDeltas 750000 validators 7200 proto nodes 9.3410 ms/op 10.366 ms/op 0.90
computeDeltas 1400000 validators 300 proto nodes 17.659 ms/op 19.346 ms/op 0.91
computeDeltas 1400000 validators 1200 proto nodes 17.896 ms/op 18.657 ms/op 0.96
computeDeltas 1400000 validators 7200 proto nodes 17.423 ms/op 19.874 ms/op 0.88
computeDeltas 2100000 validators 300 proto nodes 25.984 ms/op 27.885 ms/op 0.93
computeDeltas 2100000 validators 1200 proto nodes 26.317 ms/op 27.596 ms/op 0.95
computeDeltas 2100000 validators 7200 proto nodes 26.237 ms/op 27.501 ms/op 0.95
computeProposerBoostScoreFromBalances 500000 validators 3.2321 ms/op 3.4304 ms/op 0.94
computeProposerBoostScoreFromBalances 750000 validators 3.1966 ms/op 3.3827 ms/op 0.94
computeProposerBoostScoreFromBalances 1400000 validators 3.2050 ms/op 3.7896 ms/op 0.85
computeProposerBoostScoreFromBalances 2100000 validators 3.1955 ms/op 3.7836 ms/op 0.84
altair processAttestation - 250000 vs - 7PWei normalcase 2.2101 ms/op 2.5830 ms/op 0.86
altair processAttestation - 250000 vs - 7PWei worstcase 3.1831 ms/op 4.2656 ms/op 0.75
altair processAttestation - setStatus - 1/6 committees join 177.83 us/op 186.17 us/op 0.96
altair processAttestation - setStatus - 1/3 committees join 342.80 us/op 354.21 us/op 0.97
altair processAttestation - setStatus - 1/2 committees join 463.41 us/op 494.12 us/op 0.94
altair processAttestation - setStatus - 2/3 committees join 583.91 us/op 621.04 us/op 0.94
altair processAttestation - setStatus - 4/5 committees join 791.06 us/op 852.86 us/op 0.93
altair processAttestation - setStatus - 100% committees join 901.96 us/op 1.0013 ms/op 0.90
altair processBlock - 250000 vs - 7PWei normalcase 10.953 ms/op 10.393 ms/op 1.05
altair processBlock - 250000 vs - 7PWei normalcase hashState 17.433 ms/op 19.884 ms/op 0.88
altair processBlock - 250000 vs - 7PWei worstcase 39.838 ms/op 42.433 ms/op 0.94
altair processBlock - 250000 vs - 7PWei worstcase hashState 58.688 ms/op 67.993 ms/op 0.86
phase0 processBlock - 250000 vs - 7PWei normalcase 2.4166 ms/op 3.1210 ms/op 0.77
phase0 processBlock - 250000 vs - 7PWei worstcase 30.491 ms/op 35.322 ms/op 0.86
altair processEth1Data - 250000 vs - 7PWei normalcase 486.07 us/op 666.95 us/op 0.73
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 12.357 us/op 12.286 us/op 1.01
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 85.147 us/op 93.928 us/op 0.91
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 15.605 us/op 34.737 us/op 0.45
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 11.047 us/op 21.408 us/op 0.52
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 169.72 us/op 220.03 us/op 0.77
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 1.0464 ms/op 1.4474 ms/op 0.72
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 1.4493 ms/op 2.2025 ms/op 0.66
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 1.4264 ms/op 1.8636 ms/op 0.77
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 3.4463 ms/op 5.4899 ms/op 0.63
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 2.3035 ms/op 5.3710 ms/op 0.43
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 4.7756 ms/op 8.5747 ms/op 0.56
Tree 40 250000 create 327.09 ms/op 514.08 ms/op 0.64
Tree 40 250000 get(125000) 191.44 ns/op 219.50 ns/op 0.87
Tree 40 250000 set(125000) 858.58 ns/op 1.3089 us/op 0.66
Tree 40 250000 toArray() 16.994 ms/op 23.388 ms/op 0.73
Tree 40 250000 iterate all - toArray() + loop 17.180 ms/op 24.119 ms/op 0.71
Tree 40 250000 iterate all - get(i) 63.857 ms/op 83.852 ms/op 0.76
MutableVector 250000 create 10.365 ms/op 14.481 ms/op 0.72
MutableVector 250000 get(125000) 6.3660 ns/op 6.8400 ns/op 0.93
MutableVector 250000 set(125000) 245.74 ns/op 339.05 ns/op 0.72
MutableVector 250000 toArray() 2.6099 ms/op 3.6200 ms/op 0.72
MutableVector 250000 iterate all - toArray() + loop 3.1794 ms/op 4.3904 ms/op 0.72
MutableVector 250000 iterate all - get(i) 1.5206 ms/op 1.6740 ms/op 0.91
Array 250000 create 2.3982 ms/op 3.9551 ms/op 0.61
Array 250000 clone - spread 1.1496 ms/op 1.3201 ms/op 0.87
Array 250000 get(125000) 0.57100 ns/op 0.62500 ns/op 0.91
Array 250000 set(125000) 0.64200 ns/op 0.72100 ns/op 0.89
Array 250000 iterate all - loop 81.995 us/op 110.69 us/op 0.74
effectiveBalanceIncrements clone Uint8Array 300000 27.633 us/op 45.593 us/op 0.61
effectiveBalanceIncrements clone MutableVector 300000 334.00 ns/op 385.00 ns/op 0.87
effectiveBalanceIncrements rw all Uint8Array 300000 176.35 us/op 187.52 us/op 0.94
effectiveBalanceIncrements rw all MutableVector 300000 80.935 ms/op 106.02 ms/op 0.76
phase0 afterProcessEpoch - 250000 vs - 7PWei 110.98 ms/op 138.88 ms/op 0.80
phase0 beforeProcessEpoch - 250000 vs - 7PWei 30.951 ms/op 40.303 ms/op 0.77
altair processEpoch - mainnet_e81889 320.09 ms/op 407.61 ms/op 0.79
mainnet_e81889 - altair beforeProcessEpoch 61.541 ms/op 92.367 ms/op 0.67
mainnet_e81889 - altair processJustificationAndFinalization 14.429 us/op 31.078 us/op 0.46
mainnet_e81889 - altair processInactivityUpdates 5.3161 ms/op 9.3057 ms/op 0.57
mainnet_e81889 - altair processRewardsAndPenalties 63.112 ms/op 82.088 ms/op 0.77
mainnet_e81889 - altair processRegistryUpdates 2.4470 us/op 6.1730 us/op 0.40
mainnet_e81889 - altair processSlashings 450.00 ns/op 908.00 ns/op 0.50
mainnet_e81889 - altair processEth1DataReset 572.00 ns/op 1.0470 us/op 0.55
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.2013 ms/op 1.5316 ms/op 0.78
mainnet_e81889 - altair processSlashingsReset 3.3230 us/op 5.4910 us/op 0.61
mainnet_e81889 - altair processRandaoMixesReset 4.1460 us/op 7.8130 us/op 0.53
mainnet_e81889 - altair processHistoricalRootsUpdate 693.00 ns/op 1.3220 us/op 0.52
mainnet_e81889 - altair processParticipationFlagUpdates 1.4250 us/op 2.1760 us/op 0.65
mainnet_e81889 - altair processSyncCommitteeUpdates 434.00 ns/op 964.00 ns/op 0.45
mainnet_e81889 - altair afterProcessEpoch 116.62 ms/op 132.57 ms/op 0.88
capella processEpoch - mainnet_e217614 1.0113 s/op 1.1635 s/op 0.87
mainnet_e217614 - capella beforeProcessEpoch 236.59 ms/op 286.55 ms/op 0.83
mainnet_e217614 - capella processJustificationAndFinalization 13.693 us/op 25.480 us/op 0.54
mainnet_e217614 - capella processInactivityUpdates 17.952 ms/op 22.257 ms/op 0.81
mainnet_e217614 - capella processRewardsAndPenalties 280.16 ms/op 326.51 ms/op 0.86
mainnet_e217614 - capella processRegistryUpdates 16.812 us/op 33.520 us/op 0.50
mainnet_e217614 - capella processSlashings 530.00 ns/op 1.0420 us/op 0.51
mainnet_e217614 - capella processEth1DataReset 450.00 ns/op 843.00 ns/op 0.53
mainnet_e217614 - capella processEffectiveBalanceUpdates 3.9557 ms/op 4.4853 ms/op 0.88
mainnet_e217614 - capella processSlashingsReset 2.4900 us/op 3.4100 us/op 0.73
mainnet_e217614 - capella processRandaoMixesReset 4.1840 us/op 6.8280 us/op 0.61
mainnet_e217614 - capella processHistoricalRootsUpdate 645.00 ns/op 789.00 ns/op 0.82
mainnet_e217614 - capella processParticipationFlagUpdates 1.4900 us/op 4.2580 us/op 0.35
mainnet_e217614 - capella afterProcessEpoch 287.83 ms/op 356.94 ms/op 0.81
phase0 processEpoch - mainnet_e58758 327.96 ms/op 401.57 ms/op 0.82
mainnet_e58758 - phase0 beforeProcessEpoch 117.00 ms/op 173.45 ms/op 0.67
mainnet_e58758 - phase0 processJustificationAndFinalization 14.464 us/op 29.569 us/op 0.49
mainnet_e58758 - phase0 processRewardsAndPenalties 53.410 ms/op 51.914 ms/op 1.03
mainnet_e58758 - phase0 processRegistryUpdates 8.9190 us/op 21.355 us/op 0.42
mainnet_e58758 - phase0 processSlashings 536.00 ns/op 1.4800 us/op 0.36
mainnet_e58758 - phase0 processEth1DataReset 434.00 ns/op 1.0350 us/op 0.42
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 958.69 us/op 1.5271 ms/op 0.63
mainnet_e58758 - phase0 processSlashingsReset 2.3230 us/op 5.8890 us/op 0.39
mainnet_e58758 - phase0 processRandaoMixesReset 4.1090 us/op 11.874 us/op 0.35
mainnet_e58758 - phase0 processHistoricalRootsUpdate 457.00 ns/op 1.5370 us/op 0.30
mainnet_e58758 - phase0 processParticipationRecordUpdates 3.5970 us/op 7.3670 us/op 0.49
mainnet_e58758 - phase0 afterProcessEpoch 100.98 ms/op 117.37 ms/op 0.86
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.2123 ms/op 2.0920 ms/op 0.58
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.4037 ms/op 2.2833 ms/op 0.61
altair processInactivityUpdates - 250000 normalcase 17.802 ms/op 39.554 ms/op 0.45
altair processInactivityUpdates - 250000 worstcase 20.393 ms/op 30.971 ms/op 0.66
phase0 processRegistryUpdates - 250000 normalcase 8.2830 us/op 14.919 us/op 0.56
phase0 processRegistryUpdates - 250000 badcase_full_deposits 308.23 us/op 436.78 us/op 0.71
phase0 processRegistryUpdates - 250000 worstcase 0.5 130.33 ms/op 173.65 ms/op 0.75
altair processRewardsAndPenalties - 250000 normalcase 45.003 ms/op 81.151 ms/op 0.55
altair processRewardsAndPenalties - 250000 worstcase 46.157 ms/op 88.723 ms/op 0.52
phase0 getAttestationDeltas - 250000 normalcase 7.4918 ms/op 13.233 ms/op 0.57
phase0 getAttestationDeltas - 250000 worstcase 7.7325 ms/op 12.633 ms/op 0.61
phase0 processSlashings - 250000 worstcase 2.1993 ms/op 3.4890 ms/op 0.63
altair processSyncCommitteeUpdates - 250000 150.36 ms/op 178.96 ms/op 0.84
BeaconState.hashTreeRoot - No change 260.00 ns/op 329.00 ns/op 0.79
BeaconState.hashTreeRoot - 1 full validator 50.168 us/op 56.799 us/op 0.88
BeaconState.hashTreeRoot - 32 full validator 545.79 us/op 603.49 us/op 0.90
BeaconState.hashTreeRoot - 512 full validator 5.2685 ms/op 7.1041 ms/op 0.74
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 65.721 us/op 70.069 us/op 0.94
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 841.20 us/op 1.0183 ms/op 0.83
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 11.053 ms/op 13.428 ms/op 0.82
BeaconState.hashTreeRoot - 1 balances 46.805 us/op 56.504 us/op 0.83
BeaconState.hashTreeRoot - 32 balances 440.15 us/op 490.27 us/op 0.90
BeaconState.hashTreeRoot - 512 balances 4.3072 ms/op 5.7796 ms/op 0.75
BeaconState.hashTreeRoot - 250000 balances 75.362 ms/op 91.190 ms/op 0.83
aggregationBits - 2048 els - zipIndexesInBitList 15.927 us/op 28.384 us/op 0.56
regular array get 100000 times 32.177 us/op 46.821 us/op 0.69
wrappedArray get 100000 times 32.239 us/op 46.359 us/op 0.70
arrayWithProxy get 100000 times 14.687 ms/op 15.327 ms/op 0.96
ssz.Root.equals 196.00 ns/op 298.00 ns/op 0.66
byteArrayEquals 200.00 ns/op 275.00 ns/op 0.73
shuffle list - 16384 els 6.8569 ms/op 8.6454 ms/op 0.79
shuffle list - 250000 els 100.42 ms/op 114.65 ms/op 0.88
processSlot - 1 slots 8.2270 us/op 12.421 us/op 0.66
processSlot - 32 slots 1.3145 ms/op 1.4523 ms/op 0.91
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 51.232 ms/op 66.370 ms/op 0.77
getCommitteeAssignments - req 1 vs - 250000 vc 2.4876 ms/op 2.7183 ms/op 0.92
getCommitteeAssignments - req 100 vs - 250000 vc 3.6322 ms/op 4.0481 ms/op 0.90
getCommitteeAssignments - req 1000 vs - 250000 vc 3.9877 ms/op 4.5978 ms/op 0.87
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 4.5400 ns/op 5.9900 ns/op 0.76
state getBlockRootAtSlot - 250000 vs - 7PWei 874.50 ns/op 1.0437 us/op 0.84
computeProposers - vc 250000 9.1674 ms/op 10.148 ms/op 0.90
computeEpochShuffling - vc 250000 103.29 ms/op 132.95 ms/op 0.78
getNextSyncCommittee - vc 250000 149.53 ms/op 181.86 ms/op 0.82
computeSigningRoot for AttestationData 12.686 us/op 16.452 us/op 0.77
hash AttestationData serialized data then Buffer.toString(base64) 2.2657 us/op 2.6387 us/op 0.86
toHexString serialized data 1.0693 us/op 1.8381 us/op 0.58
Buffer.toString(base64) 212.78 ns/op 288.75 ns/op 0.74

by benchmarkbot/action

@twoeths twoeths marked this pull request as ready for review August 20, 2023 08:27
@twoeths twoeths requested a review from a team as a code owner August 20, 2023 08:27

twoeths commented Aug 20, 2023

One trade-off is the "prometheus scrape duration" metric: it is ~80ms compared to ~25ms in feat1, mostly because we aggregate signatures on the main thread:

  • it does not really matter because we mostly care about the I/O lag issue in the network thread
  • when validating a block we don't validate any attestations, so it does not affect the notifyNewPayload engine API call much
  • vc APIs (mostly submitPoolAttestations) are the same; we already prioritize API BLS signature sets

Also, the beacon block "time until head" is the same.

@twoeths twoeths changed the title feat: verify gossip messages in batch feat: verify gossip attestation messages in batch Aug 20, 2023
attestationOrBytesArr: AttestationOrBytes[],
subnet: number,
// for unit test, consumers do not need to pass this
phase0ValidationFn = validateGossipAttestationNoSignatureCheck
Member

Unless this is talking about the phase0 fork, would recommend changing this language to "step 0", "step 1", etc.

@@ -133,7 +133,8 @@ export class BlsMultiThreadWorkerPool implements IBlsVerifier {
 // THe worker is not able to deserialize from uncompressed
 // `Error: err _wrapDeserialize`
 this.format = implementation === "blst-native" ? PointFormat.uncompressed : PointFormat.compressed;
-this.workers = this.createWorkers(implementation, defaultPoolSize);
+// 1 worker for the main thread
+this.workers = this.createWorkers(implementation, defaultPoolSize - 1);
Contributor

Check above that defaultPoolSize is greater than 1
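One way to address this could be a small guard before creating the workers. The sketch below is only an assumption about what such a check might look like: `workerCountFor` is a hypothetical helper, and clamping to at least one worker is one possible policy (throwing on a pool size below 2 would be another).

```typescript
// Hypothetical helper: reserve one slot for the main thread, but never
// return fewer than one worker, even if defaultPoolSize is 1 or less.
function workerCountFor(defaultPoolSize: number): number {
  return Math.max(1, defaultPoolSize - 1);
}
```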

this.lastWaitTimeCheckedMs = now;
this.nextWaitTimeMs = null;
let resultedKey: string | null = null;
for (const key of this.indexedItems.keys()) {
Contributor

Iterate by entries so you don't have to do a get below
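To illustrate the suggestion, here is a small sketch comparing the two iteration styles on a `Map`. The function names and the `minSize` condition are hypothetical and only stand in for whatever the loop body actually checks.

```typescript
type Item = {indexed?: string};

// Variant 1: iterate keys, then look the value up again with get().
function firstKeyAtLeastByKeys(indexedItems: Map<string, Item[]>, minSize: number): string | null {
  for (const key of indexedItems.keys()) {
    const items = indexedItems.get(key);
    if (items !== undefined && items.length >= minSize) return key;
  }
  return null;
}

// Variant 2: iterate entries, so key and value arrive together and
// neither the extra lookup nor the undefined check is needed.
function firstKeyAtLeastByEntries(indexedItems: Map<string, Item[]>, minSize: number): string | null {
  for (const [key, items] of indexedItems.entries()) {
    if (items.length >= minSize) return key;
  }
  return null;
}
```

Both variants visit entries in insertion order, so they return the same key.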

@@ -18,28 +29,33 @@ import {GossipQueue, IndexedGossipQueueMinSizeOpts} from "./types.js";
*/
export class IndexedGossipQueueMinSize<T extends {indexed?: string}> implements GossipQueue<T> {
Contributor

If not already done, you should collect these metrics when the node is scraped:

  • Total count of unique data strings in the cache
  • Total count of individual attestations in the cache
  • Current histogram (reset, iterate, observe; like we do in the peer manager) of how long each data has been in the queue since addition
  • Histogram of the age of a data when it is picked by the processor
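The "reset, iterate, observe" pattern mentioned above could look roughly like this. It is a self-contained sketch under stated assumptions: `SimpleHistogram` is a hypothetical stand-in for a real metrics histogram, and `collectQueueMetrics` and its argument shapes are invented for illustration rather than taken from the PR.

```typescript
// Hypothetical in-memory histogram with cumulative buckets
// (each bucket counts observations less than or equal to its upper bound).
class SimpleHistogram {
  readonly buckets: number[];
  readonly counts: number[];

  constructor(buckets: number[]) {
    this.buckets = buckets;
    this.counts = buckets.map(() => 0);
  }

  reset(): void {
    this.counts.fill(0);
  }

  observe(value: number): void {
    this.buckets.forEach((upper, i) => {
      if (value <= upper) this.counts[i]++;
    });
  }
}

type Queued = {addedMs: number};

// Called at scrape time: reset the histogram, walk every queued item,
// and observe how long it has been waiting. Also returns the two counts.
function collectQueueMetrics(
  itemsByKey: Map<string, Queued[]>,
  nowMs: number,
  ageHistogramMs: SimpleHistogram
): {uniqueKeys: number; totalItems: number} {
  ageHistogramMs.reset();
  let totalItems = 0;
  for (const items of itemsByKey.values()) {
    totalItems += items.length;
    for (const item of items) {
      ageHistogramMs.observe(nowMs - item.addedMs);
    }
  }
  return {uniqueKeys: itemsByKey.size, totalItems};
}
```

Resetting on each scrape means the histogram reflects the queue's current contents rather than an accumulation over time, which is the point of the first three metrics; the fourth (age when picked by the processor) would instead be observed at dequeue time without a reset.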

Contributor Author

added missing metrics 👍

@twoeths twoeths merged commit 7ee07da into unstable Aug 22, 2023
@twoeths twoeths deleted the tuyen/verify_gossip_messages_in_batch_pr branch August 22, 2023 08:29
@wemeetagain

🎉 This PR is included in v1.11.0 🎉

Successfully merging this pull request may close these issues.

  • Network worker extremely busy / high event loop lag / high I/O lag
  • Batch gossip validation