Serialize calls to the engine api #4651

Merged
merged 5 commits into from
Oct 10, 2022

g11tech
Contributor

@g11tech g11tech commented Oct 7, 2022

Serialize calls to the engine api

Follow up of #4595

Metrics for engine job processor:

[image: engine job processor metrics]

@github-actions
Contributor

github-actions bot commented Oct 7, 2022

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: cef6462 Previous: 2a39dff Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 1.5504 ms/op 3.0373 ms/op 0.51
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 64.231 us/op 103.09 us/op 0.62
BLS verify - blst-native 2.1635 ms/op 2.4488 ms/op 0.88
BLS verifyMultipleSignatures 3 - blst-native 4.4676 ms/op 4.9245 ms/op 0.91
BLS verifyMultipleSignatures 8 - blst-native 9.6614 ms/op 10.955 ms/op 0.88
BLS verifyMultipleSignatures 32 - blst-native 35.137 ms/op 38.429 ms/op 0.91
BLS aggregatePubkeys 32 - blst-native 46.427 us/op 52.410 us/op 0.89
BLS aggregatePubkeys 128 - blst-native 182.15 us/op 204.63 us/op 0.89
getAttestationsForBlock 75.163 ms/op 126.01 ms/op 0.60
isKnown best case - 1 super set check 465.00 ns/op 562.00 ns/op 0.83
isKnown normal case - 2 super set checks 455.00 ns/op 549.00 ns/op 0.83
isKnown worse case - 16 super set checks 457.00 ns/op 569.00 ns/op 0.80
CheckpointStateCache - add get delete 8.5130 us/op 12.358 us/op 0.69
validate gossip signedAggregateAndProof - struct 5.0084 ms/op 5.5426 ms/op 0.90
validate gossip attestation - struct 2.3707 ms/op 2.7089 ms/op 0.88
pickEth1Vote - no votes 2.3492 ms/op 2.7510 ms/op 0.85
pickEth1Vote - max votes 17.979 ms/op 28.381 ms/op 0.63
pickEth1Vote - Eth1Data hashTreeRoot value x2048 12.289 ms/op 15.589 ms/op 0.79
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 19.876 ms/op 30.041 ms/op 0.66
pickEth1Vote - Eth1Data fastSerialize value x2048 1.4126 ms/op 2.0355 ms/op 0.69
pickEth1Vote - Eth1Data fastSerialize tree x2048 11.938 ms/op 19.723 ms/op 0.61
bytes32 toHexString 934.00 ns/op 1.5390 us/op 0.61
bytes32 Buffer.toString(hex) 720.00 ns/op 980.00 ns/op 0.73
bytes32 Buffer.toString(hex) from Uint8Array 1.0010 us/op 1.5640 us/op 0.64
bytes32 Buffer.toString(hex) + 0x 750.00 ns/op 968.00 ns/op 0.77
Object access 1 prop 0.34500 ns/op 0.51000 ns/op 0.68
Map access 1 prop 0.31400 ns/op 0.40100 ns/op 0.78
Object get x1000 11.616 ns/op 17.508 ns/op 0.66
Map get x1000 0.92400 ns/op 1.0210 ns/op 0.90
Object set x1000 70.475 ns/op 145.15 ns/op 0.49
Map set x1000 46.658 ns/op 102.27 ns/op 0.46
Return object 10000 times 0.43360 ns/op 0.46540 ns/op 0.93
Throw Error 10000 times 6.1513 us/op 9.0209 us/op 0.68
fastMsgIdFn sha256 / 200 bytes 4.8790 us/op 6.0590 us/op 0.81
fastMsgIdFn h32 xxhash / 200 bytes 579.00 ns/op 710.00 ns/op 0.82
fastMsgIdFn h64 xxhash / 200 bytes 731.00 ns/op 974.00 ns/op 0.75
fastMsgIdFn sha256 / 1000 bytes 15.573 us/op 18.723 us/op 0.83
fastMsgIdFn h32 xxhash / 1000 bytes 782.00 ns/op 957.00 ns/op 0.82
fastMsgIdFn h64 xxhash / 1000 bytes 880.00 ns/op 995.00 ns/op 0.88
fastMsgIdFn sha256 / 10000 bytes 135.27 us/op 159.37 us/op 0.85
fastMsgIdFn h32 xxhash / 10000 bytes 2.6520 us/op 2.9940 us/op 0.89
fastMsgIdFn h64 xxhash / 10000 bytes 1.8560 us/op 2.3490 us/op 0.79
enrSubnets - fastDeserialize 64 bits 2.4630 us/op 3.9330 us/op 0.63
enrSubnets - ssz BitVector 64 bits 812.00 ns/op 1.0250 us/op 0.79
enrSubnets - fastDeserialize 4 bits 370.00 ns/op 518.00 ns/op 0.71
enrSubnets - ssz BitVector 4 bits 773.00 ns/op 1.0500 us/op 0.74
prioritizePeers score -10:0 att 32-0.1 sync 2-0 81.263 us/op 132.71 us/op 0.61
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 114.66 us/op 170.91 us/op 0.67
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 190.61 us/op 321.56 us/op 0.59
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 372.84 us/op 691.84 us/op 0.54
prioritizePeers score 0:0 att 64-1 sync 4-1 406.90 us/op 638.90 us/op 0.64
RateTracker 1000000 limit, 1 obj count per request 177.96 ns/op 246.03 ns/op 0.72
RateTracker 1000000 limit, 2 obj count per request 130.16 ns/op 194.11 ns/op 0.67
RateTracker 1000000 limit, 4 obj count per request 106.12 ns/op 160.24 ns/op 0.66
RateTracker 1000000 limit, 8 obj count per request 94.140 ns/op 148.13 ns/op 0.64
RateTracker with prune 3.9510 us/op 6.8690 us/op 0.58
array of 16000 items push then shift 51.574 us/op 5.7998 us/op 8.89
LinkedList of 16000 items push then shift 12.220 ns/op 23.240 ns/op 0.53
array of 16000 items push then pop 205.38 ns/op 306.47 ns/op 0.67
LinkedList of 16000 items push then pop 12.005 ns/op 21.246 ns/op 0.57
array of 24000 items push then shift 77.349 us/op 8.6610 us/op 8.93
LinkedList of 24000 items push then shift 12.614 ns/op 23.878 ns/op 0.53
array of 24000 items push then pop 195.41 ns/op 299.46 ns/op 0.65
LinkedList of 24000 items push then pop 12.040 ns/op 20.141 ns/op 0.60
intersect bitArray bitLen 8 10.782 ns/op 13.651 ns/op 0.79
intersect array and set length 8 133.90 ns/op 248.37 ns/op 0.54
intersect bitArray bitLen 128 55.601 ns/op 78.999 ns/op 0.70
intersect array and set length 128 1.7327 us/op 2.8147 us/op 0.62
Buffer.concat 32 items 1.8240 ns/op 2.5340 ns/op 0.72
pass gossip attestations to forkchoice per slot 5.0137 ms/op 8.4102 ms/op 0.60
computeDeltas 4.7583 ms/op 6.5696 ms/op 0.72
computeProposerBoostScoreFromBalances 809.04 us/op 978.76 us/op 0.83
altair processAttestation - 250000 vs - 7PWei normalcase 3.3004 ms/op 6.2416 ms/op 0.53
altair processAttestation - 250000 vs - 7PWei worstcase 5.0317 ms/op 8.9642 ms/op 0.56
altair processAttestation - setStatus - 1/6 committees join 178.99 us/op 294.27 us/op 0.61
altair processAttestation - setStatus - 1/3 committees join 351.88 us/op 530.08 us/op 0.66
altair processAttestation - setStatus - 1/2 committees join 504.37 us/op 835.52 us/op 0.60
altair processAttestation - setStatus - 2/3 committees join 665.28 us/op 1.0622 ms/op 0.63
altair processAttestation - setStatus - 4/5 committees join 924.39 us/op 1.4124 ms/op 0.65
altair processAttestation - setStatus - 100% committees join 1.1253 ms/op 1.7425 ms/op 0.65
altair processBlock - 250000 vs - 7PWei normalcase 24.924 ms/op 34.297 ms/op 0.73
altair processBlock - 250000 vs - 7PWei normalcase hashState 38.369 ms/op 48.110 ms/op 0.80
altair processBlock - 250000 vs - 7PWei worstcase 74.272 ms/op 120.81 ms/op 0.61
altair processBlock - 250000 vs - 7PWei worstcase hashState 101.91 ms/op 128.26 ms/op 0.79
phase0 processBlock - 250000 vs - 7PWei normalcase 3.2074 ms/op 5.0348 ms/op 0.64
phase0 processBlock - 250000 vs - 7PWei worstcase 50.584 ms/op 63.113 ms/op 0.80
altair processEth1Data - 250000 vs - 7PWei normalcase 600.89 us/op 1.3191 ms/op 0.46
Tree 40 250000 create 678.19 ms/op 1.3128 s/op 0.52
Tree 40 250000 get(125000) 224.33 ns/op 354.21 ns/op 0.63
Tree 40 250000 set(125000) 2.0172 us/op 4.1474 us/op 0.49
Tree 40 250000 toArray() 26.888 ms/op 41.014 ms/op 0.66
Tree 40 250000 iterate all - toArray() + loop 26.273 ms/op 41.343 ms/op 0.64
Tree 40 250000 iterate all - get(i) 109.61 ms/op 146.30 ms/op 0.75
MutableVector 250000 create 12.294 ms/op 19.713 ms/op 0.62
MutableVector 250000 get(125000) 10.960 ns/op 15.535 ns/op 0.71
MutableVector 250000 set(125000) 479.58 ns/op 1.0730 us/op 0.45
MutableVector 250000 toArray() 5.5846 ms/op 9.3481 ms/op 0.60
MutableVector 250000 iterate all - toArray() + loop 5.7368 ms/op 9.2572 ms/op 0.62
MutableVector 250000 iterate all - get(i) 2.6984 ms/op 3.8198 ms/op 0.71
Array 250000 create 5.6020 ms/op 8.4663 ms/op 0.66
Array 250000 clone - spread 3.0102 ms/op 5.4978 ms/op 0.55
Array 250000 get(125000) 1.3870 ns/op 2.3070 ns/op 0.60
Array 250000 set(125000) 1.3890 ns/op 2.2400 ns/op 0.62
Array 250000 iterate all - loop 151.00 us/op 160.83 us/op 0.94
effectiveBalanceIncrements clone Uint8Array 300000 43.987 us/op 141.71 us/op 0.31
effectiveBalanceIncrements clone MutableVector 300000 975.00 ns/op 1.7340 us/op 0.56
effectiveBalanceIncrements rw all Uint8Array 300000 247.33 us/op 324.97 us/op 0.76
effectiveBalanceIncrements rw all MutableVector 300000 157.27 ms/op 368.18 ms/op 0.43
phase0 afterProcessEpoch - 250000 vs - 7PWei 191.68 ms/op 220.77 ms/op 0.87
phase0 beforeProcessEpoch - 250000 vs - 7PWei 74.870 ms/op 118.71 ms/op 0.63
altair processEpoch - mainnet_e81889 533.54 ms/op 741.01 ms/op 0.72
mainnet_e81889 - altair beforeProcessEpoch 111.53 ms/op 226.81 ms/op 0.49
mainnet_e81889 - altair processJustificationAndFinalization 18.058 us/op 73.871 us/op 0.24
mainnet_e81889 - altair processInactivityUpdates 8.7674 ms/op 13.158 ms/op 0.67
mainnet_e81889 - altair processRewardsAndPenalties 76.503 ms/op 119.86 ms/op 0.64
mainnet_e81889 - altair processRegistryUpdates 2.7320 us/op 16.760 us/op 0.16
mainnet_e81889 - altair processSlashings 522.00 ns/op 4.4190 us/op 0.12
mainnet_e81889 - altair processEth1DataReset 586.00 ns/op 4.2590 us/op 0.14
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.9242 ms/op 2.7159 ms/op 0.71
mainnet_e81889 - altair processSlashingsReset 3.9970 us/op 25.789 us/op 0.15
mainnet_e81889 - altair processRandaoMixesReset 4.1820 us/op 24.104 us/op 0.17
mainnet_e81889 - altair processHistoricalRootsUpdate 624.00 ns/op 4.1040 us/op 0.15
mainnet_e81889 - altair processParticipationFlagUpdates 2.2290 us/op 17.117 us/op 0.13
mainnet_e81889 - altair processSyncCommitteeUpdates 499.00 ns/op 3.6130 us/op 0.14
mainnet_e81889 - altair afterProcessEpoch 200.66 ms/op 231.66 ms/op 0.87
phase0 processEpoch - mainnet_e58758 481.04 ms/op 704.56 ms/op 0.68
mainnet_e58758 - phase0 beforeProcessEpoch 175.09 ms/op 313.21 ms/op 0.56
mainnet_e58758 - phase0 processJustificationAndFinalization 15.442 us/op 58.194 us/op 0.27
mainnet_e58758 - phase0 processRewardsAndPenalties 112.97 ms/op 174.88 ms/op 0.65
mainnet_e58758 - phase0 processRegistryUpdates 8.1560 us/op 30.360 us/op 0.27
mainnet_e58758 - phase0 processSlashings 578.00 ns/op 3.4800 us/op 0.17
mainnet_e58758 - phase0 processEth1DataReset 603.00 ns/op 3.4600 us/op 0.17
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.8566 ms/op 2.2598 ms/op 0.82
mainnet_e58758 - phase0 processSlashingsReset 3.4240 us/op 15.587 us/op 0.22
mainnet_e58758 - phase0 processRandaoMixesReset 3.9050 us/op 24.866 us/op 0.16
mainnet_e58758 - phase0 processHistoricalRootsUpdate 655.00 ns/op 4.0530 us/op 0.16
mainnet_e58758 - phase0 processParticipationRecordUpdates 3.3090 us/op 23.242 us/op 0.14
mainnet_e58758 - phase0 afterProcessEpoch 164.91 ms/op 191.98 ms/op 0.86
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.9418 ms/op 2.7010 ms/op 0.72
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 2.2277 ms/op 3.8288 ms/op 0.58
altair processInactivityUpdates - 250000 normalcase 38.495 ms/op 54.419 ms/op 0.71
altair processInactivityUpdates - 250000 worstcase 32.730 ms/op 71.041 ms/op 0.46
phase0 processRegistryUpdates - 250000 normalcase 6.3490 us/op 27.700 us/op 0.23
phase0 processRegistryUpdates - 250000 badcase_full_deposits 379.52 us/op 543.57 us/op 0.70
phase0 processRegistryUpdates - 250000 worstcase 0.5 163.99 ms/op 284.92 ms/op 0.58
altair processRewardsAndPenalties - 250000 normalcase 74.457 ms/op 164.29 ms/op 0.45
altair processRewardsAndPenalties - 250000 worstcase 94.207 ms/op 110.32 ms/op 0.85
phase0 getAttestationDeltas - 250000 normalcase 11.930 ms/op 16.218 ms/op 0.74
phase0 getAttestationDeltas - 250000 worstcase 12.185 ms/op 16.000 ms/op 0.76
phase0 processSlashings - 250000 worstcase 5.0123 ms/op 7.8036 ms/op 0.64
altair processSyncCommitteeUpdates - 250000 292.23 ms/op 384.72 ms/op 0.76
BeaconState.hashTreeRoot - No change 519.00 ns/op 653.00 ns/op 0.79
BeaconState.hashTreeRoot - 1 full validator 69.513 us/op 87.043 us/op 0.80
BeaconState.hashTreeRoot - 32 full validator 650.93 us/op 837.29 us/op 0.78
BeaconState.hashTreeRoot - 512 full validator 6.8243 ms/op 9.2583 ms/op 0.74
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 88.775 us/op 109.56 us/op 0.81
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.2914 ms/op 1.7074 ms/op 0.76
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 18.311 ms/op 22.648 ms/op 0.81
BeaconState.hashTreeRoot - 1 balances 67.029 us/op 84.047 us/op 0.80
BeaconState.hashTreeRoot - 32 balances 662.24 us/op 816.36 us/op 0.81
BeaconState.hashTreeRoot - 512 balances 6.3087 ms/op 8.5786 ms/op 0.74
BeaconState.hashTreeRoot - 250000 balances 104.41 ms/op 115.65 ms/op 0.90
aggregationBits - 2048 els - zipIndexesInBitList 24.580 us/op 45.048 us/op 0.55
regular array get 100000 times 60.812 us/op 62.479 us/op 0.97
wrappedArray get 100000 times 61.464 us/op 62.226 us/op 0.99
arrayWithProxy get 100000 times 27.002 ms/op 39.091 ms/op 0.69
ssz.Root.equals 451.00 ns/op 638.00 ns/op 0.71
byteArrayEquals 444.00 ns/op 622.00 ns/op 0.71
shuffle list - 16384 els 11.629 ms/op 13.600 ms/op 0.86
shuffle list - 250000 els 170.41 ms/op 196.10 ms/op 0.87
processSlot - 1 slots 12.587 us/op 18.591 us/op 0.68
processSlot - 32 slots 1.9255 ms/op 2.6842 ms/op 0.72
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 405.54 us/op 1.0482 ms/op 0.39
getCommitteeAssignments - req 1 vs - 250000 vc 5.3800 ms/op 6.0629 ms/op 0.89
getCommitteeAssignments - req 100 vs - 250000 vc 7.8499 ms/op 8.8378 ms/op 0.89
getCommitteeAssignments - req 1000 vs - 250000 vc 8.4680 ms/op 9.5892 ms/op 0.88
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 8.4800 ns/op 12.260 ns/op 0.69
state getBlockRootAtSlot - 250000 vs - 7PWei 1.0730 us/op 1.7317 us/op 0.62
computeProposers - vc 250000 17.496 ms/op 23.357 ms/op 0.75
computeEpochShuffling - vc 250000 173.07 ms/op 200.24 ms/op 0.86
getNextSyncCommittee - vc 250000 290.01 ms/op 370.92 ms/op 0.78

by benchmarkbot/action

jobWaitTime: register.histogram({
name: "lodestar_engine_http_processor_queue_job_wait_time_seconds",
help: "Time from job added to the engine http processor queue to starting in seconds",
buckets: [0.1, 1, 10, 100],
Contributor

@dapplion dapplion Oct 8, 2022

Bucket values must be realistic for the expected values, otherwise they are not useful at all. If these x10-of-the-previous-value buckets are still in Lodestar, it is because we added them a long time ago or haven't figured out the right values. The wait time metric is crucial, so the buckets must be good. See below for an example of bucket values chosen for a specific metric given its expected values:

// Epoch transitions are 100ms on very fast clients, and average 800ms on heavy networks
buckets: [0.01, 0.05, 0.1, 0.2, 0.5, 0.75, 1, 1.25, 1.5, 3, 10],

Contributor Author

Hmm, right. Most of the time will be taken up by new payload, which can vary from hundreds of milliseconds to 2-4 seconds (if the EL is really slow) and may average 300-400ms, so should the buckets be [0.1, 0.5, 2, 5]?

Contributor

Then use, for example:

[0.05, 0.1, 0.2, 0.3, 0.5, 0.75, 1, 2, 5, 10, 25]
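
To illustrate the point about bucket resolution, here is a minimal TypeScript sketch that counts how a set of wait times would fall into the original buckets versus the suggested ones. The `bucketCounts` helper and the sample values are assumptions made for illustration; they are not part of Lodestar or prom-client, and the counts are per bucket rather than cumulative as Prometheus reports them.

```typescript
// Hypothetical helper: counts samples per histogram bucket.
// A sample lands in the first bucket whose upper bound is >= the value;
// anything larger goes to the last slot, which stands in for +Inf.
function bucketCounts(buckets: number[], samples: number[]): number[] {
  const counts = new Array<number>(buckets.length + 1).fill(0);
  for (const sample of samples) {
    const idx = buckets.findIndex((upperBound) => sample <= upperBound);
    const slot = idx === -1 ? buckets.length : idx;
    counts[slot] += 1;
  }
  return counts;
}

// Assumed wait times in seconds, loosely following the ranges mentioned in the thread
const waitTimesSec = [0.08, 0.15, 0.3, 0.35, 0.4, 0.6, 0.9, 2.5, 4];

// Original x10 buckets: most samples collapse into the single (0.1, 1] bucket
const coarse = bucketCounts([0.1, 1, 10, 100], waitTimesSec);

// Suggested buckets: the same samples spread across several bounds
const fine = bucketCounts([0.05, 0.1, 0.2, 0.3, 0.5, 0.75, 1, 2, 5, 10, 25], waitTimesSec);

console.log({coarse, fine});
```

With the coarse buckets, six of the nine assumed samples share one bucket, so a shift from 150ms to 900ms would be invisible on a dashboard. The finer buckets keep those samples apart.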

method,
params: [payloadId],
methodOpts: getPayloadOpts,
}) as Promise<EngineApiRpcReturnTypes[typeof method]>);
Contributor

getPayload doesn't need to be sequential, right? I don't feel it needs to be queued.

Contributor Author

Right, it can be kept out of the queue. Ideally there shouldn't be anything else in the queue at that moment anyway, but it can still be kept out.
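
The idea discussed in this thread, serializing some engine calls while letting getPayload bypass the queue, can be sketched roughly as follows. This is not the PR's implementation: `SerialQueue`, `dispatch`, `fakeSend` and the choice of which methods are queued are hypothetical names and assumptions used only to show the ordering behaviour.

```typescript
type Send<T> = () => Promise<T>;

// Hypothetical minimal queue: runs one job at a time, in insertion order
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  push<T>(job: Send<T>): Promise<T> {
    // Start this job only after the previous one has settled
    const result = this.tail.then(() => job());
    // Swallow rejections on the chain so one failure does not block later jobs
    this.tail = result.catch(() => undefined);
    return result;
  }
}

// Assumption for illustration: only these two methods need strict ordering
const QUEUED_METHODS = new Set(["engine_newPayloadV1", "engine_forkchoiceUpdatedV1"]);

function dispatch<T>(queue: SerialQueue, method: string, send: Send<T>): Promise<T> {
  return QUEUED_METHODS.has(method) ? queue.push(send) : send();
}

// Fake transport that records the completion order after a delay
function fakeSend(log: string[], label: string, delayMs: number): Send<string> {
  return () =>
    new Promise<string>((resolve) =>
      setTimeout(() => {
        log.push(label);
        resolve(label);
      }, delayMs)
    );
}

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const queue = new SerialQueue();
  await Promise.all([
    dispatch(queue, "engine_newPayloadV1", fakeSend(log, "newPayload", 30)),
    dispatch(queue, "engine_forkchoiceUpdatedV1", fakeSend(log, "forkchoiceUpdated", 0)),
    dispatch(queue, "engine_getPayloadV1", fakeSend(log, "getPayload", 5)),
  ]);
  return log;
}
```

In this sketch the forkchoiceUpdated call waits for the slower newPayload call even though its own delay is shorter, while getPayload is sent immediately and finishes first.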

@@ -75,6 +78,14 @@ const exchageTransitionConfigOpts: ReqOpts = {routeId: "exchangeTransitionConfig
export class ExecutionEngineHttp implements IExecutionEngine {
readonly payloadIdCache = new PayloadIdCache();
private readonly rpc: IJsonRpcHttpClient;
private readonly jobQueue: JobItemQueue<[EngineRequest], EngineResponse>;
Contributor

Since it's not generic to multiple jobs, I would call it rpcFetchQueue or something like that. Also add a comment explaining why this is necessary and which methods need it and which do not.

@@ -57,6 +58,8 @@ export const defaultExecutionEngineHttpOpts: ExecutionEngineHttpOpts = {
timeout: 12000,
};

const QUEUE_MAX_LENGHT = 256;
Contributor

Write a JSDoc comment explaining the choice of this number. What's the tradeoff here? A job queue holds items in memory, which in this case are full deserialized blocks. Are there any memory concerns if the queue fills? What's a reasonable length given the usage of this queue? Since blocks are imported strictly sequentially, the queue length should always be pretty short.

Contributor Author

Will check what limit the block processor queue has. Ideally there could be fcUs corresponding to the last batch of blocks plus new payload.

@g11tech g11tech enabled auto-merge (squash) October 9, 2022 16:36
/**
 * Size for the serializing queue for fcUs and new payloads, the max length could be equal to
 * EPOCHS_PER_BATCH * 2 in case new payloads are also not awaited serially
 */
const QUEUE_MAX_LENGHT = EPOCHS_PER_BATCH * SLOTS_PER_EPOCH * 2;
Member

lenght -> length

Contributor Author

🤦
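
For readers following the queue-length discussion, the sizing rule from the snippet above can be sketched as a bounded queue that rejects new items once the limit is reached. `BoundedQueue` is a hypothetical class written for this example, and the `EPOCHS_PER_BATCH` value below is a placeholder assumption rather than a value confirmed in this thread; `SLOTS_PER_EPOCH = 32` is the mainnet preset.

```typescript
// Mainnet preset value
const SLOTS_PER_EPOCH = 32;
// Placeholder assumption: the real constant lives in Lodestar's sync code
const EPOCHS_PER_BATCH = 1;

// Up to one forkchoiceUpdated plus one newPayload per slot of a sync batch
const QUEUE_MAX_LENGTH = EPOCHS_PER_BATCH * SLOTS_PER_EPOCH * 2;

// Hypothetical bounded FIFO: refuses new items instead of growing without limit
class BoundedQueue<T> {
  private readonly items: T[] = [];
  private readonly maxLength: number;

  constructor(maxLength: number) {
    this.maxLength = maxLength;
  }

  push(item: T): void {
    if (this.items.length >= this.maxLength) {
      throw new Error(`Queue full: ${this.maxLength} items already pending`);
    }
    this.items.push(item);
  }

  shift(): T | undefined {
    return this.items.shift();
  }

  get length(): number {
    return this.items.length;
  }
}

const pending = new BoundedQueue<string>(QUEUE_MAX_LENGTH);
pending.push("engine_newPayloadV1");
pending.push("engine_forkchoiceUpdatedV1");
```

Bounding the queue caps how many pending requests, and therefore how many block payloads, can be held in memory at once, which is the concern raised in the review comment.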

Member

@wemeetagain wemeetagain left a comment

Lgtm, appreciate the metrics

@g11tech g11tech merged commit 711151b into unstable Oct 10, 2022
@g11tech g11tech deleted the g11tech/serialize-engine-calls branch October 10, 2022 20:16
3 participants