Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: generating and using flamegraphs #5519

Merged
merged 6 commits into from
May 23, 2023
Merged

Conversation

matthewkeil
Copy link
Member

@matthewkeil matthewkeil commented May 22, 2023

Motivation

We often need do do processor profiling and using the --inspect flag has detrimental effect on performance. It is also difficult to profile for a short period of time and revert performance back to normal so its not great for production nodes. The existing flamegraph libraries (0x is a good example) are not capable of turning on and off for a node in production to get real world data.

Description

Includes instructions for how to do cpu profiling on a running production node.

Closes #5341

Steps to test or reproduce

Follow instructions in docs/tools/flamegraphs.md

Note for Future

There is a flamegraph panel for Grafana so it should be possible to pull them with a cron job and visualize from our dashboard

https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/flame-graph/

@matthewkeil matthewkeil requested a review from a team as a code owner May 22, 2023 14:27
@github-actions
Copy link
Contributor

github-actions bot commented May 22, 2023

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: 3fefefb Previous: 42c8097 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 806.37 us/op 1.0613 ms/op 0.76
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 43.867 us/op 43.243 us/op 1.01
BLS verify - blst-native 1.1685 ms/op 1.1676 ms/op 1.00
BLS verifyMultipleSignatures 3 - blst-native 2.3749 ms/op 2.3748 ms/op 1.00
BLS verifyMultipleSignatures 8 - blst-native 5.0743 ms/op 5.1080 ms/op 0.99
BLS verifyMultipleSignatures 32 - blst-native 18.332 ms/op 18.442 ms/op 0.99
BLS aggregatePubkeys 32 - blst-native 24.420 us/op 24.713 us/op 0.99
BLS aggregatePubkeys 128 - blst-native 98.140 us/op 96.263 us/op 1.02
getAttestationsForBlock 51.103 ms/op 60.538 ms/op 0.84
isKnown best case - 1 super set check 262.00 ns/op 280.00 ns/op 0.94
isKnown normal case - 2 super set checks 261.00 ns/op 272.00 ns/op 0.96
isKnown worse case - 16 super set checks 255.00 ns/op 274.00 ns/op 0.93
CheckpointStateCache - add get delete 4.8670 us/op 4.8420 us/op 1.01
validate gossip signedAggregateAndProof - struct 2.6579 ms/op 2.6868 ms/op 0.99
validate gossip attestation - struct 1.2689 ms/op 1.2880 ms/op 0.99
pickEth1Vote - no votes 1.1781 ms/op 1.2135 ms/op 0.97
pickEth1Vote - max votes 10.629 ms/op 13.496 ms/op 0.79
pickEth1Vote - Eth1Data hashTreeRoot value x2048 9.0748 ms/op 8.0801 ms/op 1.12
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 14.184 ms/op 15.545 ms/op 0.91
pickEth1Vote - Eth1Data fastSerialize value x2048 626.60 us/op 621.78 us/op 1.01
pickEth1Vote - Eth1Data fastSerialize tree x2048 7.9202 ms/op 9.0688 ms/op 0.87
bytes32 toHexString 481.00 ns/op 481.00 ns/op 1.00
bytes32 Buffer.toString(hex) 336.00 ns/op 326.00 ns/op 1.03
bytes32 Buffer.toString(hex) from Uint8Array 522.00 ns/op 545.00 ns/op 0.96
bytes32 Buffer.toString(hex) + 0x 328.00 ns/op 330.00 ns/op 0.99
Object access 1 prop 0.15300 ns/op 0.15800 ns/op 0.97
Map access 1 prop 0.14800 ns/op 0.15700 ns/op 0.94
Object get x1000 6.2760 ns/op 6.4070 ns/op 0.98
Map get x1000 0.58800 ns/op 0.58600 ns/op 1.00
Object set x1000 49.163 ns/op 49.065 ns/op 1.00
Map set x1000 41.297 ns/op 40.643 ns/op 1.02
Return object 10000 times 0.23470 ns/op 0.22650 ns/op 1.04
Throw Error 10000 times 4.1249 us/op 4.0484 us/op 1.02
fastMsgIdFn sha256 / 200 bytes 3.3700 us/op 3.2890 us/op 1.02
fastMsgIdFn h32 xxhash / 200 bytes 278.00 ns/op 268.00 ns/op 1.04
fastMsgIdFn h64 xxhash / 200 bytes 386.00 ns/op 376.00 ns/op 1.03
fastMsgIdFn sha256 / 1000 bytes 11.428 us/op 11.226 us/op 1.02
fastMsgIdFn h32 xxhash / 1000 bytes 422.00 ns/op 391.00 ns/op 1.08
fastMsgIdFn h64 xxhash / 1000 bytes 466.00 ns/op 442.00 ns/op 1.05
fastMsgIdFn sha256 / 10000 bytes 103.53 us/op 100.28 us/op 1.03
fastMsgIdFn h32 xxhash / 10000 bytes 1.8780 us/op 1.8250 us/op 1.03
fastMsgIdFn h64 xxhash / 10000 bytes 1.3310 us/op 1.2970 us/op 1.03
enrSubnets - fastDeserialize 64 bits 1.2750 us/op 1.1940 us/op 1.07
enrSubnets - ssz BitVector 64 bits 497.00 ns/op 468.00 ns/op 1.06
enrSubnets - fastDeserialize 4 bits 171.00 ns/op 160.00 ns/op 1.07
enrSubnets - ssz BitVector 4 bits 493.00 ns/op 467.00 ns/op 1.06
prioritizePeers score -10:0 att 32-0.1 sync 2-0 104.22 us/op 100.82 us/op 1.03
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 132.38 us/op 124.55 us/op 1.06
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 159.02 us/op 158.29 us/op 1.00
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 287.02 us/op 284.58 us/op 1.01
prioritizePeers score 0:0 att 64-1 sync 4-1 343.80 us/op 340.36 us/op 1.01
array of 16000 items push then shift 1.5618 us/op 1.5654 us/op 1.00
LinkedList of 16000 items push then shift 8.3650 ns/op 8.4760 ns/op 0.99
array of 16000 items push then pop 80.986 ns/op 68.754 ns/op 1.18
LinkedList of 16000 items push then pop 8.1120 ns/op 8.2180 ns/op 0.99
array of 24000 items push then shift 2.2389 us/op 2.2666 us/op 0.99
LinkedList of 24000 items push then shift 8.6540 ns/op 10.483 ns/op 0.83
array of 24000 items push then pop 73.826 ns/op 69.557 ns/op 1.06
LinkedList of 24000 items push then pop 8.2230 ns/op 9.0500 ns/op 0.91
intersect bitArray bitLen 8 12.817 ns/op 12.717 ns/op 1.01
intersect array and set length 8 74.720 ns/op 82.971 ns/op 0.90
intersect bitArray bitLen 128 42.750 ns/op 42.240 ns/op 1.01
intersect array and set length 128 1.0148 us/op 991.39 ns/op 1.02
Buffer.concat 32 items 2.8780 us/op 2.4890 us/op 1.16
Uint8Array.set 32 items 2.9320 us/op 2.4760 us/op 1.18
pass gossip attestations to forkchoice per slot 2.7117 ms/op 2.1748 ms/op 1.25
computeDeltas 3.4524 ms/op 4.3628 ms/op 0.79
computeProposerBoostScoreFromBalances 1.7982 ms/op 1.7557 ms/op 1.02
altair processAttestation - 250000 vs - 7PWei normalcase 2.1148 ms/op 2.1166 ms/op 1.00
altair processAttestation - 250000 vs - 7PWei worstcase 3.3210 ms/op 3.3155 ms/op 1.00
altair processAttestation - setStatus - 1/6 committees join 139.89 us/op 135.00 us/op 1.04
altair processAttestation - setStatus - 1/3 committees join 278.69 us/op 267.43 us/op 1.04
altair processAttestation - setStatus - 1/2 committees join 367.11 us/op 358.37 us/op 1.02
altair processAttestation - setStatus - 2/3 committees join 472.26 us/op 441.90 us/op 1.07
altair processAttestation - setStatus - 4/5 committees join 646.75 us/op 629.50 us/op 1.03
altair processAttestation - setStatus - 100% committees join 761.01 us/op 740.91 us/op 1.03
altair processBlock - 250000 vs - 7PWei normalcase 17.204 ms/op 21.518 ms/op 0.80
altair processBlock - 250000 vs - 7PWei normalcase hashState 27.498 ms/op 30.202 ms/op 0.91
altair processBlock - 250000 vs - 7PWei worstcase 54.575 ms/op 64.927 ms/op 0.84
altair processBlock - 250000 vs - 7PWei worstcase hashState 65.694 ms/op 81.709 ms/op 0.80
phase0 processBlock - 250000 vs - 7PWei normalcase 1.9885 ms/op 1.9716 ms/op 1.01
phase0 processBlock - 250000 vs - 7PWei worstcase 28.900 ms/op 26.871 ms/op 1.08
altair processEth1Data - 250000 vs - 7PWei normalcase 508.18 us/op 457.16 us/op 1.11
vc - 250000 eb 1 eth1 1 we 0 wn 0 - smpl 15 7.8920 us/op 6.5520 us/op 1.20
vc - 250000 eb 0.95 eth1 0.1 we 0.05 wn 0 - smpl 219 25.322 us/op 19.492 us/op 1.30
vc - 250000 eb 0.95 eth1 0.3 we 0.05 wn 0 - smpl 42 9.0400 us/op 12.523 us/op 0.72
vc - 250000 eb 0.95 eth1 0.7 we 0.05 wn 0 - smpl 18 7.4090 us/op 6.2550 us/op 1.18
vc - 250000 eb 0.1 eth1 0.1 we 0 wn 0 - smpl 1020 79.412 us/op 123.19 us/op 0.64
vc - 250000 eb 0.03 eth1 0.03 we 0 wn 0 - smpl 11777 661.26 us/op 1.2331 ms/op 0.54
vc - 250000 eb 0.01 eth1 0.01 we 0 wn 0 - smpl 16384 905.00 us/op 1.6548 ms/op 0.55
vc - 250000 eb 0 eth1 0 we 0 wn 0 - smpl 16384 890.57 us/op 1.6198 ms/op 0.55
vc - 250000 eb 0 eth1 0 we 0 wn 0 nocache - smpl 16384 2.3803 ms/op 2.1067 ms/op 1.13
vc - 250000 eb 0 eth1 1 we 0 wn 0 - smpl 16384 1.4991 ms/op 2.2863 ms/op 0.66
vc - 250000 eb 0 eth1 1 we 0 wn 0 nocache - smpl 16384 3.7541 ms/op 3.7238 ms/op 1.01
Tree 40 250000 create 290.50 ms/op 293.31 ms/op 0.99
Tree 40 250000 get(125000) 183.28 ns/op 172.07 ns/op 1.07
Tree 40 250000 set(125000) 906.27 ns/op 856.72 ns/op 1.06
Tree 40 250000 toArray() 19.634 ms/op 16.552 ms/op 1.19
Tree 40 250000 iterate all - toArray() + loop 19.295 ms/op 16.592 ms/op 1.16
Tree 40 250000 iterate all - get(i) 68.133 ms/op 64.983 ms/op 1.05
MutableVector 250000 create 9.2058 ms/op 17.760 ms/op 0.52
MutableVector 250000 get(125000) 6.3920 ns/op 6.3060 ns/op 1.01
MutableVector 250000 set(125000) 269.13 ns/op 253.38 ns/op 1.06
MutableVector 250000 toArray() 2.7293 ms/op 2.8382 ms/op 0.96
MutableVector 250000 iterate all - toArray() + loop 2.8753 ms/op 2.9702 ms/op 0.97
MutableVector 250000 iterate all - get(i) 1.5352 ms/op 1.4740 ms/op 1.04
Array 250000 create 2.9296 ms/op 2.6621 ms/op 1.10
Array 250000 clone - spread 1.1775 ms/op 1.1076 ms/op 1.06
Array 250000 get(125000) 0.55600 ns/op 0.50700 ns/op 1.10
Array 250000 set(125000) 0.63000 ns/op 0.59000 ns/op 1.07
Array 250000 iterate all - loop 84.236 us/op 80.047 us/op 1.05
effectiveBalanceIncrements clone Uint8Array 300000 32.245 us/op 23.934 us/op 1.35
effectiveBalanceIncrements clone MutableVector 300000 358.00 ns/op 310.00 ns/op 1.15
effectiveBalanceIncrements rw all Uint8Array 300000 172.78 us/op 162.89 us/op 1.06
effectiveBalanceIncrements rw all MutableVector 300000 79.496 ms/op 75.310 ms/op 1.06
phase0 afterProcessEpoch - 250000 vs - 7PWei 115.34 ms/op 108.16 ms/op 1.07
phase0 beforeProcessEpoch - 250000 vs - 7PWei 42.066 ms/op 42.801 ms/op 0.98
altair processEpoch - mainnet_e81889 321.66 ms/op 339.76 ms/op 0.95
mainnet_e81889 - altair beforeProcessEpoch 61.085 ms/op 67.149 ms/op 0.91
mainnet_e81889 - altair processJustificationAndFinalization 16.104 us/op 27.114 us/op 0.59
mainnet_e81889 - altair processInactivityUpdates 4.9570 ms/op 5.2981 ms/op 0.94
mainnet_e81889 - altair processRewardsAndPenalties 67.400 ms/op 85.659 ms/op 0.79
mainnet_e81889 - altair processRegistryUpdates 2.9970 us/op 4.6460 us/op 0.65
mainnet_e81889 - altair processSlashings 445.00 ns/op 791.00 ns/op 0.56
mainnet_e81889 - altair processEth1DataReset 527.00 ns/op 948.00 ns/op 0.56
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.2019 ms/op 2.2928 ms/op 0.52
mainnet_e81889 - altair processSlashingsReset 5.1180 us/op 5.6860 us/op 0.90
mainnet_e81889 - altair processRandaoMixesReset 4.4090 us/op 7.2820 us/op 0.61
mainnet_e81889 - altair processHistoricalRootsUpdate 619.00 ns/op 1.1370 us/op 0.54
mainnet_e81889 - altair processParticipationFlagUpdates 2.4400 us/op 4.2150 us/op 0.58
mainnet_e81889 - altair processSyncCommitteeUpdates 506.00 ns/op 804.00 ns/op 0.63
mainnet_e81889 - altair afterProcessEpoch 117.60 ms/op 115.27 ms/op 1.02
phase0 processEpoch - mainnet_e58758 313.70 ms/op 360.22 ms/op 0.87
mainnet_e58758 - phase0 beforeProcessEpoch 120.20 ms/op 130.60 ms/op 0.92
mainnet_e58758 - phase0 processJustificationAndFinalization 14.709 us/op 25.605 us/op 0.57
mainnet_e58758 - phase0 processRewardsAndPenalties 54.111 ms/op 56.334 ms/op 0.96
mainnet_e58758 - phase0 processRegistryUpdates 7.6100 us/op 12.314 us/op 0.62
mainnet_e58758 - phase0 processSlashings 476.00 ns/op 960.00 ns/op 0.50
mainnet_e58758 - phase0 processEth1DataReset 529.00 ns/op 1.0270 us/op 0.52
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 948.19 us/op 1.8749 ms/op 0.51
mainnet_e58758 - phase0 processSlashingsReset 3.2510 us/op 5.1290 us/op 0.63
mainnet_e58758 - phase0 processRandaoMixesReset 4.3710 us/op 7.8540 us/op 0.56
mainnet_e58758 - phase0 processHistoricalRootsUpdate 636.00 ns/op 1.0980 us/op 0.58
mainnet_e58758 - phase0 processParticipationRecordUpdates 3.8270 us/op 6.9040 us/op 0.55
mainnet_e58758 - phase0 afterProcessEpoch 93.741 ms/op 100.87 ms/op 0.93
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.1902 ms/op 1.9954 ms/op 0.60
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.4481 ms/op 2.6120 ms/op 0.55
altair processInactivityUpdates - 250000 normalcase 19.427 ms/op 26.111 ms/op 0.74
altair processInactivityUpdates - 250000 worstcase 21.896 ms/op 25.465 ms/op 0.86
phase0 processRegistryUpdates - 250000 normalcase 6.3280 us/op 10.292 us/op 0.61
phase0 processRegistryUpdates - 250000 badcase_full_deposits 225.20 us/op 390.32 us/op 0.58
phase0 processRegistryUpdates - 250000 worstcase 0.5 124.68 ms/op 138.39 ms/op 0.90
altair processRewardsAndPenalties - 250000 normalcase 54.562 ms/op 63.675 ms/op 0.86
altair processRewardsAndPenalties - 250000 worstcase 47.699 ms/op 68.111 ms/op 0.70
phase0 getAttestationDeltas - 250000 normalcase 6.2835 ms/op 6.4775 ms/op 0.97
phase0 getAttestationDeltas - 250000 worstcase 6.4022 ms/op 6.6190 ms/op 0.97
phase0 processSlashings - 250000 worstcase 3.5192 ms/op 3.7086 ms/op 0.95
altair processSyncCommitteeUpdates - 250000 169.91 ms/op 181.22 ms/op 0.94
BeaconState.hashTreeRoot - No change 261.00 ns/op 265.00 ns/op 0.98
BeaconState.hashTreeRoot - 1 full validator 49.449 us/op 60.598 us/op 0.82
BeaconState.hashTreeRoot - 32 full validator 518.20 us/op 585.76 us/op 0.88
BeaconState.hashTreeRoot - 512 full validator 5.1525 ms/op 6.2328 ms/op 0.83
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 62.102 us/op 65.393 us/op 0.95
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 874.53 us/op 1.0270 ms/op 0.85
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 11.260 ms/op 10.905 ms/op 1.03
BeaconState.hashTreeRoot - 1 balances 48.681 us/op 49.977 us/op 0.97
BeaconState.hashTreeRoot - 32 balances 436.41 us/op 618.73 us/op 0.71
BeaconState.hashTreeRoot - 512 balances 4.2844 ms/op 4.9881 ms/op 0.86
BeaconState.hashTreeRoot - 250000 balances 74.608 ms/op 78.168 ms/op 0.95
aggregationBits - 2048 els - zipIndexesInBitList 16.756 us/op 15.482 us/op 1.08
regular array get 100000 times 32.736 us/op 37.800 us/op 0.87
wrappedArray get 100000 times 32.746 us/op 31.512 us/op 1.04
arrayWithProxy get 100000 times 15.995 ms/op 15.290 ms/op 1.05
ssz.Root.equals 553.00 ns/op 525.00 ns/op 1.05
byteArrayEquals 540.00 ns/op 516.00 ns/op 1.05
shuffle list - 16384 els 6.8028 ms/op 6.4694 ms/op 1.05
shuffle list - 250000 els 99.671 ms/op 94.987 ms/op 1.05
processSlot - 1 slots 8.1350 us/op 8.4010 us/op 0.97
processSlot - 32 slots 1.2903 ms/op 1.2988 ms/op 0.99
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 34.147 ms/op 41.535 ms/op 0.82
getCommitteeAssignments - req 1 vs - 250000 vc 2.8232 ms/op 2.7731 ms/op 1.02
getCommitteeAssignments - req 100 vs - 250000 vc 3.9923 ms/op 3.9380 ms/op 1.01
getCommitteeAssignments - req 1000 vs - 250000 vc 4.3171 ms/op 4.4323 ms/op 0.97
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 4.3600 ns/op 4.4400 ns/op 0.98
state getBlockRootAtSlot - 250000 vs - 7PWei 790.44 ns/op 960.69 ns/op 0.82
computeProposers - vc 250000 10.433 ms/op 10.457 ms/op 1.00
computeEpochShuffling - vc 250000 101.57 ms/op 99.694 ms/op 1.02
getNextSyncCommittee - vc 250000 173.56 ms/op 170.20 ms/op 1.02
computeSigningRoot for AttestationData 13.317 us/op 12.652 us/op 1.05
hash AttestationData serialized data then Buffer.toString(base64) 2.4144 us/op 2.3378 us/op 1.03
toHexString serialized data 1.1059 us/op 1.0404 us/op 1.06
Buffer.toString(base64) 315.27 ns/op 309.47 ns/op 1.02

by benchmarkbot/action

Copy link
Contributor

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice guide thank you! some minor comments

docs/tools/flamegraphs.md Outdated Show resolved Hide resolved
docs/tools/flamegraphs.md Outdated Show resolved Hide resolved
docs/tools/flamegraphs.md Show resolved Hide resolved
@matthewkeil matthewkeil requested a review from dapplion May 23, 2023 00:28
@matthewkeil matthewkeil added this to the v1.9.0 milestone May 23, 2023
@matthewkeil
Copy link
Member Author

@dapplion I went through the issue one last time to make sure i covered everything and noticed that I had some references stored in my work repo. I added them to the bottom.

Copy link
Contributor

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great thanks!

@matthewkeil matthewkeil merged commit dca0c87 into unstable May 23, 2023
@matthewkeil matthewkeil deleted the mkeil/flamegraph-docs branch May 23, 2023 18:38
@wemeetagain
Copy link
Member

🎉 This PR is included in v1.9.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document getting flamechart of live node
3 participants