feat: batch hash tree root #378

Draft: wants to merge 118 commits into base: master

matthewkeil (Member) commented Jun 16, 2024

Motivation

Created on behalf of @twoeths. Keeping this in draft for now and using it to store metrics from different iterations of the changes.

Update the system to allow batch hashing of trees. SIMD instructions let several hashes be processed in parallel on a single core. Add the idea of levels to the hashing process so that deeper sections of a tree are processed before higher levels, with hashes computed in order of their depth in the tree.
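A minimal sketch of the level idea, for illustration only. `SketchNode`, `collect`, `execute` and `sha256Pair` are hypothetical names, not the PR's API, and the inner loop hashes one pair at a time with Node's `crypto` module where the real code would hand a whole level to a batch hasher.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal node; the real persistent-merkle-tree Node is richer.
interface SketchNode {
  root: Uint8Array | null; // 32-byte hash, null until computed
  left?: SketchNode;
  right?: SketchNode;
}

// Same shape as the model in the description: {left, right, dest}.
interface HashComputation {
  left: SketchNode;
  right: SketchNode;
  dest: SketchNode;
}

function sha256Pair(a: Uint8Array, b: Uint8Array): Uint8Array {
  return new Uint8Array(createHash("sha256").update(a).update(b).digest());
}

// Walk the tree and bucket every pending hash by depth (level 0 is the root).
function collect(node: SketchNode, level: number, byLevel: HashComputation[][]): void {
  if (node.root !== null || !node.left || !node.right) return;
  if (byLevel[level] === undefined) byLevel[level] = [];
  byLevel[level].push({ left: node.left, right: node.right, dest: node });
  collect(node.left, level + 1, byLevel);
  collect(node.right, level + 1, byLevel);
}

// Run the deepest level first so each input root exists before it is consumed.
function execute(byLevel: HashComputation[][]): void {
  for (let level = byLevel.length - 1; level >= 0; level--) {
    for (const hc of byLevel[level]) {
      // Stand-in for a batch call over the whole level.
      hc.dest.root = sha256Pair(hc.left.root!, hc.right.root!);
    }
  }
}
```

All computations within one level are independent of each other, which is what makes a level a natural unit to pass to a SIMD batch hasher.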

Description

  • New hashtree hasher for batch hash

  • Minimal memory allocation:

    • use the same memory allocated in a type to hash all objects of that type
    • use the same memory allocated in ListValidatorViewDU to hash all validators in batch
  • For type.hashTreeRoot():

    • Preallocate memory at the type level, so a type.hashTreeRoot() call allocates almost no temporary Uint8Arrays
      • Child fields are hashed into the parent's preallocated memory
    • Implement merkleizeInto() in the hasher; it also does not allocate memory, and it uses batch hash
  • For ViewDU:

    • Model HashComputation = {left: Node; right: Node; dest: Node}
    • Compute HashComputations at different levels of the merkle tree
    • Execute the HashComputations using batch hash, in bottom-up order
  • For BeaconState, which contains validators as ContainerNodeStructViewDUs:

    • Override ListValidatorViewDU
    • We already track viewsChanged there; on commit() we compute validator roots in batch using a preallocated Uint8Array
    • Save a lot of memory by avoiding tree creation for the validator type
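To illustrate the preallocated-memory idea from the list above, here is a rough sketch of merkleizing chunks in place and writing the root into a caller-supplied buffer. `merkleizeIntoSketch` and its parameters are hypothetical and are not the signature of the PR's merkleizeInto(). It assumes a power-of-two chunk count and uses Node's `crypto` module, whose `digest()` still allocates a small Buffer per hash, so this shows only the buffer-reuse pattern, not a truly allocation-free hasher.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes `chunkCount` 32-byte chunks stored at the start of
// `scratch` and writes the 32-byte root into `out` at `offset`.
// `scratch` is overwritten; `chunkCount` must be a power of two.
function merkleizeIntoSketch(
  scratch: Uint8Array,
  chunkCount: number,
  out: Uint8Array,
  offset: number
): void {
  if (chunkCount < 1 || (chunkCount & (chunkCount - 1)) !== 0) {
    throw new Error("chunkCount must be a power of two");
  }
  if (scratch.length < chunkCount * 32) {
    throw new Error("scratch buffer too small");
  }
  // Each pass halves the number of live chunks at the front of the buffer.
  for (let n = chunkCount; n > 1; n >>= 1) {
    for (let i = 0; i < n / 2; i++) {
      // Read the i-th 64-byte pair, then store its parent at slot i.
      // Slot i lies inside bytes that have already been consumed, so this is safe.
      const digest = createHash("sha256")
        .update(scratch.subarray(i * 64, i * 64 + 64))
        .digest();
      scratch.set(digest, i * 32);
    }
  }
  out.set(scratch.subarray(0, 32), offset);
}
```

A parent type could keep one `out` buffer sized for all of its fields and pass each child a different `offset`, which matches the "child fields are hashed into the parent's memory" idea without any per-call result arrays.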

Benchmark result on Mac M1

  • 3x faster on a BeaconBlock.hashTreeRoot() call, using data from a typical mainnet block with 200 transactions
  • 7x faster on a BeaconState of 200k validators, of which 100k are modified

TODOs

  • resolve TODO - batch
  • polish the branch
  • create smaller PRs for as-sha256, persistent-merkle-tree and ssz

part of #355
closes #78

CLAassistant commented Jun 16, 2024

CLA assistant check
All committers have signed the CLA.

matthewkeil (Member, Author) commented Jun 16, 2024

Ran perf tests on several machines. In all instances I stopped the major services and checked with top that nothing substantial was running before starting the tests.

ovh-***-***-ksm-0 (x86)

  hasher
    as-sha256
      ✓ hash 2 Uint8Array 500000 times - as-sha256                          2.356940 ops/s    424.2789 ms/op        -          3 runs   17.0 s
      ✓ hashTwoObjects 500000 times - as-sha256                             2.534535 ops/s    394.5497 ms/op        -          3 runs   15.8 s
      ✓ executeHashComputations - as-sha256                                 24.87851 ops/s    40.19533 ms/op        -         23 runs   3.32 s
    noble
      ✓ hash 2 Uint8Array 500000 times - noble                              1.146827 ops/s    871.9712 ms/op        -          3 runs   34.9 s
      ✓ hashTwoObjects 500000 times - noble                                0.7592710 ops/s    1.317053  s/op        -          3 runs   52.7 s
      ✓ executeHashComputations - noble                                     19.65518 ops/s    50.87717 ms/op        -          5 runs   1.61 s
    hashtree
      ✓ hash 2 Uint8Array 500000 times - hashtree                           2.456077 ops/s    407.1533 ms/op        -          3 runs   16.3 s
      ✓ hashTwoObjects 500000 times - hashtree                              2.523110 ops/s    396.3362 ms/op        -          3 runs   16.1 s
      ✓ executeHashComputations - hashtree                                  67.66811 ops/s    14.77801 ms/op        -         21 runs   3.62 s

matthewkeil-ax41x (x86)

  hasher
    as-sha256
      ✓ hash 2 Uint8Array 500000 times - as-sha256                          1.996066 ops/s    500.9855 ms/op        -          3 runs   20.0 s
      ✓ hashTwoObjects 500000 times - as-sha256                             2.187445 ops/s    457.1543 ms/op        -          3 runs   18.3 s
      ✓ executeHashComputations - as-sha256                                 19.78581 ops/s    50.54128 ms/op        -         16 runs   2.90 s
    noble
      ✓ hash 2 Uint8Array 500000 times - noble                             0.8846945 ops/s    1.130334  s/op        -          3 runs   45.2 s
      ✓ hashTwoObjects 500000 times - noble                                0.5145042 ops/s    1.943619  s/op        -          3 runs   77.8 s
      ✓ executeHashComputations - noble                                     16.45166 ops/s    60.78414 ms/op        -         20 runs   4.19 s
    hashtree
      ✓ hash 2 Uint8Array 500000 times - hashtree                           3.276744 ops/s    305.1810 ms/op        -          3 runs   12.2 s
      ✓ hashTwoObjects 500000 times - hashtree                              3.924958 ops/s    254.7798 ms/op        -          3 runs   10.2 s
      ✓ executeHashComputations - hashtree                                  92.09554 ops/s    10.85829 ms/op        -         62 runs   9.26 s

***-novc-***-cax11 (arm64)

  hasher
    as-sha256
      ✓ hash 2 Uint8Array 500000 times - as-sha256                          1.626183 ops/s    614.9369 ms/op        -          3 runs   24.6 s
      ✓ hashTwoObjects 500000 times - as-sha256                             1.709126 ops/s    585.0945 ms/op        -          3 runs   23.4 s
      ✓ executeHashComputations - as-sha256                                 13.50287 ops/s    74.05835 ms/op        -          9 runs   2.54 s
    noble
      ✓ hash 2 Uint8Array 500000 times - noble                             0.7180529 ops/s    1.392655  s/op        -          3 runs   55.6 s
      ✓ hashTwoObjects 500000 times - noble                                0.3339791 ops/s    2.994199  s/op        -          3 runs    120 s
      ✓ executeHashComputations - noble                                     13.06625 ops/s    76.53303 ms/op        -          9 runs   2.67 s
    hashtree
      ✓ hash 2 Uint8Array 500000 times - hashtree                           2.472214 ops/s    404.4957 ms/op        -          3 runs   16.2 s
      ✓ hashTwoObjects 500000 times - hashtree                              2.888047 ops/s    346.2547 ms/op        -          3 runs   13.9 s
      ✓ executeHashComputations - hashtree                                  32.63111 ops/s    30.64560 ms/op        -         27 runs   5.51 s

Locally on M1 mac

  hasher
    as-sha256
      ✓ hash 2 Uint8Array 500000 times - as-sha256                          2.710482 ops/s    368.9380 ms/op        -          3 runs   14.8 s
      ✓ hashTwoObjects 500000 times - as-sha256                             2.882722 ops/s    346.8944 ms/op        -          3 runs   13.9 s
      ✓ executeHashComputations - as-sha256                                 31.00534 ops/s    32.25251 ms/op        -          7 runs   1.36 s
    noble
      ✓ hash 2 Uint8Array 500000 times - noble                              1.384059 ops/s    722.5124 ms/op        -          3 runs   28.9 s
      ✓ hashTwoObjects 500000 times - noble                                0.8926629 ops/s    1.120244  s/op        -          3 runs   45.0 s
      ✓ executeHashComputations - noble                                     24.98550 ops/s    40.02322 ms/op        -         36 runs   3.75 s
    hashtree
      ✓ hash 2 Uint8Array 500000 times - hashtree                           4.368933 ops/s    228.8888 ms/op        -          4 runs   11.5 s
      ✓ hashTwoObjects 500000 times - hashtree                              4.910207 ops/s    203.6574 ms/op        -          4 runs   10.2 s
      ✓ executeHashComputations - hashtree                                  148.0720 ops/s    6.753470 ms/op        -         96 runs   6.95 s
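For a quick comparison, the executeHashComputations rows can be reduced to speedup factors. The snippet below is only arithmetic over the ms/op values copied from the four runs above; the object keys and the `speedup` helper are made up for illustration. By this measure hashtree comes out roughly 2.4x to 4.8x faster than as-sha256 on these machines.

```typescript
// ms/op for "executeHashComputations", copied from the runs above.
const msPerOp: Record<string, { asSha256: number; noble: number; hashtree: number }> = {
  "ovh (x86)": { asSha256: 40.19533, noble: 50.87717, hashtree: 14.77801 },
  "ax41x (x86)": { asSha256: 50.54128, noble: 60.78414, hashtree: 10.85829 },
  "cax11 (arm64)": { asSha256: 74.05835, noble: 76.53303, hashtree: 30.6456 },
  "M1 mac": { asSha256: 32.25251, noble: 40.02322, hashtree: 6.75347 },
};

// Lower ms/op is better, so speedup = baseline time / candidate time.
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}

for (const [machine, r] of Object.entries(msPerOp)) {
  const vsAs = speedup(r.asSha256, r.hashtree).toFixed(2);
  const vsNoble = speedup(r.noble, r.hashtree).toFixed(2);
  console.log(`${machine}: hashtree is ${vsAs}x vs as-sha256, ${vsNoble}x vs noble`);
}
```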

@twoeths changed the title from "Te/batch hash tree root" to "feat: batch hash tree root" on Jul 1, 2024

github-actions bot commented Jul 1, 2024

⚠️ Performance Alert ⚠️

Possible performance regression was detected for some benchmarks.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold.

Benchmark suite                               Current: 178a6c8   Previous: b4beed2   Ratio
batchHash 250000 nodes                        314.05 ms/op       88.547 ms/op        3.55
get root 250000 nodes                         816.77 ms/op       118.56 ms/op        6.89
batchHash 500000 nodes                        564.12 ms/op       142.17 ms/op        3.97
get root 500000 nodes                         1.5257 s/op        238.10 ms/op        6.41
batchHash 1000000 nodes                       1.1228 s/op        328.90 ms/op        3.41
get root 1000000 nodes                        3.0909 s/op        477.38 ms/op        6.47
250000 validators root getter                 829.31 ms/op       117.93 ms/op        7.03
250000 validators batchHash()                 287.17 ms/op       81.978 ms/op        3.50
BeaconState ViewDU hashTreeRoot() vc=200000   738.93 ms/op       101.07 ms/op        7.31

🚀🚀 Significant benchmark improvement detected

Benchmark suite                               Current: 178a6c8   Previous: b4beed2   Ratio
List(Validator-NS) 100000 struct -> binary    7.4331 ms/op       27.056 ms/op        0.27
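Judging from the rows above, the Ratio column looks like Current divided by Previous after unit conversion, so values above 1 mean the current commit is slower. The sketch below reproduces that arithmetic; `toNs` and `ratio` are hypothetical helpers, not part of the benchmark action, and the alert threshold itself is not shown here.

```typescript
// Conversion factors for the units that appear in the benchmark output.
const UNIT_TO_NS: Record<string, number> = {
  "ns/op": 1,
  "us/op": 1e3,
  "ms/op": 1e6,
  "s/op": 1e9,
};

// Parse a cell such as "1.5257 s/op" into nanoseconds per operation.
function toNs(cell: string): number {
  const [value, unit] = cell.trim().split(/\s+/);
  const factor = UNIT_TO_NS[unit];
  const parsed = Number(value);
  if (factor === undefined || Number.isNaN(parsed)) {
    throw new Error(`cannot parse benchmark cell: ${cell}`);
  }
  return parsed * factor;
}

// Ratio > 1: current is slower than previous. Ratio < 1: current is faster.
function ratio(current: string, previous: string): number {
  return toNs(current) / toNs(previous);
}

console.log(ratio("1.5257 s/op", "238.10 ms/op").toFixed(2)); // get root 500000 nodes
console.log(ratio("7.4331 ms/op", "27.056 ms/op").toFixed(2)); // List(Validator-NS) struct -> binary
```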
Full benchmark results
Benchmark suite Current: 178a6c8 Previous: b4beed2 Ratio
digestTwoHashObjects 50023 times 47.887 ms/op 47.563 ms/op 1.01
digest64 50023 times 54.142 ms/op 53.231 ms/op 1.02
digest 50023 times 54.710 ms/op 54.726 ms/op 1.00
input length 32 1.2420 us/op 1.2230 us/op 1.02
input length 64 1.3520 us/op 1.3700 us/op 0.99
input length 128 2.3310 us/op 2.2950 us/op 1.02
input length 256 3.4570 us/op 3.4350 us/op 1.01
input length 512 5.6200 us/op 5.5950 us/op 1.00
input length 1024 10.759 us/op 10.736 us/op 1.00
digest 1000000 times 882.12 ms/op 920.41 ms/op 0.96
hashObjectToByteArray 50023 times 1.4314 ms/op 1.4270 ms/op 1.00
byteArrayToHashObject 50023 times 2.5494 ms/op 2.5095 ms/op 1.02
digest64 200092 times 217.56 ms/op 219.30 ms/op 0.99
hash 200092 times using batchHash4UintArray64s 238.10 ms/op 237.94 ms/op 1.00
digest64HashObjects 200092 times 192.82 ms/op 191.47 ms/op 1.01
hash 200092 times using batchHash4HashObjectInputs 202.43 ms/op 199.39 ms/op 1.02
getGindicesAtDepth 4.4580 us/op 4.3020 us/op 1.04
iterateAtDepth 7.7400 us/op 7.6230 us/op 1.02
getGindexBits 465.00 ns/op 478.00 ns/op 0.97
gindexIterator 1.0940 us/op 1.0660 us/op 1.03
HashComputationLevel.push then loop 24.947 ms/op 25.273 ms/op 0.99
HashComputation[] push then loop 49.330 ms/op 47.331 ms/op 1.04
hash 2 Uint8Array 500000 times - as-sha256 542.80 ms/op 538.91 ms/op 1.01
hashTwoObjects 500000 times - as-sha256 514.18 ms/op 496.24 ms/op 1.04
executeHashComputations - as-sha256 48.294 ms/op 46.232 ms/op 1.04
hash 2 Uint8Array 500000 times - noble 1.0735 s/op 1.0572 s/op 1.02
hashTwoObjects 500000 times - noble 1.4991 s/op 1.4922 s/op 1.00
executeHashComputations - noble 40.414 ms/op 40.539 ms/op 1.00
hash 2 Uint8Array 500000 times - hashtree 224.10 ms/op 224.17 ms/op 1.00
hashTwoObjects 500000 times - hashtree 218.22 ms/op 218.75 ms/op 1.00
executeHashComputations - hashtree 10.857 ms/op 10.730 ms/op 1.01
getNodeH() x7812.5 avg hindex 12.621 us/op 12.511 us/op 1.01
getNodeH() x7812.5 index 0 6.3370 us/op 6.2320 us/op 1.02
getNodeH() x7812.5 index 7 6.2730 us/op 6.3240 us/op 0.99
getNodeH() x7812.5 index 7 with key array 6.2510 us/op 6.2210 us/op 1.00
new LeafNode() x7812.5 14.740 us/op 14.723 us/op 1.00
getHashComputations 250000 nodes 15.312 ms/op 19.422 ms/op 0.79
batchHash 250000 nodes 314.05 ms/op 88.547 ms/op 3.55
get root 250000 nodes 816.77 ms/op 118.56 ms/op 6.89
getHashComputations 500000 nodes 30.479 ms/op 28.728 ms/op 1.06
batchHash 500000 nodes 564.12 ms/op 142.17 ms/op 3.97
get root 500000 nodes 1.5257 s/op 238.10 ms/op 6.41
getHashComputations 1000000 nodes 51.810 ms/op 70.132 ms/op 0.74
batchHash 1000000 nodes 1.1228 s/op 328.90 ms/op 3.41
get root 1000000 nodes 3.0909 s/op 477.38 ms/op 6.47
multiproof - depth 15, 1 requested leaves 8.0800 us/op 8.1730 us/op 0.99
tree offset multiproof - depth 15, 1 requested leaves 18.295 us/op 18.112 us/op 1.01
compact multiproof - depth 15, 1 requested leaves 3.4280 us/op 3.4020 us/op 1.01
multiproof - depth 15, 2 requested leaves 11.948 us/op 11.860 us/op 1.01
tree offset multiproof - depth 15, 2 requested leaves 21.608 us/op 21.715 us/op 1.00
compact multiproof - depth 15, 2 requested leaves 3.4820 us/op 3.4780 us/op 1.00
multiproof - depth 15, 3 requested leaves 16.444 us/op 16.901 us/op 0.97
tree offset multiproof - depth 15, 3 requested leaves 27.441 us/op 27.903 us/op 0.98
compact multiproof - depth 15, 3 requested leaves 4.1650 us/op 4.2200 us/op 0.99
multiproof - depth 15, 4 requested leaves 21.696 us/op 21.786 us/op 1.00
tree offset multiproof - depth 15, 4 requested leaves 33.847 us/op 35.400 us/op 0.96
compact multiproof - depth 15, 4 requested leaves 4.8280 us/op 5.0270 us/op 0.96
packedRootsBytesToLeafNodes bytes 4000 offset 0 1.8860 us/op 2.0580 us/op 0.92
packedRootsBytesToLeafNodes bytes 4000 offset 1 1.8700 us/op 2.0460 us/op 0.91
packedRootsBytesToLeafNodes bytes 4000 offset 2 1.8620 us/op 2.0460 us/op 0.91
packedRootsBytesToLeafNodes bytes 4000 offset 3 1.9060 us/op 2.0170 us/op 0.94
subtreeFillToContents depth 40 count 250000 42.318 ms/op 39.406 ms/op 1.07
setRoot - gindexBitstring 9.2668 ms/op 9.4713 ms/op 0.98
setRoot - gindex 9.4822 ms/op 9.7799 ms/op 0.97
getRoot - gindexBitstring 2.4363 ms/op 2.3727 ms/op 1.03
getRoot - gindex 3.1066 ms/op 3.2045 ms/op 0.97
getHashObject then setHashObject 9.8422 ms/op 10.076 ms/op 0.98
setNodeWithFn 7.6997 ms/op 7.6212 ms/op 1.01
getNodeAtDepth depth 0 x100000 1.1140 ms/op 1.1147 ms/op 1.00
setNodeAtDepth depth 0 x100000 2.4540 ms/op 2.4581 ms/op 1.00
getNodesAtDepth depth 0 x100000 1.0530 ms/op 1.0597 ms/op 0.99
setNodesAtDepth depth 0 x100000 1.5167 ms/op 1.5213 ms/op 1.00
getNodeAtDepth depth 1 x100000 1.1805 ms/op 1.1774 ms/op 1.00
setNodeAtDepth depth 1 x100000 5.1819 ms/op 5.1523 ms/op 1.01
getNodesAtDepth depth 1 x100000 1.1777 ms/op 1.1761 ms/op 1.00
setNodesAtDepth depth 1 x100000 4.3257 ms/op 4.3912 ms/op 0.99
getNodeAtDepth depth 2 x100000 1.4553 ms/op 1.4585 ms/op 1.00
setNodeAtDepth depth 2 x100000 8.9919 ms/op 8.9137 ms/op 1.01
getNodesAtDepth depth 2 x100000 18.677 ms/op 17.966 ms/op 1.04
setNodesAtDepth depth 2 x100000 12.947 ms/op 12.902 ms/op 1.00
tree.getNodesAtDepth - gindexes 7.3524 ms/op 7.2954 ms/op 1.01
tree.getNodesAtDepth - push all nodes 1.8070 ms/op 1.8712 ms/op 0.97
tree.getNodesAtDepth - navigation 235.82 us/op 233.54 us/op 1.01
tree.setNodesAtDepth - indexes 393.33 us/op 389.71 us/op 1.01
set at depth 8 451.00 ns/op 455.00 ns/op 0.99
set at depth 16 617.00 ns/op 584.00 ns/op 1.06
set at depth 32 915.00 ns/op 931.00 ns/op 0.98
iterateNodesAtDepth 8 256 13.036 us/op 13.033 us/op 1.00
getNodesAtDepth 8 256 3.3830 us/op 3.3670 us/op 1.00
iterateNodesAtDepth 16 65536 4.2376 ms/op 4.2538 ms/op 1.00
getNodesAtDepth 16 65536 1.4871 ms/op 1.5348 ms/op 0.97
iterateNodesAtDepth 32 250000 15.015 ms/op 15.465 ms/op 0.97
getNodesAtDepth 32 250000 4.2326 ms/op 4.2977 ms/op 0.98
iterateNodesAtDepth 40 250000 15.273 ms/op 15.075 ms/op 1.01
getNodesAtDepth 40 250000 4.2321 ms/op 4.2544 ms/op 0.99
250000 validators root getter 829.31 ms/op 117.93 ms/op 7.03
250000 validators batchHash() 287.17 ms/op 81.978 ms/op 3.50
250000 validators hashComputations 16.674 ms/op 17.881 ms/op 0.93
bitlist bytes to struct (120,90) 701.00 ns/op 821.00 ns/op 0.85
bitlist bytes to tree (120,90) 2.7660 us/op 3.1870 us/op 0.87
bitlist bytes to struct (2048,2048) 1.0780 us/op 1.1530 us/op 0.93
bitlist bytes to tree (2048,2048) 4.0390 us/op 3.9570 us/op 1.02
ByteListType - deserialize 7.7894 ms/op 7.1776 ms/op 1.09
BasicListType - deserialize 17.966 ms/op 15.106 ms/op 1.19
ByteListType - serialize 7.8533 ms/op 7.3920 ms/op 1.06
BasicListType - serialize 10.540 ms/op 10.336 ms/op 1.02
BasicListType - tree_convertToStruct 29.413 ms/op 26.430 ms/op 1.11
List[uint8, 68719476736] len 300000 ViewDU.getAll() + iterate 4.7271 ms/op 4.3451 ms/op 1.09
List[uint8, 68719476736] len 300000 ViewDU.get(i) 4.0231 ms/op 4.0176 ms/op 1.00
Array.push len 300000 empty Array - number 6.1254 ms/op 5.7115 ms/op 1.07
Array.set len 300000 from new Array - number 2.1033 ms/op 1.8534 ms/op 1.13
Array.set len 300000 - number 6.0065 ms/op 5.6512 ms/op 1.06
Uint8Array.set len 300000 372.80 us/op 367.16 us/op 1.02
Uint32Array.set len 300000 437.91 us/op 420.72 us/op 1.04
Container({a: uint8, b: uint8}) getViewDU x300000 26.554 ms/op 45.041 ms/op 0.59
ContainerNodeStruct({a: uint8, b: uint8}) getViewDU x300000 11.231 ms/op 10.702 ms/op 1.05
List(Container) len 300000 ViewDU.getAllReadonly() + iterate 211.05 ms/op 198.38 ms/op 1.06
List(Container) len 300000 ViewDU.getAllReadonlyValues() + iterate 237.85 ms/op 230.48 ms/op 1.03
List(Container) len 300000 ViewDU.get(i) 6.1234 ms/op 6.3351 ms/op 0.97
List(Container) len 300000 ViewDU.getReadonly(i) 6.0767 ms/op 6.2013 ms/op 0.98
List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonly() + iterate 40.154 ms/op 40.398 ms/op 0.99
List(ContainerNodeStruct) len 300000 ViewDU.getAllReadonlyValues() + iterate 5.3753 ms/op 5.0758 ms/op 1.06
List(ContainerNodeStruct) len 300000 ViewDU.get(i) 5.9325 ms/op 5.9316 ms/op 1.00
List(ContainerNodeStruct) len 300000 ViewDU.getReadonly(i) 5.8328 ms/op 5.8203 ms/op 1.00
Array.push len 300000 empty Array - object 5.8677 ms/op 5.8913 ms/op 1.00
Array.set len 300000 from new Array - object 2.0802 ms/op 2.0034 ms/op 1.04
Array.set len 300000 - object 5.6109 ms/op 6.2420 ms/op 0.90
ListUintNum64Type.toViewDU 1900000 -> 2000000 204.12 ms/op
ListUintNum64Type.toViewDU() 177.74 ms/op
cachePermanentRootStruct no cache 3.5550 us/op 5.1110 us/op 0.70
cachePermanentRootStruct with cache 203.00 ns/op 211.00 ns/op 0.96
epochParticipation len 250000 rws 7813 2.1959 ms/op 2.1637 ms/op 1.01
Deneb BeaconBlock.hashTreeRoot(), numTransaction=200 4.1054 ms/op
BeaconState ViewDU batchHashTreeRoot vc=200000 218.73 ms/op 89.382 ms/op 2.45
BeaconState ViewDU batchHashTreeRoot - commit step vc=200000 191.73 ms/op
BeaconState ViewDU batchHashTreeRoot - hash step vc=200000 48.397 ms/op
BeaconState ViewDU hashTreeRoot() vc=200000 738.93 ms/op 101.07 ms/op 7.31
BeaconState ViewDU hashTreeRoot - commit step vc=200000 62.920 ms/op 80.229 ms/op 0.78
BeaconState ViewDU hashTreeRoot - validator tree creation vc=100000 227.56 ms/op
deserialize Attestation - tree 4.0080 us/op 3.9630 us/op 1.01
deserialize Attestation - struct 1.8090 us/op 1.7580 us/op 1.03
deserialize SignedAggregateAndProof - tree 3.7710 us/op 3.6310 us/op 1.04
deserialize SignedAggregateAndProof - struct 2.9350 us/op 2.9160 us/op 1.01
deserialize SyncCommitteeMessage - tree 1.1060 us/op 1.0250 us/op 1.08
deserialize SyncCommitteeMessage - struct 1.0840 us/op 1.0620 us/op 1.02
deserialize SignedContributionAndProof - tree 2.1710 us/op 2.0080 us/op 1.08
deserialize SignedContributionAndProof - struct 2.3540 us/op 2.1560 us/op 1.09
deserialize SignedBeaconBlock - tree 220.27 us/op 207.08 us/op 1.06
deserialize SignedBeaconBlock - struct 115.14 us/op 114.47 us/op 1.01
BeaconState vc 300000 - deserialize tree 616.33 ms/op 585.06 ms/op 1.05
BeaconState vc 300000 - serialize tree 112.53 ms/op 136.42 ms/op 0.82
BeaconState.historicalRoots vc 300000 - deserialize tree 823.00 ns/op 691.00 ns/op 1.19
BeaconState.historicalRoots vc 300000 - serialize tree 695.00 ns/op 650.00 ns/op 1.07
BeaconState.validators vc 300000 - deserialize tree 576.79 ms/op 566.56 ms/op 1.02
BeaconState.validators vc 300000 - serialize tree 45.117 ms/op 97.294 ms/op 0.46
BeaconState.balances vc 300000 - deserialize tree 18.977 ms/op 23.164 ms/op 0.82
BeaconState.balances vc 300000 - serialize tree 3.2652 ms/op 3.5235 ms/op 0.93
BeaconState.previousEpochParticipation vc 300000 - deserialize tree 339.03 us/op 334.74 us/op 1.01
BeaconState.previousEpochParticipation vc 300000 - serialize tree 269.61 us/op 271.56 us/op 0.99
BeaconState.currentEpochParticipation vc 300000 - deserialize tree 348.84 us/op 335.55 us/op 1.04
BeaconState.currentEpochParticipation vc 300000 - serialize tree 266.74 us/op 274.71 us/op 0.97
BeaconState.inactivityScores vc 300000 - deserialize tree 23.292 ms/op 23.268 ms/op 1.00
BeaconState.inactivityScores vc 300000 - serialize tree 3.0084 ms/op 3.3465 ms/op 0.90
hashTreeRoot Attestation - struct 12.917 us/op 19.610 us/op 0.66
hashTreeRoot Attestation - tree 8.8670 us/op 9.0690 us/op 0.98
hashTreeRoot SignedAggregateAndProof - struct 16.185 us/op 24.121 us/op 0.67
hashTreeRoot SignedAggregateAndProof - tree 13.173 us/op 12.891 us/op 1.02
hashTreeRoot SyncCommitteeMessage - struct 3.8730 us/op 6.0570 us/op 0.64
hashTreeRoot SyncCommitteeMessage - tree 3.1990 us/op 3.1200 us/op 1.03
hashTreeRoot SignedContributionAndProof - struct 9.8520 us/op 16.609 us/op 0.59
hashTreeRoot SignedContributionAndProof - tree 9.0040 us/op 8.8530 us/op 1.02
hashTreeRoot SignedBeaconBlock - struct 897.97 us/op 1.2562 ms/op 0.71
hashTreeRoot SignedBeaconBlock - tree 783.51 us/op 759.84 us/op 1.03
hashTreeRoot Validator - struct 4.7090 us/op 7.5450 us/op 0.62
hashTreeRoot Validator - tree 6.3310 us/op 6.3470 us/op 1.00
BeaconState vc 300000 - hashTreeRoot tree 2.0114 s/op 2.0804 s/op 0.97
BeaconState vc 300000 - batchHashTreeRoot tree 3.2718 s/op 3.2767 s/op 1.00
BeaconState.historicalRoots vc 300000 - hashTreeRoot tree 956.00 ns/op 950.00 ns/op 1.01
BeaconState.validators vc 300000 - hashTreeRoot tree 2.0597 s/op 2.0641 s/op 1.00
BeaconState.balances vc 300000 - hashTreeRoot tree 34.595 ms/op 32.841 ms/op 1.05
BeaconState.previousEpochParticipation vc 300000 - hashTreeRoot tree 4.2911 ms/op 4.2966 ms/op 1.00
BeaconState.currentEpochParticipation vc 300000 - hashTreeRoot tree 4.2921 ms/op 4.1245 ms/op 1.04
BeaconState.inactivityScores vc 300000 - hashTreeRoot tree 37.102 ms/op 33.347 ms/op 1.11
hash64 x18 10.304 us/op 9.0690 us/op 1.14
hashTwoObjects x18 8.6080 us/op 8.7140 us/op 0.99
hash64 x1740 903.25 us/op 820.40 us/op 1.10
hashTwoObjects x1740 806.08 us/op 822.67 us/op 0.98
hash64 x2700000 1.3970 s/op 1.2905 s/op 1.08
hashTwoObjects x2700000 1.2221 s/op 1.2732 s/op 0.96
get_exitEpoch - ContainerType 223.00 ns/op 366.00 ns/op 0.61
get_exitEpoch - ContainerNodeStructType 224.00 ns/op 363.00 ns/op 0.62
set_exitEpoch - ContainerType 236.00 ns/op 381.00 ns/op 0.62
set_exitEpoch - ContainerNodeStructType 233.00 ns/op 372.00 ns/op 0.63
get_pubkey - ContainerType 841.00 ns/op 1.3880 us/op 0.61
get_pubkey - ContainerNodeStructType 216.00 ns/op 361.00 ns/op 0.60
hashTreeRoot - ContainerType 386.00 ns/op 614.00 ns/op 0.63
hashTreeRoot - ContainerNodeStructType 430.00 ns/op 645.00 ns/op 0.67
createProof - ContainerType 3.9950 us/op 6.3600 us/op 0.63
createProof - ContainerNodeStructType 20.072 us/op 24.930 us/op 0.81
serialize - ContainerType 1.7630 us/op 1.9060 us/op 0.92
serialize - ContainerNodeStructType 1.1180 us/op 1.4160 us/op 0.79
set_exitEpoch_and_hashTreeRoot - ContainerType 3.1490 us/op 2.7860 us/op 1.13
set_exitEpoch_and_hashTreeRoot - ContainerNodeStructType 6.7540 us/op 7.1890 us/op 0.94
ValidatorViewDU hashTreeRoot 8.3440 us/op
ContainerNodeStructViewDU hashTreeRoot 23.857 us/op
Array - for of 5.4820 us/op 6.5340 us/op 0.84
Array - for(;;) 5.4330 us/op 6.3540 us/op 0.86
basicListValue.readonlyValuesArray() 3.9917 ms/op 4.1882 ms/op 0.95
basicListValue.readonlyValuesArray() + loop all 4.1465 ms/op 4.3806 ms/op 0.95
compositeListValue.readonlyValuesArray() 29.444 ms/op 30.292 ms/op 0.97
compositeListValue.readonlyValuesArray() + loop all 28.950 ms/op 29.651 ms/op 0.98
Number64UintType - get balances list 4.1746 ms/op 5.5447 ms/op 0.75
Number64UintType - set balances list 10.035 ms/op 10.067 ms/op 1.00
Number64UintType - get and increase 10 then set 40.161 ms/op 39.157 ms/op 1.03
Number64UintType - increase 10 using applyDelta 15.742 ms/op 16.529 ms/op 0.95
Number64UintType - increase 10 using applyDeltaInBatch 15.827 ms/op 17.171 ms/op 0.92
tree_newTreeFromUint64Deltas 16.065 ms/op 16.725 ms/op 0.96
unsafeUint8ArrayToTree 31.302 ms/op 32.415 ms/op 0.97
bitLength(50) 224.00 ns/op 222.00 ns/op 1.01
bitLengthStr(50) 211.00 ns/op 210.00 ns/op 1.00
bitLength(8000) 214.00 ns/op 218.00 ns/op 0.98
bitLengthStr(8000) 258.00 ns/op 251.00 ns/op 1.03
bitLength(250000) 219.00 ns/op 220.00 ns/op 1.00
bitLengthStr(250000) 294.00 ns/op 289.00 ns/op 1.02
merkleizeInto 4 chunks 1.3550 us/op
merkleize 4 chunks 1.9230 us/op
merkleizeInto 8 chunks 1.9010 us/op
merkleize 8 chunks 4.0130 us/op
merkleizeInto 16 chunks 2.5520 us/op
merkleize 16 chunks 8.1090 us/op
merkleizeInto 32 chunks 3.5040 us/op
merkleize 32 chunks 16.208 us/op
floor - Math.floor (53) 1.2391 ns/op 1.2430 ns/op 1.00
floor - << 0 (53) 1.2368 ns/op 1.2366 ns/op 1.00
floor - Math.floor (512) 1.2402 ns/op 1.2373 ns/op 1.00
floor - << 0 (512) 1.2393 ns/op 1.2400 ns/op 1.00
fnIf(0) 1.5467 ns/op 1.5538 ns/op 1.00
fnSwitch(0) 2.1649 ns/op 2.1668 ns/op 1.00
fnObj(0) 1.5518 ns/op 1.5568 ns/op 1.00
fnArr(0) 1.5471 ns/op 1.5465 ns/op 1.00
fnIf(4) 2.1650 ns/op 2.1743 ns/op 1.00
fnSwitch(4) 2.1653 ns/op 2.1669 ns/op 1.00
fnObj(4) 1.5591 ns/op 1.5487 ns/op 1.01
fnArr(4) 1.5479 ns/op 1.5480 ns/op 1.00
fnIf(9) 3.0920 ns/op 3.0924 ns/op 1.00
fnSwitch(9) 2.1649 ns/op 2.1665 ns/op 1.00
fnObj(9) 1.5470 ns/op 1.5475 ns/op 1.00
fnArr(9) 1.5505 ns/op 1.5461 ns/op 1.00
Container {a,b,vec} - as struct x100000 124.51 us/op 123.84 us/op 1.01
Container {a,b,vec} - as tree x100000 340.73 us/op 340.09 us/op 1.00
Container {a,vec,b} - as struct x100000 154.86 us/op 154.32 us/op 1.00
Container {a,vec,b} - as tree x100000 371.62 us/op 371.18 us/op 1.00
get 2 props x1000000 - rawObject 310.52 us/op 309.88 us/op 1.00
get 2 props x1000000 - proxy 73.639 ms/op 72.741 ms/op 1.01
get 2 props x1000000 - customObj 309.30 us/op 307.92 us/op 1.00
Simple object binary -> struct 549.00 ns/op 567.00 ns/op 0.97
Simple object binary -> tree_backed 1.0490 us/op 986.00 ns/op 1.06
Simple object struct -> tree_backed 1.5280 us/op 1.5440 us/op 0.99
Simple object tree_backed -> struct 1.5170 us/op 1.4670 us/op 1.03
Simple object struct -> binary 789.00 ns/op 789.00 ns/op 1.00
Simple object tree_backed -> binary 1.2240 us/op 1.2600 us/op 0.97
aggregationBits binary -> struct 445.00 ns/op 444.00 ns/op 1.00
aggregationBits binary -> tree_backed 1.9620 us/op 1.9550 us/op 1.00
aggregationBits struct -> tree_backed 2.2560 us/op 2.3230 us/op 0.97
aggregationBits tree_backed -> struct 900.00 ns/op 926.00 ns/op 0.97
aggregationBits struct -> binary 683.00 ns/op 686.00 ns/op 1.00
aggregationBits tree_backed -> binary 860.00 ns/op 871.00 ns/op 0.99
List(uint8) 100000 binary -> struct 1.5329 ms/op 1.6812 ms/op 0.91
List(uint8) 100000 binary -> tree_backed 87.294 us/op 93.101 us/op 0.94
List(uint8) 100000 struct -> tree_backed 1.1366 ms/op 1.1024 ms/op 1.03
List(uint8) 100000 tree_backed -> struct 1.1154 ms/op 1.0546 ms/op 1.06
List(uint8) 100000 struct -> binary 1.0284 ms/op 989.83 us/op 1.04
List(uint8) 100000 tree_backed -> binary 88.758 us/op 89.572 us/op 0.99
List(uint64Number) 100000 binary -> struct 1.1597 ms/op 1.1648 ms/op 1.00
List(uint64Number) 100000 binary -> tree_backed 3.1350 ms/op 2.5600 ms/op 1.22
List(uint64Number) 100000 struct -> tree_backed 4.6010 ms/op 4.0852 ms/op 1.13
List(uint64Number) 100000 tree_backed -> struct 2.1656 ms/op 2.0743 ms/op 1.04
List(uint64Number) 100000 struct -> binary 1.2460 ms/op 1.3325 ms/op 0.94
List(uint64Number) 100000 tree_backed -> binary 826.27 us/op 853.53 us/op 0.97
List(Uint64Bigint) 100000 binary -> struct 3.2557 ms/op 3.5539 ms/op 0.92
List(Uint64Bigint) 100000 binary -> tree_backed 2.6327 ms/op 3.2431 ms/op 0.81
List(Uint64Bigint) 100000 struct -> tree_backed 5.0095 ms/op 5.5215 ms/op 0.91
List(Uint64Bigint) 100000 tree_backed -> struct 4.5832 ms/op 4.5138 ms/op 1.02
List(Uint64Bigint) 100000 struct -> binary 2.0761 ms/op 2.0407 ms/op 1.02
List(Uint64Bigint) 100000 tree_backed -> binary 875.32 us/op 934.11 us/op 0.94
Vector(Root) 100000 binary -> struct 29.240 ms/op 32.244 ms/op 0.91
Vector(Root) 100000 binary -> tree_backed 29.506 ms/op 27.173 ms/op 1.09
Vector(Root) 100000 struct -> tree_backed 42.111 ms/op 45.955 ms/op 0.92
Vector(Root) 100000 tree_backed -> struct 49.262 ms/op 48.127 ms/op 1.02
Vector(Root) 100000 struct -> binary 2.8951 ms/op 2.6239 ms/op 1.10
Vector(Root) 100000 tree_backed -> binary 9.5454 ms/op 8.2375 ms/op 1.16
List(Validator) 100000 binary -> struct 105.58 ms/op 104.93 ms/op 1.01
List(Validator) 100000 binary -> tree_backed 286.65 ms/op 285.76 ms/op 1.00
List(Validator) 100000 struct -> tree_backed 314.90 ms/op 309.56 ms/op 1.02
List(Validator) 100000 tree_backed -> struct 200.70 ms/op 210.31 ms/op 0.95
List(Validator) 100000 struct -> binary 28.797 ms/op 26.723 ms/op 1.08
List(Validator) 100000 tree_backed -> binary 105.99 ms/op 115.26 ms/op 0.92
List(Validator-NS) 100000 binary -> struct 102.85 ms/op 98.543 ms/op 1.04
List(Validator-NS) 100000 binary -> tree_backed 147.38 ms/op 144.42 ms/op 1.02
List(Validator-NS) 100000 struct -> tree_backed 170.11 ms/op 185.09 ms/op 0.92
List(Validator-NS) 100000 tree_backed -> struct 147.03 ms/op 165.67 ms/op 0.89
List(Validator-NS) 100000 struct -> binary 7.4331 ms/op 27.056 ms/op 0.27
List(Validator-NS) 100000 tree_backed -> binary 12.354 ms/op 31.887 ms/op 0.39
get epochStatuses - MutableVector 112.82 us/op 106.38 us/op 1.06
get epochStatuses - ViewDU 205.51 us/op 203.15 us/op 1.01
set epochStatuses - ListTreeView 2.1111 ms/op 2.3891 ms/op 0.88
set epochStatuses - ListTreeView - set() 446.11 us/op 459.01 us/op 0.97
set epochStatuses - ListTreeView - commit() 1.0159 ms/op 555.56 us/op 1.83
bitstring 648.28 ns/op 641.14 ns/op 1.01
bit mask 13.923 ns/op 13.521 ns/op 1.03
struct - increase slot to 1000000 928.09 us/op 927.98 us/op 1.00
UintNumberType - increase slot to 1000000 21.407 ms/op 21.987 ms/op 0.97
UintBigintType - increase slot to 1000000 163.82 ms/op 159.36 ms/op 1.03
UintBigint8 x 100000 tree_deserialize 4.3076 ms/op 4.6560 ms/op 0.93
UintBigint8 x 100000 tree_serialize 1.0921 ms/op 1.0943 ms/op 1.00
UintBigint16 x 100000 tree_deserialize 4.3735 ms/op 4.7401 ms/op 0.92
UintBigint16 x 100000 tree_serialize 1.2139 ms/op 1.2154 ms/op 1.00
UintBigint32 x 100000 tree_deserialize 4.7990 ms/op 5.0010 ms/op 0.96
UintBigint32 x 100000 tree_serialize 1.2434 ms/op 1.2239 ms/op 1.02
UintBigint64 x 100000 tree_deserialize 5.5438 ms/op 5.2174 ms/op 1.06
UintBigint64 x 100000 tree_serialize 1.6034 ms/op 1.5732 ms/op 1.02
UintBigint8 x 100000 value_deserialize 433.24 us/op 521.80 us/op 0.83
UintBigint8 x 100000 value_serialize 708.54 us/op 666.63 us/op 1.06
UintBigint16 x 100000 value_deserialize 464.97 us/op 462.13 us/op 1.01
UintBigint16 x 100000 value_serialize 748.95 us/op 724.99 us/op 1.03
UintBigint32 x 100000 value_deserialize 433.80 us/op 433.07 us/op 1.00
UintBigint32 x 100000 value_serialize 731.81 us/op 698.01 us/op 1.05
UintBigint64 x 100000 value_deserialize 496.92 us/op 495.96 us/op 1.00
UintBigint64 x 100000 value_serialize 873.80 us/op 874.42 us/op 1.00
UintBigint8 x 100000 deserialize 2.9569 ms/op 2.9727 ms/op 0.99
UintBigint8 x 100000 serialize 1.7383 ms/op 1.6040 ms/op 1.08
UintBigint16 x 100000 deserialize 3.0451 ms/op 3.0156 ms/op 1.01
UintBigint16 x 100000 serialize 1.4740 ms/op 1.6335 ms/op 0.90
UintBigint32 x 100000 deserialize 3.0718 ms/op 3.0710 ms/op 1.00
UintBigint32 x 100000 serialize 2.6968 ms/op 2.7979 ms/op 0.96
UintBigint64 x 100000 deserialize 3.7599 ms/op 3.9455 ms/op 0.95
UintBigint64 x 100000 serialize 1.5457 ms/op 1.5557 ms/op 0.99
UintBigint128 x 100000 deserialize 5.0731 ms/op 5.4218 ms/op 0.94
UintBigint128 x 100000 serialize 14.872 ms/op 13.879 ms/op 1.07
UintBigint256 x 100000 deserialize 8.2260 ms/op 8.0535 ms/op 1.02
UintBigint256 x 100000 serialize 43.641 ms/op 41.441 ms/op 1.05
Slice from Uint8Array x25000 1.2849 ms/op 1.2979 ms/op 0.99
Slice from ArrayBuffer x25000 15.123 ms/op 15.006 ms/op 1.01
Slice from ArrayBuffer x25000 + new Uint8Array 16.064 ms/op 16.000 ms/op 1.00
Copy Uint8Array 100000 iterate 1.7022 ms/op 1.6460 ms/op 1.03
Copy Uint8Array 100000 slice 111.88 us/op 116.05 us/op 0.96
Copy Uint8Array 100000 Uint8Array.prototype.slice.call 112.07 us/op 117.91 us/op 0.95
Copy Buffer 100000 Uint8Array.prototype.slice.call 113.68 us/op 117.93 us/op 0.96
Copy Uint8Array 100000 slice + set 193.63 us/op 178.57 us/op 1.08
Copy Uint8Array 100000 subarray + set 113.56 us/op 117.59 us/op 0.97
Copy Uint8Array 100000 slice arrayBuffer 112.35 us/op 116.92 us/op 0.96
Uint64 deserialize 100000 - iterate Uint8Array 1.9181 ms/op 1.8024 ms/op 1.06
Uint64 deserialize 100000 - by Uint32A 2.1155 ms/op 1.7828 ms/op 1.19
Uint64 deserialize 100000 - by DataView.getUint32 x2 1.9633 ms/op 1.7841 ms/op 1.10
Uint64 deserialize 100000 - by DataView.getBigUint64 4.7816 ms/op 5.2553 ms/op 0.91
Uint64 deserialize 100000 - by byte 39.779 ms/op 39.228 ms/op 1.01

by benchmarkbot/action

* feat: implement merkleizeBlockArray

* fix: support padFor=1 for merkleizeBlockArray

* feat: add blockLimit param to merkleizeBlockArray() api

* feat: implement ByteListType.hashTreeRoot() using merkleizeBlockArray()

* fix: assign this.blocksBuffer in a more straightforward way

* chore: refactor chunkBytes to blockBytes

* fix: blockLimit usage in doMerkleizeBlockArray

* feat: implement ListComposite.hashTreeRoot() using merkleizeBlockArray api
Development

Successfully merging this pull request may close these issues.

Investigate inconsistent performance of hashTreeRoot()
3 participants