Results from benchmarking Serenity state processing with Lighthouse.

sigp/serenity-benches

Serenity Benches

This document contains the results of running benchmarks on the Lighthouse Eth2.0 client.

Included alongside the benchmarks are descriptions of the functions being measured, lessons learned about optimising and instructions on how to run the benchmarks locally.

Info/Disclaimers:

  • All benchmarks were performed using the Lighthouse Eth2 client (Rust).
  • The majority of these benches are stress-tests (worst case scenarios).
  • Benchmarks are not limited to a single core -- concurrent functions will run across multiple cores.
  • These benchmarks are up-to-date with spec v0.4.0.
  • It is almost certain that this implementation has bugs. Test vectors for these functions are soon to be released; until then it is very difficult to find bugs.
  • All benches are purely functional -- there are no "read from disk" or "fetch from network" times included.
  • There has been only a mild amount of effort taken to optimise this code. There's likely a lot of room for improvement.
  • Our optimisations are currently focussed on an all-validators-active scenario -- we will likely need to adjust our optimisations to suit a more diverse set of scenarios.
  • There is no tree hashing caching in any tests.
  • The ordering of some of the columns is a little odd, sorry.

Results:

There are three scenarios benched. Each should be exactly the same, except the validator count is changed. The following codes map to each scenario:

  • 16K: 16,384 validators (all active).
  • 300K: 300,032 validators (all active).
  • 4M: 4,000,000 validators (all active).

Note: when a result is - it means the value has not changed since the benchmark with the next smallest validator count. E.g., the value for 300k validators hasn't changed from the value for 16k validators.

Epoch Processing

Desktop

| Benchmark | 16K Desktop | 300K Desktop |
| --- | --- | --- |
| process_eth1_data | 294.34 ns | 1.1548 μs |
| initialize_validator_statuses | 982.77 μs | 33.931 ms |
| process_justification | 276.59 ns | 1.2140 μs |
| process_crosslinks | 1.0204 ms | 30.306 ms |
| process_rewards_and_penalties | 804.03 μs | 15.609 ms |
| *process_ejections | 187.60 μs | 3.2257 ms |
| *process_validator_registry | 187.88 μs | 8.2305 ms |
| ^update_active_tree_index_roots | 3.0462 ms | 82.961 ms |
| *update_latest_slashed_balances | 400.25 ns | 952.32 ns |
| clean_attestations | 23.615 μs | 322.11 μs |
| per_epoch_processing | 6.4554 ms | 187.98 ms |

Laptop

| Benchmark | 16K Laptop | 300K Laptop |
| --- | --- | --- |
| process_eth1_data | 415.11 ns | 1.3387 μs |
| initialize_validator_statuses | 1.2862 ms | 46.470 ms |
| process_justification | 411.57 ns | 1.3773 μs |
| process_crosslinks | 1.2308 ms | 39.095 ms |
| process_rewards_and_penalties | 1.0781 ms | 22.406 ms |
| *process_ejections | 171.52 μs | 3.0041 ms |
| *process_validator_registry | 376.89 μs | 9.9943 ms |
| ^update_active_tree_index_roots | 4.5167 ms | 119.122 ms |
| *update_latest_slashed_balances | 606.76 ns | 1.3709 μs |
| clean_attestations | 28.675 μs | 248.50 μs |
| per_epoch_processing | 9.5174 ms | 262.29 ms |

* We did not add any ejections or registry changes. These times are best-case (not worst-case).

^ This time is for tree-hashing the entire list of active validator indices. Technically it doesn't need to run at all because the indices haven't changed since the previous epoch.

Block Processing

The block-processing benches are tagged with the following codes:

  • BIG: Worst-case.
    • Maximum operations (e.g., MAX_ATTESTATIONS attestations, etc.)
  • SML: Reasonable-case
    • 0 slashings
    • 16 full attestations
    • 2 deposits
    • 2 exits
    • 2 transfers

Desktop

| Benchmark | 16K BIG Desktop | 300K BIG Desktop | 300K SML Desktop | 4M SML Desktop | 4M BIG Desktop |
| --- | --- | --- | --- | --- | --- |
| verify_block_signature | 5.3024 ms | - | - | - | - |
| process_randao | 5.2679 ms | - | - | - | - |
| process_eth1_data | 229.31 ns | 1.4178 μs | - | - | - |
| process_proposer_slashings | 37.108 ms | - | 1.4005 μs | - | - |
| process_attester_slashings | 147.83 ms | - | 1.3960 μs | - | - |
| process_attestations | 193.86 ms | 309.15 ms | 48.02 ms | 393.63 ms | 2.7259 s |
| *process_deposits | 18.492 ms | - | 8.0843 ms | 20.233 ms | 30.160 ms |
| process_exits | 18.835 ms | - | 6.6976 ms | - | - |
| process_transfers | 18.686 ms | - | 6.4966 ms | - | - |
| per_block_processing | 440.63 ms | 553.25 ms | 79.544 ms | 433.55 ms | 2.9739 s |

Note: 16K SML per_block_processing comes in at 64.077 ms.

Laptop

| Benchmark | 16K BIG Laptop | 300K BIG Laptop | 300K SML Laptop | 4M SML Laptop | 4M BIG Laptop |
| --- | --- | --- | --- | --- | --- |
| verify_block_signature | 7.1359 ms | - | - | - | - |
| process_randao | 7.0675 ms | - | - | - | - |
| process_eth1_data | 330.12 ns | 1.5468 μs | - | 4.2278 μs | - |
| process_proposer_slashings | 119.46 ms | - | 1.8209 μs | - | - |
| process_attester_slashings | 211.74 ms | - | 1.8208 μs | - | - |
| process_attestations | 833.23 ms | 1.3424 s | 159.46 ms | 1.4417 s | 11.920 s |
| *process_deposits | 60.300 ms | - | 9.7835 ms | 23.987 ms | 74.589 ms |
| process_exits | 60.163 ms | - | 7.8856 ms | - | - |
| process_transfers | 60.163 ms | - | 8.2653 ms | - | - |
| per_block_processing | 1.3885 s | 1.819 s | 207.07 ms | 1.5126 s | 12.494 s |

* Merkle roots are not verified.

Note: 16K SML per_block_processing comes in at 152.70 ms.

Note: 4M BIG per_block_processing comes in at ~12 s, where 11.920 s is attestation verification.

Cache Builds

All the previous benchmarks were done with pre-built committee and pubkey caches. These are the times to build those caches.

| Benchmark | 16K Desktop | 300K Desktop | 16K Laptop | 300K Laptop | 4M Desktop |
| --- | --- | --- | --- | --- | --- |
| build_epoch_committee_cache | 7.4939 ms | 264.02 ms | 14.477 ms | 323.01 ms | 3.9832 s |
| build_pubkey_cache | 14.874 ms | 339.41 ms | 30.178 ms | 488.15 ms | 5.0710 s |

Note: I once benched shuffling 4M validators at 1.2 s. Epoch cache build times seem unreasonably high; I suspect they can be improved.

Tree Hashing

This is a full tree-hash without caching.

| Benchmark | 16K Desktop | 300K Desktop | 16K Laptop | 300K Laptop |
| --- | --- | --- | --- | --- |
| tree_hash_state | 81.444 ms | 1.3884 s | 121.80 ms | 1.8679 s |
| tree_hash_block | 3.0570 ms (BIG) | 3.4629 ms (BIG) | 4.5881 ms (BIG) | 4.7180 ms (BIG) |

BLS

| Benchmark | Desktop | Laptop |
| --- | --- | --- |
| verify_a_signature | 5.2158 ms | 6.7476 ms |
| aggregate_a_public_key | 32.311 μs | 42.000 μs |
| pubkey_from_uncompressed_bytes | 826.60 ns | NA |
| pubkey_from_compressed_bytes | 39.934 μs | NA |

Lessons Learned

Here we share the lessons we learned during optimisation. We hope this information will save time for other implementers.

Per-Epoch Processing

We found it useful to consider rewards and penalties as a map against state.validator_balances. This allowed us to very easily use rayon to do this map concurrently. The specification is well designed for this purpose -- updating one validator's balance only mutates state related to that validator.

Where we found speed improvements:

  • Processing the validator rewards in parallel (as mentioned earlier).
  • Removing all ~O(n^2) time blow-ups like inclusion_distance.
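
The "map against state.validator_balances" idea can be sketched as below. Lighthouse uses rayon for this; to stay dependency-free, this sketch substitutes scoped threads from the standard library, which is a different mechanism with the same shape. The `Delta` type and the `apply_deltas` function are hypothetical names introduced for illustration, not Lighthouse APIs.

```rust
use std::thread;

/// Hypothetical per-validator reward and penalty, in Gwei (not a Lighthouse type).
#[derive(Clone, Copy)]
pub struct Delta {
    pub reward: u64,
    pub penalty: u64,
}

/// Applies each validator's delta to its own balance, splitting the work
/// across up to `workers` threads. Each thread owns a disjoint chunk of
/// `balances`, so no locking is required.
pub fn apply_deltas(balances: &mut [u64], deltas: &[Delta], workers: usize) {
    assert_eq!(balances.len(), deltas.len());
    if balances.is_empty() {
        return;
    }
    let workers = workers.max(1);
    // Ceiling division, so that at most `workers` chunks are produced.
    let chunk = (balances.len() + workers - 1) / workers;
    thread::scope(|s| {
        for (bal_chunk, delta_chunk) in balances.chunks_mut(chunk).zip(deltas.chunks(chunk)) {
            s.spawn(move || {
                for (balance, delta) in bal_chunk.iter_mut().zip(delta_chunk) {
                    // Updating one balance touches only that validator's entry.
                    *balance = balance
                        .saturating_add(delta.reward)
                        .saturating_sub(delta.penalty);
                }
            });
        }
    });
}
```

The split into disjoint chunks works only because, as noted above, updating one validator's balance does not mutate state belonging to any other validator.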

Per-Block Processing

Most notable gains here were from introducing concurrency where hashing or signature verification is involved.

We found a ~50% speed-up in processing AttesterSlashings by first verifying all SlashableAttestations in parallel before verifying each AttesterSlashing.

We had some difficulty quickly generating a HashMap for checking whether a pubkey already exists in the validator registry. Converting our BLS library's public key object into bytes (for hashing) was rather slow. As such, we now maintain a PublicKey -> ValidatorIndex map alongside the state.
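
A minimal sketch of such a map follows, assuming public keys are already available as byte strings. `PubkeyCache`, `update` and `get` are hypothetical names for this illustration and may not match Lighthouse's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical cache mapping public key bytes to validator index.
#[derive(Default)]
pub struct PubkeyCache {
    map: HashMap<Vec<u8>, usize>,
    /// Number of registry entries already indexed.
    indexed: usize,
}

impl PubkeyCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Indexes only the registry entries added since the previous call, so
    /// keeping the cache current costs time proportional to new validators.
    pub fn update(&mut self, registry_pubkeys: &[Vec<u8>]) {
        for (index, pubkey) in registry_pubkeys.iter().enumerate().skip(self.indexed) {
            // Keep the first index if the same key were ever to appear twice.
            self.map.entry(pubkey.clone()).or_insert(index);
        }
        self.indexed = self.indexed.max(registry_pubkeys.len());
    }

    /// Returns the validator index for `pubkey`, if it is in the registry.
    pub fn get(&self, pubkey: &[u8]) -> Option<usize> {
        self.map.get(pubkey).copied()
    }
}
```

Because the registry only grows by appending, an incremental `update` avoids re-serialising every key each time the cache is consulted.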

Details

Per-Epoch Processing

process_eth1_data

Benching Setup

Function is run with the following inputs:

  • There is only a single Eth1DataVote.

Function Description

  1. Attempts to find an Eth1Data with a super-majority vote and update the state.

initialize_validator_statuses

Benching Setup

Function is run with the following inputs:

  • A single pending attestation with full participation for each committee that is able to include an attestation in the state.

Function Description

Performs a loop over all validators and determines:

  • Active validators
  • Current balances
  • Previous balances

Then loops through all the attestations and determines:

  • Current epoch attesters
  • Current epoch boundary attesters
  • Previous epoch attesters
  • Previous epoch boundary attesters
  • Previous epoch head attesters

This info is used later on during epoch processing.

process_justification

Benching Setup

Function is run with the following inputs:

  • Previous justified epoch is the previous epoch - 1.
  • Justified epoch is the previous epoch.
  • Justification bitfield has all bits set true.
  • Previous and current boundary attesting balances are 100% of respective total balances.

Function Description

  1. Performs the actions described here.

process_crosslinks

Benching Setup

Function is run with the following inputs:

  • A single pending attestation with full participation for each committee that is able to include an attestation in the state.

Function Description

  1. Attempts to determine the winning_root for each shard.
  2. Updates state.latest_crosslinks where there is a winning root.
  3. Builds a hashmap of Shard -> WinningRoot for later lookups.
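
As a rough illustration of step 1, the sketch below picks the root backed by the greatest attesting balance for one shard. The input shape (a list of root and balance pairs) and the tie-break toward the lexicographically smaller root are assumptions of this sketch; consult the spec for the authoritative rule.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Given (root, attesting balance) pairs for one shard, returns the root with
/// the greatest total balance, together with that total.
/// Assumption: ties go to the lexicographically smaller root.
pub fn winning_root(votes: &[([u8; 32], u64)]) -> Option<([u8; 32], u64)> {
    let mut totals: HashMap<[u8; 32], u64> = HashMap::new();
    for (root, balance) in votes {
        *totals.entry(*root).or_insert(0) += *balance;
    }
    totals
        .into_iter()
        .max_by_key(|(root, balance)| (*balance, Reverse(*root)))
}
```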

process_rewards_and_penalties

Benching Setup

Function is run with the following inputs:

  • All validators are active.
  • All validators performed their duties exactly and agreed upon everything.
  • Epochs since finality is 4.

Function Description

  1. Does some misc setups (determine latest epoch attestations, reward quotients, etc).
  2. Loops across all participants in all committees in the previous epoch and determines if/how they should be rewarded for winning root participation.
  3. Does a parallel map across state.validator_balances and applies the justification and finalization, attestation inclusion and crosslink rewards.
  4. Loops through all validators and rewards the block proposer if they included a block.

process_ejections

Benching Setup

Function is run with the following inputs:

  • There are no ejections.

Function Description

  1. Performs the actions described here.

process_validator_registry

Benching Setup

Function is run with the following inputs:

  • state.finalized_epoch > state.validator_registry_update_epoch
  • The epoch for all crosslinks is > state.validator_registry_update_epoch
  • There are no slashings.
  • There are no exits.

Function Description

  1. Performs the actions described here.

update_active_tree_index_roots

Benching Setup

Function is run with the following inputs:

  • All validators are active.

Function Description

  1. Performs a tree-hash on the active validator indices.
  2. Updates state.latest_active_index_roots.

update_latest_slashed_balances

Benching Setup

NA.

Function Description

  1. Rotates state.latest_slashed_balances.
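
A sketch of the rotation, assuming latest_slashed_balances is a ring buffer indexed by epoch and that rotating means carrying the current epoch's entry forward into the next epoch's slot. The function signature is hypothetical.

```rust
/// Copies the current epoch's slashed balance into the next epoch's slot of
/// the ring buffer. `latest` stands in for state.latest_slashed_balances.
pub fn update_latest_slashed_balances(latest: &mut [u64], current_epoch: u64) {
    if latest.is_empty() {
        return;
    }
    let len = latest.len() as u64;
    let current = (current_epoch % len) as usize;
    let next = ((current_epoch + 1) % len) as usize;
    latest[next] = latest[current];
}
```

This is constant-time regardless of validator count, which is consistent with the sub-microsecond to low-microsecond timings in the tables above.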

clean_attestations

Benching Setup

NA.

Function Description

  1. Removes all attestations from the previous epoch from state.pending_attestations.
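
A minimal sketch of this clean-up, assuming each pending attestation records its slot and that an epoch is slot / slots_per_epoch. The `PendingAttestation` struct here is a simplified, hypothetical stand-in for the real type.

```rust
/// Simplified stand-in for a pending attestation; only the slot matters here.
#[derive(Debug, PartialEq)]
pub struct PendingAttestation {
    pub slot: u64,
}

/// Drops every attestation whose epoch is earlier than `current_epoch`.
pub fn clean_attestations(
    pending: &mut Vec<PendingAttestation>,
    current_epoch: u64,
    slots_per_epoch: u64,
) {
    pending.retain(|a| a.slot / slots_per_epoch >= current_epoch);
}
```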

per_epoch_processing

Benching Setup

This is a full run of the epoch processing function upon a state which has the following characteristics:

  • All validators are active.
  • Epochs since finality == 4.
  • Previous justified epoch == current_epoch - 3.
  • Justified epoch == current_epoch - 2.
  • Finalized epoch == current_epoch - 3.
  • Full justification bitfield.
  • All validators performed their duties exactly and agreed upon everything.
  • Pending attestations includes a single attestation with full participation for each committee that is able to include an attestation in the state.

Note: this does not include tree-hashing the state.

Function Description

  1. Runs all of the functions listed in this section (except tree hashing the state). Hopefully fully adheres to the per-slot processing.

Per-Block Processing

This section provides detail on each benchmark. The following is provided for each benchmark:

  • Benching Setup: details on the setup for the benchmark, e.g., how many objects were verified.
  • Function Description: a brief description of how the benched function operates, e.g., concurrent operations.

verify_block_signature

Benching Setup

Function is run with the following inputs:

  • A valid block signature.

Function Description

  1. Determines the slot's block producer.
  2. Produces a signed root of the block's Proposal.
  3. Verifies the block producer signed the proposal root.

process_randao

Benching Setup

Function is run with the following inputs:

  • A valid block.randao_reveal.

Function Description

  1. Determines the block proposer for the present slot.
  2. Verifies that randao_reveal is the proposer's signature across hash_tree_root(state.current_epoch()).
  3. Updates state.latest_randao_mixes with the new reveal.

process_eth1_data

Benching Setup

Function is run with the following inputs:

  • A matching Eth1DataVote does not exist in the state.

Function Description

  1. Searches for a matching Eth1DataVote in the state.
  2. If exists, increments the vote for that data. Otherwise adds a new Eth1DataVote.
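
The two steps above amount to a find-or-insert over the vote list. The sketch below uses simplified, hypothetical versions of Eth1Data and Eth1DataVote; the field names are assumptions of this example.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Eth1Data {
    pub deposit_root: [u8; 32],
    pub block_hash: [u8; 32],
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Eth1DataVote {
    pub eth1_data: Eth1Data,
    pub vote_count: u64,
}

/// Increments the vote matching `block_data`, or records a new vote with a
/// count of one if no matching vote exists yet.
pub fn process_eth1_data(votes: &mut Vec<Eth1DataVote>, block_data: &Eth1Data) {
    if let Some(i) = votes.iter().position(|v| &v.eth1_data == block_data) {
        votes[i].vote_count += 1;
    } else {
        votes.push(Eth1DataVote {
            eth1_data: block_data.clone(),
            vote_count: 1,
        });
    }
}
```

The benched input (no matching vote in the state) exercises the full linear search followed by the push.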

process_proposer_slashings

Benching Setup

Function is run with the following inputs:

  • MAX_PROPOSER_SLASHINGS count valid ProposerSlashings.

Function Description

  1. Verifies each ProposerSlashing in parallel.
  2. If all are valid, slashes each validator sequentially.
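
This "verify everything in parallel, then mutate sequentially" shape recurs in most of the block-processing functions below. A generic sketch is given here, using standard-library scoped threads with one thread per item in place of rayon. `verify_then_apply` and its parameters are hypothetical names, not Lighthouse functions.

```rust
use std::thread;

/// Runs `verify` on every item concurrently against a read-only view of the
/// state. Only if every item verifies is `apply` run on each item, in order.
pub fn verify_then_apply<S, T, E>(
    state: &mut S,
    items: &[T],
    verify: impl Fn(&S, &T) -> Result<(), E> + Sync,
    mut apply: impl FnMut(&mut S, &T),
) -> Result<(), E>
where
    S: Sync,
    T: Sync,
    E: Send,
{
    let results: Vec<Result<(), E>> = {
        let shared: &S = &*state; // read-only borrow for the parallel phase
        let verify = &verify;
        thread::scope(|scope| {
            let handles: Vec<_> = items
                .iter()
                .map(|item| scope.spawn(move || verify(shared, item)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("verification thread panicked"))
                .collect()
        })
    };
    // Any failure aborts before the state is touched.
    for result in results {
        result?;
    }
    for item in items {
        apply(state, item);
    }
    Ok(())
}
```

Separating the phases keeps the expensive signature checks free of mutable state, so they can run concurrently, while the cheap state updates stay deterministic.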

process_attester_slashings

Benching Setup

Function is run with the following inputs:

  • MAX_ATTESTER_SLASHINGS count valid AttesterSlashings.
  • Each AttesterSlashing has MAX_INDICES_PER_SLASHABLE_VOTE indices.

Function Description

  1. Builds a list of references to each SlashableAttestation in the AttesterSlashings (there are two in each).
  2. Verifies all the SlashableAttestations in parallel.
  3. If all SlashableAttestations are valid, performs the following actions sequentially:
    • Verifies each AttesterSlashing.
    • Slashes all appropriate validators.

process_attestations

Benching Setup

Function is run with the following inputs:

  • MAX_ATTESTATIONS count valid Attestations. In the case that there are not MAX_ATTESTATIONS committees available for the block, committees are split into two signing-groups until there are enough Attestations.

Function Description

  1. Ensures the previous epoch cache is built (it is always built for these benches).
  2. Verifies each Attestation in parallel.
  3. If all are valid, updates the state sequentially.

process_deposits

Benching Setup

Function is run with the following inputs:

  • MAX_DEPOSITS count valid Deposits.

Function Description

  1. Partially-verifies each deposit in parallel.
  2. If all were valid, performs the following actions sequentially:
    1. Updates the already-built PublicKey -> ValidatorIndex map (for look-up of pre-existing validators).
    2. Verifies the deposit index against state.deposit_index.
    3. Loads the validator index (if any) from the hashmap.
    4. If validator exists, checks withdrawal credentials and updates balance. Otherwise, creates new validator.

process_exits

Benching Setup

Function is run with the following inputs:

  • MAX_VOLUNTARY_EXITS count valid VoluntaryExits.

Function Description

  1. Verifies each exit in parallel.
  2. Updates the state sequentially.

process_transfers

Benching Setup

Function is run with the following inputs:

  • MAX_TRANSFERS count valid Transfers.
  • Each transfer transfers 1 Gwei from a withdrawn validator to itself.

Function Description

  1. Verifies each transfer in parallel.
  2. Updates the state sequentially.

per_block_processing

Benching Setup

Function is run with the following inputs:

  • A valid block.
  • Maximum possible operations (Attestations, Transfers, etc.). Each has the same characteristics as the process_... functions above.

Function Description

If any of the following functions return an error (e.g., invalid object) execution returns immediately.

  1. Checks the block slot against the state slot.
  2. Ensures epoch caches are built (they are always built for these benches).
  3. Verifies the block signature.
  4. Processes the randao reveal.
  5. Processes the eth1 data.
  6. Processes proposer slashings.
  7. Processes attester slashings.
  8. Processes attestations.
  9. Processes deposits.
  10. Processes exits.
  11. Processes transfers.
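
The early-return behaviour maps naturally onto Rust's `?` operator. The sketch below shows only the slot check and the first two stages, with stub bodies; all types and the error enum are hypothetical and heavily simplified.

```rust
#[derive(Debug, PartialEq)]
pub enum BlockProcessingError {
    StateSlotMismatch,
    BadSignature,
}

/// Simplified stand-ins for the beacon state and block.
pub struct State {
    pub slot: u64,
    pub applied: Vec<&'static str>,
}

pub struct Block {
    pub slot: u64,
    pub valid_signature: bool,
}

fn verify_block_signature(_state: &State, block: &Block) -> Result<(), BlockProcessingError> {
    if block.valid_signature {
        Ok(())
    } else {
        Err(BlockProcessingError::BadSignature)
    }
}

fn process_randao(state: &mut State, _block: &Block) -> Result<(), BlockProcessingError> {
    state.applied.push("randao"); // stub: records that the stage ran
    Ok(())
}

/// Each stage returns a Result; `?` stops at the first error, so later
/// stages never run on an invalid block.
pub fn per_block_processing(state: &mut State, block: &Block) -> Result<(), BlockProcessingError> {
    if block.slot != state.slot {
        return Err(BlockProcessingError::StateSlotMismatch);
    }
    verify_block_signature(state, block)?;
    process_randao(state, block)?;
    // The remaining stages (eth1 data, slashings, attestations, deposits,
    // exits, transfers) would follow in the same style.
    Ok(())
}
```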

Epoch Cache Builds

Our epoch caches are on a per-epoch basis and contain:

  • Each committee for each slot of the epoch.
  • A map of ValidatorIndex -> (AttestationSlot, AttestationShard, CommitteeIndex) for easy access of a validator's attestation duties.
  • A map of Shard -> (EpochCommitteesIndex, SlotCommitteesIndex) for easy access of a shard's committee.
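
One possible layout for such a cache is sketched below. The struct and field names are hypothetical, and two interpretations are assumed: CommitteeIndex is taken to be the validator's position within its committee, and the shard map is taken to point at (slot offset within the epoch, committee offset within the slot).

```rust
use std::collections::HashMap;

/// A committee assigned to one shard (hypothetical, simplified).
pub struct Committee {
    pub shard: u64,
    pub members: Vec<usize>,
}

#[derive(Debug, PartialEq)]
pub struct AttestationDuty {
    pub slot: u64,
    pub shard: u64,
    /// Assumed meaning: position of the validator inside its committee.
    pub committee_index: usize,
}

pub struct EpochCache {
    /// committees[slot_offset][committee_offset]
    pub committees: Vec<Vec<Committee>>,
    /// ValidatorIndex -> attestation duty
    pub duties: HashMap<usize, AttestationDuty>,
    /// Shard -> (slot offset in epoch, committee offset in slot)
    pub shard_locations: HashMap<u64, (usize, usize)>,
}

impl EpochCache {
    /// Builds both lookup maps in a single pass over the committees.
    pub fn build(epoch_start_slot: u64, committees: Vec<Vec<Committee>>) -> Self {
        let mut duties = HashMap::new();
        let mut shard_locations = HashMap::new();
        for (slot_offset, slot_committees) in committees.iter().enumerate() {
            for (committee_offset, committee) in slot_committees.iter().enumerate() {
                shard_locations.insert(committee.shard, (slot_offset, committee_offset));
                for (position, validator) in committee.members.iter().enumerate() {
                    duties.insert(
                        *validator,
                        AttestationDuty {
                            slot: epoch_start_slot + slot_offset as u64,
                            shard: committee.shard,
                            committee_index: position,
                        },
                    );
                }
            }
        }
        EpochCache { committees, duties, shard_locations }
    }
}
```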

Pubkey Cache Builds

The pubkey cache is a map of PublicKey -> ValidatorIndex. It is used to speed up deposit processing (checking to see if a pubkey exists in the validator registry).

It is presently very slow because we need to convert our BLS library's PublicKey struct into bytes. We have chosen to use uncompressed bytes for the key of the hashmap as it is quicker.

Tree Hashing

tree_hash_state

Benching Setup

Function is run with the following inputs:

Function Description

  1. Hashes as per the tree hash spec.

tree_hash_block

Benching Setup

Function is run with the following inputs:

Function Description

  1. Hashes as per the tree hash spec.

Computer Details

Desktop

  • Arch Linux
  • 6-core i7-8700K.
  • 16GB DDR4 2400 MHz.

Laptop

  • Lenovo X1 Carbon 5th Gen
  • Arch Linux
  • 2-core i5 7300U
  • 16GB LPDDR3 1866 MHz.

Running the benchmarks

You can run these benchmarks on your local machine if you're an advanced user. There are three main steps:

  1. Set up the repo by cloning it and ensuring you have all the deps.
  2. Generate a file of keypairs to speed up the setup for the benches. This can be skipped, but doing so will greatly increase your setup time (e.g., it takes 5 minutes to generate 4M keypairs on 6 cores, but only 1 s to read them from file).
  3. Run the benches

1. Set up the repo

  1. Clone the https://github.com/sigp/lighthouse repo at sane-case.
  2. Follow the Running instructions. You just need a standard Rust setup (rustup) and additionally clang and protobuf.
  3. Check you can build by running the tests: $ cargo test --all

2. Generate the keypairs file

This will create the following file: $HOME/.lighthouse/keypairs.raw_keypairs. It contains uncompressed, unencrypted BLS keypairs and massively speeds up setup times for benches.

  1. Navigate to beacon_node/beacon_chain/test_harness
  2. Run cargo run --release -- gen_keys -n KEYS where KEYS is the maximum number of validators you wish to bench with. If your keypairs file isn't big enough you'll get a panic about being unable to fill a buffer during benching.

Note: if you omit --release key generation will be very slow.

Note: it takes my desktop 5 minutes to build 4M keys.

3. Run the benches

  1. Navigate to eth2/state_processing
  2. Run cargo bench

Note: you can filter benches. Using cargo bench block will only run benches where block is in the title.

If you want to change the number of validators, change the 'VALIDATOR_COUNT' variable.
