This document contains the results of running benchmarks on the Lighthouse Eth2.0 client.
Included alongside the benchmarks are descriptions of the functions being measured, lessons learned about optimising and instructions on how to run the benchmarks locally.
- Results
- Per-epoch processing details
- Per-block processing details
- Computer details
- Running the benchmarks
- All benchmarks were performed using the Lighthouse Eth2 client (Rust).
- The majority of these benches are stress-tests (worst case scenarios).
- Benchmarks are not limited to a single core -- concurrent functions will run across multiple cores.
- These benchmarks are up-to-date with spec v0.4.0.
- It is almost certain that this implementation has bugs -- test vectors for these functions are soon to be released; until then it is very difficult to find them.
- All benches are purely functional -- there are no "read from disk" or "fetch from network" times included.
- Only a mild amount of effort has been put into optimising this code. There's likely a lot of room for improvement.
- Our optimisations are currently focussed on an all-validators-active scenario -- we will likely need to adjust them to suit a more diverse set of scenarios.
- There is no tree hashing caching in any tests.
- The ordering of some of the columns is a little odd, sorry.
There are three scenarios benched. Each should be exactly the same, except the validator count is changed. The following codes map to each scenario:
- 16K: 16,384 validators (all active).
- 300K: 300,032 validators (all active).
- 4M: 4,000,000 validators (all active).
Note: when a result is `-` it means the value has not changed since the benchmark with the next-smallest validator count. E.g., the value for 300K validators hasn't changed from the value for 16K validators.
Benchmark | 16K Desktop | 300K Desktop |
---|---|---|
process_eth1_data | 294.34 ns | 1.1548 μs |
initialize_validator_statuses | 982.77 μs | 33.931 ms |
process_justification | 276.59 ns | 1.2140 μs |
process_crosslinks | 1.0204 ms | 30.306 ms |
process_rewards_and_penalties | 804.03 μs | 15.609 ms |
*process_ejections | 187.60 μs | 3.2257 ms |
*process_validator_registry | 187.88 μs | 8.2305 ms |
^update_active_tree_index_roots | 3.0462 ms | 82.961 ms |
*update_latest_slashed_balances | 400.25 ns | 952.32 ns |
clean_attestations | 23.615 μs | 322.11 μs |
per_epoch_processing | 6.4554 ms | 187.98 ms |
Benchmark | 16K Laptop | 300K Laptop |
---|---|---|
process_eth1_data | 415.11 ns | 1.3387 μs |
initialize_validator_statuses | 1.2862 ms | 46.470 ms |
process_justification | 411.57 ns | 1.3773 μs |
process_crosslinks | 1.2308 ms | 39.095 ms |
process_rewards_and_penalties | 1.0781 ms | 22.406 ms |
*process_ejections | 171.52 μs | 3.0041 ms |
*process_validator_registry | 376.89 μs | 9.9943 ms |
^update_active_tree_index_roots | 4.5167 μs | 119.122 ms |
*update_latest_slashed_balances | 606.76 ns | 1.3709 μs |
clean_attestations | 28.675 μs | 248.50 μs |
per_epoch_processing | 9.5174 ms | 262.29 ms |
* We did not add any ejections or registry changes. These times are best-case (not worst-case).
^ This time is spent tree-hashing the entire list of active validator indices. Technically it doesn't need to run at all here because the indices haven't changed since the previous epoch.
The block-processing benches are tagged with the following codes:
- BIG: Worst-case.
  - Maximum operations (e.g., `MAX_ATTESTATIONS` attestations, etc.)
- SML: Reasonable-case.
  - 0 slashings
  - 16 full attestations
  - 2 deposits
  - 2 exits
  - 2 transfers
Benchmark | 16K BIG Desktop | 300K BIG Desktop | 300K SML Desktop | 4M SML Desktop | 4M BIG Desktop |
---|---|---|---|---|---|
verify_block_signature | 5.3024 ms | - | - | - | - |
process_randao | 5.2679 ms | - | - | - | - |
process_eth1_data | 229.31 ns | 1.4178 μs | - | - | - |
process_proposer_slashings | 37.108 ms | - | 1.4005 μs | - | - |
process_attester_slashings | 147.83 ms | - | 1.3960 μs | - | - |
process_attestations | 193.86 ms | 309.15 ms | 48.02 ms | 393.63 ms | 2.7259 s |
*process_deposits | 18.492 ms | - | 8.0843 ms | 20.233 ms | 30.160 s |
process_exits | 18.835 ms | - | 6.6976 ms | - | - |
process_transfers | 18.686 ms | - | 6.4966 ms | - | - |
per_block_processing | 440.63 ms | 553.25 ms | 79.544 ms | 433.55 ms | 2.9739 s |
Note: 16K SML per_block_processing comes in at 64.077 ms.
Benchmark | 16K BIG Laptop | 300K BIG Laptop | 300K SML Laptop | 4M SML Laptop | 4M BIG Laptop |
---|---|---|---|---|---|
verify_block_signature | 7.1359 ms | - | - | - | - |
process_randao | 7.0675 ms | - | - | - | - |
process_eth1_data | 330.12 ns | 1.5468 μs | - | 4.2278 μs | - |
process_proposer_slashings | 119.46 ms | - | 1.8209 μs | - | - |
process_attester_slashings | 211.74 ms | - | 1.8208 μs | - | - |
process_attestations | 833.23 ms | 1.3424 s | 159.46 ms | 1.4417 s | 11.920 s |
*process_deposits | 60.300 ms | - | 9.7835 ms | 23.987 ms | 74.589 ms |
process_exits | 60.163 ms | - | 7.8856 ms | - | - |
process_transfers | 60.163 ms | - | 8.2653 ms | - | - |
per_block_processing | 1.3885 s | 1.819 s | 207.07 ms | 1.5126 s | 12.494 s |
* Merkle roots are not verified.
Note: 16K SML per_block_processing comes in at 152.70 ms.
Note: 4M BIG per_block_processing comes in at ~12 s, where 11.920 s is attestation verification.
All the previous benchmarks were done with pre-built committee and pubkey caches. These are the times to build those caches.
Benchmark | 16K Desktop | 300K Desktop | 16K Laptop | 300K Laptop | 4M Desktop |
---|---|---|---|---|---|
build_epoch_committee_cache | 7.4939 ms | 264.02 ms | 14.477 ms | 323.01 ms | 3.9832 s |
build_pubkey_cache | 14.874 ms | 339.41 ms | 30.178 ms | 488.15 ms | 5.0710 s |
Note: I once benched shuffling 4M validators at 1.2 s. Epoch cache build times seem unreasonably high; I suspect they can be improved.
This is a full tree-hash without caching.
Benchmark | 16K Desktop | 300K Desktop | 16K Laptop | 300K Laptop |
---|---|---|---|---|
tree_hash_state | 81.444 ms | 1.3884 s | 121.80 ms | 1.8679 s |
tree_hash_block | 3.0570 ms (BIG) | 3.4629 ms (BIG) | 4.5881 ms (BIG) | 4.7180 ms (BIG) |
Benchmark | Desktop | Laptop |
---|---|---|
verify_a_signature | 5.2158 ms | 6.7476 ms |
aggregate_a_public_key | 32.311 μs | 42.000 μs |
pubkey_from_uncompressed_bytes | 826.60 ns | NA |
pubkey_from_compressed_bytes | 39.934 μs | NA |
Here we share the lessons we learned during optimisation. We hope this information will save time for other implementers.
We found it useful to consider rewards and penalties as a map against `state.validator_balances`. This allowed us to very easily use rayon to do this map concurrently. The specification is well designed for this purpose -- updating one validator's balance only mutates state related to that validator.
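As a rough illustration of this pattern (not Lighthouse's actual code), a rayon-based map over the balances might look like the sketch below; the `apply_deltas` helper and the pre-computed `(reward, penalty)` deltas are assumptions made for the example:

```rust
use rayon::prelude::*;

/// A minimal sketch: apply pre-computed (reward, penalty) deltas to each
/// validator's balance in parallel. `deltas[i]` corresponds to validator `i`.
fn apply_deltas(balances: &mut [u64], deltas: &[(u64, u64)]) {
    balances
        .par_iter_mut()
        .zip(deltas.par_iter())
        .for_each(|(balance, (reward, penalty))| {
            // Each element is updated independently, so no locking is needed.
            *balance = balance.saturating_add(*reward).saturating_sub(*penalty);
        });
}
```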
Where we found speed improvements:
- Processing the validator rewards in parallel (as mentioned earlier).
- Removing all `~O(n^2)` time blow-ups like `inclusion_distance`.
Most notable gains here were from introducing concurrency where hashing or signature verification is involved.
We found a ~50% speed-up in processing `AttesterSlashings` by first verifying all `SlashableAttestations` in parallel before verifying each `AttesterSlashing`.
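A sketch of this pattern is below; the types and helper functions are placeholders standing in for the real spec structures, not Lighthouse's API:

```rust
use rayon::prelude::*;

// Placeholder types -- not Lighthouse's real definitions.
struct SlashableAttestation { /* indices, signature, ... */ }
struct AttesterSlashing {
    slashable_attestation_1: SlashableAttestation,
    slashable_attestation_2: SlashableAttestation,
}

fn verify_slashable_attestation(_a: &SlashableAttestation) -> Result<(), String> {
    // Signature verification would happen here; it dominates the runtime.
    Ok(())
}

fn process_attester_slashings(slashings: &[AttesterSlashing]) -> Result<(), String> {
    // Collect references to every SlashableAttestation (two per AttesterSlashing)...
    let attestations: Vec<&SlashableAttestation> = slashings
        .iter()
        .flat_map(|s| [&s.slashable_attestation_1, &s.slashable_attestation_2])
        .collect();

    // ...verify them all in parallel; any failure aborts the whole batch...
    attestations
        .par_iter()
        .try_for_each(|a| verify_slashable_attestation(a))?;

    // ...then perform the remaining (cheap) per-slashing checks and state
    // mutations sequentially.
    Ok(())
}
```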
We found some difficulties in quickly generating a `HashMap` for checking that some pubkey already exists in the validator registry. We found that going from our BLS library's object into bytes (for hashing) was rather slow. As such, we now maintain a `PublicKey -> ValidatorIndex` map alongside the state.
Function is run with the following inputs:
- There is only a single `Eth1DataVote`.
- Attempts to find an `Eth1Data` with a super-majority vote and update the state.
Function is run with the following inputs:
- A single pending attestation with full participation for each committee that is able to include an attestation in the state.
Performs a loop over all validators and determines:
- Active validators
- Current balances
- Previous balances
Then loops through all the attestations and determines:
- Current epoch attesters
- Current epoch boundary attesters
- Previous epoch attesters
- Previous epoch boundary attesters
- Previous epoch head attesters
This info is used later on during epoch processing.
Function is run with the following inputs:
- Previous justified epoch is the previous epoch - 1.
- Justified epoch is the previous epoch.
- Justification bitfield has all bits set true.
- Previous and current boundary attesting balances are 100% of respective total balances.
- Performs the actions described in the spec.
Function is run with the following inputs:
- A single pending attestation with full participation for each committee that is able to include an attestation in the state.
- Attempts to determine the `winning_root` for each shard.
- Updates `state.latest_crosslinks` where there is a winning root.
- Builds a hashmap of `Shard -> WinningRoot` for later lookups.
Function is run with the following inputs:
- All validators are active.
- All validators performed their duties exactly and agreed upon everything.
- Epochs since finality is 4.
- Does some miscellaneous setup (determines the latest epoch attestations, reward quotients, etc.).
- Loops across all participants in all committees in the previous epoch and determines if/how they should be rewarded for winning-root participation.
- Performs a parallel map across `state.validator_balances` and applies the justification and finalization, attestation inclusion, and crosslink rewards.
- Loops through all validators and rewards the block proposer if they included a block.
Function is run with the following inputs:
- There are no ejections.
- Performs the actions described in the spec.
Function is run with the following inputs:
- `state.finalized_epoch > state.validator_registry_update_epoch`.
- The epoch for all crosslinks is `> state.validator_registry_update_epoch`.
- There are no slashings.
- There are no exits.
- Performs the actions described in the spec.
Function is run with the following inputs:
- All validators are active.
- Performs a tree-hash on the active validator indices.
- Updates `state.latest_active_index_roots`.
NA.
- Rotates `state.latest_slashed_balances`.
NA.
- Removes all attestations from the previous epoch from `state.pending_attestations`.
This is a full run of the epoch processing function upon a state with the following characteristics:
- All validators are active.
- Epochs since finality == 4.
- Previous justified epoch == current_epoch - 3.
- Justified epoch == current_epoch - 2.
- Finalized epoch == current_epoch - 3.
- Full justification bitfield.
- All validators performed their duties exactly and agreed upon everything.
- Pending attestations includes a single attestation with full participation for each committee that is able to include an attestation in the state.
Note: this does not include tree-hashing the state.
- Runs all of the functions listed in this section (except tree hashing the state). Hopefully this fully adheres to the spec's per-epoch processing.
This section provides detail on each benchmark. The following is provided for each benchmark:
- Benching Setup: details on the setup for the benchmark. E.g., how many objects were verified, etc.
- Function Description: a brief description of how the benched function operates. E.g., concurrent operations, etc.
Function is run with the following inputs:
- A valid block signature.
- Determines the slot's block producer.
- Produces a signed root of the block's `Proposal`.
- Verifies the block producer signed the proposal root.
Function is run with the following inputs:
- A valid `block.randao_reveal`.
- Determines the block proposer for the present slot.
- Verifies that `randao_reveal` is the proposer's signature across `hash_tree_root(state.current_epoch())`.
- Updates `state.latest_randao_mixes` with the new reveal.
Function is run with the following inputs:
- A matching `Eth1DataVote` does not exist in the state.
- Searches for a matching `Eth1DataVote` in the state.
- If one exists, increments the vote for that data. Otherwise, adds a new `Eth1DataVote`.
Function is run with the following inputs:
- `MAX_PROPOSER_SLASHINGS` valid `ProposerSlashings`.
- Verifies each `ProposerSlashing` in parallel.
- If all are valid, slashes each validator sequentially.
Function is run with the following inputs:
- `MAX_ATTESTER_SLASHINGS` valid `AttesterSlashings`.
- Each `AttesterSlashing` has `MAX_INDICES_PER_SLASHABLE_VOTE` indices.
- Builds a list of references to each `SlashableAttestation` in the `AttesterSlashings` (there are two in each).
- Verifies all the `SlashableAttestations` in parallel.
- If all `SlashableAttestations` are valid, performs the following actions sequentially:
  - Verifies each `AttesterSlashing`.
  - Slashes all appropriate validators.
Function is run with the following inputs:
- `MAX_ATTESTATIONS` valid `Attestations`. In the case that there are not `MAX_ATTESTATIONS` committees available for the block, committees are split into two signing-groups until there are enough `Attestations`.
- Ensures the previous epoch cache is built (it is always built for these benches).
- Verifies each `Attestation` in parallel.
- If all are valid, updates the state sequentially.
Function is run with the following inputs:
- `MAX_DEPOSITS` valid `Deposits`.
- Partially-verifies each deposit in parallel.
- If all were valid, performs the following actions sequentially:
  - Updates the already-built `PublicKey -> ValidatorIndex` map (for look-up of pre-existing validators).
  - Verifies the deposit index against `state.deposit_index`.
  - Loads the validator index (if any) from the hashmap (a sketch of this look-up follows this list).
  - If the validator exists, checks withdrawal credentials and updates the balance. Otherwise, creates a new validator.
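To illustrate the hashmap look-up described above, here is a hypothetical `apply_deposit` helper; the map of raw pubkey bytes to indices and the bare `balances` vector are simplifications, not the real implementation:

```rust
use std::collections::HashMap;

/// Illustrative only: apply a deposit using a pubkey-bytes -> validator-index map.
fn apply_deposit(
    pubkey_map: &mut HashMap<Vec<u8>, usize>,
    balances: &mut Vec<u64>,
    pubkey_bytes: Vec<u8>,
    amount: u64,
) {
    match pubkey_map.get(&pubkey_bytes).copied() {
        // Known pubkey: (after checking withdrawal credentials) top up the balance.
        Some(index) => balances[index] += amount,
        // Unknown pubkey: create a new validator and record it in the map.
        None => {
            balances.push(amount);
            pubkey_map.insert(pubkey_bytes, balances.len() - 1);
        }
    }
}
```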
Function is run with the following inputs:
- `MAX_VOLUNTARY_EXITS` valid `VoluntaryExits`.
- Verifies each exit in parallel.
- Updates the state sequentially.
Function is run with the following inputs:
- `MAX_TRANSFERS` valid `Transfers`.
- Each transfer sends 1 Gwei from a withdrawn validator to itself.
- Verifies each transfer in parallel.
- Updates the state sequentially.
Function is run with the following inputs:
- A valid block.
- Maximum possible operations (`Attestations`, `Transfers`, etc.). Each has the same characteristics as the `process_...` functions above.
If any of the following functions returns an error (e.g., an invalid object), execution returns immediately (see the sketch after this list).
- Checks the block slot against the state slot.
- Ensures epoch caches are built (they are always built for these benches).
- Verifies the block signature.
- Processes the randao reveal.
- Processes the eth1 data.
- Processes proposer slashings.
- Processes attester slashings.
- Processes attestations.
- Processes deposits.
- Processes exits.
- Processes transfers.
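The control flow amounts to a chain of fallible steps; below is a minimal, hypothetical sketch (placeholder types, only two of the steps shown) of how the early return works with Rust's `?` operator:

```rust
// Illustrative control flow only; these are not Lighthouse's real types.
struct BeaconState;
struct BeaconBlock;

#[derive(Debug)]
enum BlockProcessingError {
    BadSignature,
    BadEth1Data,
}

fn verify_block_signature(_: &BeaconState, _: &BeaconBlock) -> Result<(), BlockProcessingError> {
    Ok(())
}

fn process_eth1_data(_: &mut BeaconState, _: &BeaconBlock) -> Result<(), BlockProcessingError> {
    Ok(())
}

fn per_block_processing(
    state: &mut BeaconState,
    block: &BeaconBlock,
) -> Result<(), BlockProcessingError> {
    // Each step returns a Result; the first error aborts processing immediately.
    verify_block_signature(state, block)?;
    process_eth1_data(state, block)?;
    // ...the remaining steps (randao, slashings, attestations, deposits,
    // exits, transfers) follow the same pattern.
    Ok(())
}
```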
Our epoch caches are on a per-epoch basis and contain:
- Each committee for each slot of the epoch.
- A map of `ValidatorIndex -> (AttestationSlot, AttestationShard, CommitteeIndex)` for easy access to a validator's attestation duties.
- A map of `Shard -> (EpochCommitteesIndex, SlotCommitteesIndex)` for easy access to a shard's committee (a sketch of the cache follows this list).
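In rough terms, the cache could be modelled as below; the names and types are illustrative, not Lighthouse's exact definitions:

```rust
use std::collections::HashMap;

// Illustrative aliases -- not Lighthouse's exact definitions.
type ValidatorIndex = usize;
type Shard = u64;
type AttestationSlot = u64;
type AttestationShard = u64;
type CommitteeIndex = usize;
type EpochCommitteesIndex = usize;
type SlotCommitteesIndex = usize;

/// A simplified sketch of the per-epoch cache described above.
pub struct EpochCache {
    /// Every committee for every slot of the epoch:
    /// committees[slot_offset][committee_offset] -> validator indices.
    pub committees: Vec<Vec<Vec<ValidatorIndex>>>,
    /// Validator -> attestation duties.
    pub attestation_duties:
        HashMap<ValidatorIndex, (AttestationSlot, AttestationShard, CommitteeIndex)>,
    /// Shard -> location of that shard's committee within `committees`.
    pub shard_committee_indices: HashMap<Shard, (EpochCommitteesIndex, SlotCommitteesIndex)>,
}
```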
The pubkey cache is a map of `PublicKey -> ValidatorIndex`. It is used to speed up deposit processing (checking to see if a pubkey exists in the validator registry).
Building it is presently very slow because we need to convert our BLS library's `PublicKey` struct into bytes. We have chosen to use uncompressed bytes for the key of the hashmap as it is quicker.
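A sketch of the idea; the `PubkeyCache` name and byte-keyed map are our illustration, with the BLS `PublicKey` assumed to be serialised to uncompressed bytes by the caller:

```rust
use std::collections::HashMap;

/// Illustrative pubkey cache: uncompressed public-key bytes -> validator index.
/// Uncompressed bytes are used as the key because producing them is much faster
/// than producing compressed bytes (see the BLS benchmarks above).
#[derive(Default)]
pub struct PubkeyCache {
    map: HashMap<Vec<u8>, usize>,
}

impl PubkeyCache {
    /// Record a validator's (uncompressed) pubkey bytes against its index.
    pub fn insert(&mut self, uncompressed_pubkey: Vec<u8>, validator_index: usize) {
        self.map.insert(uncompressed_pubkey, validator_index);
    }

    /// Look up a validator index by its (uncompressed) pubkey bytes.
    pub fn get(&self, uncompressed_pubkey: &[u8]) -> Option<usize> {
        self.map.get(uncompressed_pubkey).copied()
    }
}
```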
Function is run with the following inputs:
- The state which is used in the epoch processing tests. See per_epoch_processing for details.
- Hashes as per the tree hash spec.
Function is run with the following inputs:
- The block which is used in the block processing tests. See per_block_processing for details.
- Hashes as per the tree hash spec.
Desktop:
- Arch Linux
- 6-core i7-8700K
- 16GB DDR4 2400 MHz

Laptop (Lenovo X1 Carbon 5th Gen):
- Arch Linux
- 2-core i5-7300U
- 16GB LPDDR3 1866 MHz
You can run these benchmarks on your local machine if you're an advanced user. There are three main steps:
- Setup the repo by cloning it and ensuring you have all the deps.
- Generate a file of keypairs to speed up setup for the benches. This can be skipped, but doing so will greatly increase your setup time (e.g., it takes 5 minutes to generate 4M keypairs on 6 cores, whereas it takes about 1 s to read them from file).
- Run the benches
- Clone the https://github.com/sigp/lighthouse repo at `sane-case`.
- Follow the Running instructions. You just need a standard Rust setup (`rustup`) and additionally `clang` and `protobuf`.
- Check you can build by running the tests:
$ cargo test --all
This will create the following file: `$HOME/.lighthouse/keypairs.raw_keypairs`. It contains uncompressed, unencrypted BLS keypairs and massively speeds up setup times for benches.
- Navigate to `beacon_node/beacon_chain/test_harness`.
- Run `cargo run --release -- gen_keys -n KEYS`, where `KEYS` is the maximum number of validators you wish to bench with. If your keypairs file isn't big enough you'll get a panic about being unable to fill a buffer during benching.
Note: if you omit `--release`, key generation will be very slow.
Note: it takes my desktop 5 minutes to build 4M keys.
- Navigate to `eth2/state_processing`.
- Run `cargo bench`.

Note: you can filter benches. Using `cargo bench block` will only run benches where `block` is in the title.
If you want to change the number of validators, change the `VALIDATOR_COUNT` variable.