Block processing time estimates at scale #103

Closed
djrtwo opened this issue Sep 18, 2018 · 8 comments
@djrtwo
Contributor

djrtwo commented Sep 18, 2018

Issue

In the last eth2.0 implementers call, we decided it would be worthwhile to run some timing analysis on processing blocks with real-world amounts of attestations.

It would be great to get results from at least one other client. I know not everyone has a working BLS aggregate implementation yet, but anyone that does should give this a try and report results.

Proposed Implementation

Assuming 10M ETH deposited puts us at ~300k validators (at 32 ETH per deposit). With 64 slots per cycle, that is ~5,000 validators per slot. With 1,000 shards divided across the 64 slots, that is ~16 shards per slot.

If all of the validators coordinate and vote on the same crosslink, and their attestations are aggregated and included in the next slot, then there will be 16 attestations of ~300 validators each per block. This is a good place to start.

We can then make this estimate a worst case by assuming the validators split their votes across 2, 3, 4, or even 5 different crosslink candidates. If all committees split their votes across 2 candidates, then there would be 32 attestations per block with ~150 validators each.
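
For reference, a quick back-of-the-envelope sketch of these numbers (illustrative only; assumes 32 ETH per deposit and uses plain integer division, so the figures differ slightly from the rounded values above):

```python
# Back-of-the-envelope numbers behind the estimates above (illustrative only).
ETH_PER_VALIDATOR = 32          # assumed deposit size
CYCLE_LENGTH = 64               # slots per cycle
SHARD_COUNT = 1000

validators = 10_000_000 // ETH_PER_VALIDATOR         # 312,500 (~300k)
validators_per_slot = validators // CYCLE_LENGTH      # 4,882 (~5,000)
shards_per_slot = SHARD_COUNT // CYCLE_LENGTH         # 15 (~16)

# Best case: one aggregate attestation per committee/shard in the block.
attestations_per_block = shards_per_slot                              # ~16
validators_per_attestation = validators_per_slot // shards_per_slot   # 325 (~300)

# Votes split across k crosslink candidates multiply the attestation count
# and divide the validators per attestation.
for k in (2, 3, 4, 5):
    print(k, attestations_per_block * k, validators_per_attestation // k)
```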

EDIT
My estimates on the number and size of committees were a bit off in practice. When using BenchmarkParams { total_validators: 312500, cycle_length: 64, shard_count: 1024, shards_per_slot: 16, validators_per_shard: 305, min_committee_size: 128 }, each slot has approximately 20 committees of size 244 (rather than 16 of ~300). This shouldn't drastically change the output, but it is a better target because it reflects the actual shuffling algorithm.
(cc: @paulhauner)

EDIT2
My original assumption was correct and the spec was incorrect! Go with the original estimates.

@paulhauner

paulhauner commented Sep 18, 2018

For the record, lighthouse is making this a priority. Thanks for the details on what to test :)

We'll come back here with any questions/comments.

@djrtwo
Contributor Author

djrtwo commented Sep 27, 2018

Current Python benchmarks (on a standard consumer laptop with tons of tabs open and music playing):

| num_attestations | validators_per_attestation | total_validators | total_deposits | block_process_seconds | crystallized_state_bytes | active_state_bytes | block_bytes |
|---|---|---|---|---|---|---|---|
| 2 | 244 | 31250 | 1000000 | 0.1017 | 4056416 | 4494 | 562 |
| 16 | 305 | 312500 | 10000000 | 1.0099 | 40074336 | 7324 | 3392 |
| 16 | 3051 | 3125000 | 100000000 | 10.1837 | 400074336 | 12812 | 8880 |

raw csv here https://gist.github.com/djrtwo/663a031c984ef4796a9aff2ba68d03e5

Notes:

- The active state size is after one block has been processed
- Only in memory; no on-disk DB used
- No concurrency used
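
For context, a minimal sketch of the shape of such a timing harness (the `process_block` signature and the state/block objects here are placeholders, not the actual beacon_chain API):

```python
import time

def bench_block_processing(process_block, crystallized_state, active_state, block, runs=5):
    """Time process_block over several runs and report the best wall-clock result."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        process_block(crystallized_state, active_state, block)
        timings.append(time.perf_counter() - start)
    return min(timings)
```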

@paulhauner

paulhauner commented Sep 27, 2018

Here are some timings from Lighthouse (I'll add @djrtwo's first scenario once I complete it):

Computer: Lenovo X1 Carbon 5th Gen with an Intel i5-7300U @ 2.60GHz running Arch Linux with each core idling around 2-8% before tests.

| num_attestations | validators_per_attestation | total_validators | total_deposits | block_process_seconds |
|---|---|---|---|---|
| 2 | 244 | 31250 | 1000000 | N/A |
| 16 | 305 | 312500 | 10000000 | 0.066773104 |
| 16 | 3051 | 3125000 | 100000000 | 0.249065226 |

Note: this is using an in-memory database. @djrtwo were you using an on-disk DB at all?

Note: we're using concurrency for attestation validation.
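
As an illustration only (Lighthouse itself is Rust), parallel attestation validation could look roughly like this, where `validate_attestation(state, attestation)` is a placeholder for the client's own per-record check:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def validate_attestations_concurrently(validate_attestation, state, attestations, workers=None):
    """Validate AttestationRecords in parallel; True only if every record is valid."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each worker checks one record; the result is folded with all().
        return all(pool.map(partial(validate_attestation, state), attestations))
```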

@djrtwo
Contributor Author

djrtwo commented Sep 27, 2018

Curious whether you see a difference closer to 10x when you remove the concurrency, @paulhauner.

@paulhauner

Without concurrency:

| num_attestations | validators_per_attestation | total_validators | total_deposits | block_process_seconds |
|---|---|---|---|---|
| 2 | 244 | 31250 | 1000000 | N/A |
| 16 | 305 | 312500 | 10000000 | 0.125273217 |
| 16 | 3051 | 3125000 | 100000000 | 0.450318683 |

That ~4x difference still holds.

In these benches I'm starting with a SSZ serialized block and then de-serializing it (and all the AttestationRecords) inside this benchmark. Are you doing the same thing @djrtwo? If not, maybe we're seeing a constant SSZ de-serialization overhead in lighthouse that we're not seeing in beacon_chain?
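
One way to make the two numbers comparable is to time the SSZ decode and the block processing separately; a sketch with placeholder `ssz_decode`/`process_block` helpers (not either client's actual API):

```python
import time

def bench_decode_and_process(ssz_decode, process_block, state, block_bytes):
    """Report SSZ decode time and block processing time as separate figures."""
    t0 = time.perf_counter()
    block = ssz_decode(block_bytes)     # deserializes the block and its AttestationRecords
    t1 = time.perf_counter()
    process_block(state, block)
    t2 = time.perf_counter()
    return {"decode_seconds": t1 - t0, "process_seconds": t2 - t1}
```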

@djrtwo
Contributor Author

djrtwo commented Sep 29, 2018

That's it. I'm not clocking the deserialization.

Both are interesting numbers. I was looking specifically for block validity and primarily at the signatures because this was our estimated bottleneck when designing the protocol.

Let's see what it is without the deserialize.

@paulhauner

paulhauner commented Sep 29, 2018

Presently lighthouse is structured to do "just-in-time" deserialization, where each AttestationRecord is deserialized immediately before it is verified. The idea is that if someone sends us a bad block, we deserialize as little as possible before discovering that it's bad.

I mention this for two reasons: (a) because it's a fun fact, and (b) to indicate that it'll take some hacky refactoring to make these "no deserialize" tests, so I can get them done later today or tomorrow morning :)

On a side note: at some point it would be useful to get "bad block" benchmarks from clients, i.e., how quickly can you reject a bad block? I'm well on a tangent now, but it would also be worth considering introducing some form of entropy into the order in which AttestationRecords are verified inside a client, so that there's no "ideal resource-consuming block" an attacker can construct. (E.g., make the last attestation bad and you know they'll check each one before it.) Probably just doing concurrency (based on the number of available cores) and maybe reversing the order (based on a coin flip) would be enough.
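
A rough sketch of that idea (placeholder `decode_record`/`verify_record` helpers, and a simple coin-flip on iteration order rather than a full shuffle):

```python
import random

def verify_attestations_jit(attestation_blobs, decode_record, verify_record):
    """Deserialize each AttestationRecord just before verifying it, and randomly
    reverse the order so an attacker can't rely on a fixed verification order.
    Returns False as soon as a bad record is found."""
    order = list(range(len(attestation_blobs)))
    if random.random() < 0.5:               # coin-flip: forward or reverse
        order.reverse()
    for i in order:
        record = decode_record(attestation_blobs[i])   # just-in-time deserialization
        if not verify_record(record):
            return False                               # reject early on a bad record
    return True
```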

@kaibakker

Looks great! Would performance increase if a vote didn't include a source, as described here: https://ethresear.ch/t/should-we-simplify-casper-votes-to-remove-the-source-param/3549 ?

@hwwhww hwwhww closed this as completed Nov 28, 2018