This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Add a benchmark for snapshot::account::to_fat_rlps() #11185

Merged
merged 6 commits into from
Nov 15, 2019

Conversation

dvdplm
Collaborator

@dvdplm dvdplm commented Oct 21, 2019

`to_fat_rlps()` is a hot call during snapshots. I don't think it has a perf problem per se, but it's better to have a benchmark for it.

The RLP data used in the benchmark is collected from Mainnet/Ropsten and sized to the 99th percentile of account sizes.

Example results:

to_fat_rlps, 5282 bytes, (0x1041…cf2d)
                        time:   [7.0673 us 7.1645 us 7.2607 us]
                        change: [-4.7132% -1.9970% +0.9411%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 5908 bytes, (0x104f…f0ea)
                        time:   [9.4877 us 9.6229 us 9.7568 us]
                        change: [-30.253% -26.201% -21.615%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

to_fat_rlps, 6344 bytes, (0x2032…8275)
                        time:   [9.3738 us 9.5009 us 9.6372 us]
                        change: [-3.3891% -0.7888% +1.9781%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

to_fat_rlps, 6723 bytes, (0x2075…0dd7)
                        time:   [11.733 us 11.864 us 12.005 us]
                        change: [-8.5366% -5.4752% -2.5367%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

to_fat_rlps, 6936 bytes, (0x3042…f803)
                        time:   [935.05 ns 946.57 ns 958.71 ns]
                        change: [-4.2292% -1.3360% +1.5940%] (p = 0.39 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

The last chunk, 6936 bytes, (0x3042…f803), is from mainnet. I'm a little surprised to see such a large difference between Ropsten and mainnet, but maybe there's a reason for that?

@dvdplm dvdplm self-assigned this Oct 22, 2019
@dvdplm dvdplm added A0-pleasereview 🤓 Pull request needs code review. A2-insubstantial 👶 Pull request requires no code review (e.g., a sub-repository hash update). F4-tests 💻 Tests need fixing, improving or augmenting. labels Oct 22, 2019
@dvdplm dvdplm marked this pull request as ready for review October 22, 2019 11:11
Collaborator

@ordian ordian left a comment


Looks good, just one question.

let basic_account: BasicAccount = rlp::decode(&*account_data).expect("rlp from disk is ok");
let account_db = AccountDB::from_hash(hashdb, account_hash);
let progress = Progress::new();
let mut used_code = HashSet::new();
Collaborator


Should `used_code` and `progress` be moved inside `b.iter`? Not sure which case we're trying to measure.

Collaborator Author

@dvdplm dvdplm Oct 23, 2019


How do you mean "which case"? You mean inside `to_fat_rlps()`? It's a big method and I'm not specifically targeting a particular branch. It started more from the observation of "wow, a loop in a loop in there, and I have no idea how many iterations this typically does", and then, since the existing tests are rather artificial, I thought I'd collect some real data and see whether the call in its entirety is actually slow or not.
Turns out it's not: ~10us (and I'm using an input that is much bigger than the average). I ran a few tests to check what typical loop counts are: the outer loop runs once, sometimes twice; the inner loop around 5 times. Then I collected histograms of the number of bytes returned from here, and it's mostly small; the 16 parts of the state seem mostly homogeneous in this sense. If there are "lumps" in the data, I didn't find them.

Not sure if I have answered your question though! I don't think either needs to be in b.iter() but maybe I'm misunderstanding you.

Collaborator


Yes, I mean inside `to_fat_rlps`. The perf is different depending on the content of `used_code`, and you're passing the same reference to `used_code` into `b.iter()`, which runs the code in a loop.
I got the following difference when moving `used_code` inside `b.iter()`:

to_fat_rlps, 5282 bytes, (0x1041…cf2d)                                                                               
                        time:   [6.2453 us 6.2689 us 6.2954 us]
                        change: [+18.545% +20.243% +23.000%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

to_fat_rlps, 5908 bytes, (0x104f…f0ea)                                                                               
                        time:   [7.8668 us 7.8816 us 7.8975 us]
                        change: [+6.5079% +7.2548% +7.9737%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 6344 bytes, (0x2032…8275)
                        time:   [7.8879 us 7.9030 us 7.9183 us]
                        change: [+10.102% +10.399% +10.715%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

to_fat_rlps, 6723 bytes, (0x2075…0dd7)
                        time:   [9.8169 us 9.8328 us 9.8490 us]
                        change: [+8.6896% +9.0660% +9.4109%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

to_fat_rlps, 6936 bytes, (0x3042…f803)                                                                               
                        time:   [1.1720 us 1.1759 us 1.1808 us]
                        change: [+175.43% +177.79% +179.97%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

The biggest difference is for the last chunk.
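The effect can be sketched without the real code. Below is a minimal std-only illustration (not the actual parity-ethereum implementation; `encode_account`, the `u64` code-hash stand-in, and the byte counts are invented): a `to_fat_rlps()`-like call emits the full code only the first time a code hash is seen, so reusing one `used_code` set across `b.iter()` iterations means every iteration after the first takes the cheap path, while a fresh set per iteration pays the full cost each time.

```rust
use std::collections::HashSet;

// Stand-in for the code-emission logic in to_fat_rlps(): emit the full
// code the first time a code hash is seen, and only an 8-byte hash
// reference on later calls.
fn encode_account(code: &[u8], used_code: &mut HashSet<u64>) -> usize {
    let hash = code
        .iter()
        .fold(0u64, |h, &b| h.wrapping_mul(31).wrapping_add(b as u64));
    if used_code.insert(hash) {
        code.len() // first sighting: full code goes into the chunk
    } else {
        8 // already emitted: only the code hash is referenced
    }
}

fn main() {
    let code = vec![0u8; 5282];

    // used_code created outside the loop (as in the original bench):
    // only the first iteration takes the expensive full-code path.
    let mut shared = HashSet::new();
    let sizes: Vec<usize> = (0..5).map(|_| encode_account(&code, &mut shared)).collect();
    assert_eq!(sizes, vec![5282, 8, 8, 8, 8]);

    // Fresh used_code per iteration (set moved inside b.iter()):
    // every iteration pays the full cost, hence the apparent regression.
    let sizes: Vec<usize> = (0..5)
        .map(|_| encode_account(&code, &mut HashSet::new()))
        .collect();
    assert_eq!(sizes, vec![5282; 5]);
}
```

So the two benchmark variants measure genuinely different paths through the function, not a noise artifact.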

Collaborator Author


Interesting, good catch. Ok, let's move it inside then.

That last chunk is from mainnet btw, the others from ropsten.

Collaborator Author


Yup, I'm seeing the same effect:

     Running /Users/dvd/dev/parity/parity-ethereum/target/release/deps/to_fat_rlps-25811198c1c2ab93
to_fat_rlps, 5282 bytes, (0x1041…cf2d)
                        time:   [8.6684 us 8.8360 us 9.0185 us]
                        change: [+19.126% +22.714% +26.009%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 5908 bytes, (0x104f…f0ea)
                        time:   [10.474 us 10.588 us 10.710 us]
                        change: [+8.7095% +13.323% +17.690%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 6344 bytes, (0x2032…8275)
                        time:   [10.401 us 10.536 us 10.687 us]
                        change: [+8.6806% +12.271% +15.996%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

to_fat_rlps, 6723 bytes, (0x2075…0dd7)
                        time:   [12.859 us 13.089 us 13.350 us]
                        change: [+9.0429% +12.423% +16.180%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

to_fat_rlps, 6936 bytes, (0x3042…f803)
                        time:   [2.0104 us 2.0549 us 2.1035 us]
                        change: [+113.79% +122.66% +131.16%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

Collaborator Author


Hmm, looking again I have doubts: in `chunk_state()` the `used_code` is instantiated once per part and re-used for all the millions of calls to `to_fat_rlps()`, so maybe I got it right the first time? :/
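The production pattern described here can be sketched in a few lines (std-only; `emit_code`, `chunk_part`, and the sample code hashes are hypothetical stand-ins, not the real `chunk_state()` code): one `used_code` set lives for a whole part and is shared across every account encoded in it, so the full code is emitted once per distinct code blob and all later accounts just reference a hash.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the per-account code emission in
// to_fat_rlps(): returns true when the full code is emitted, false when
// only the code hash is referenced because used_code already has it.
fn emit_code(code_hash: u64, used_code: &mut HashSet<u64>) -> bool {
    used_code.insert(code_hash)
}

// Sketch of chunk_state(): one used_code per part, shared across every
// account in that part. Returns (full emissions, hash references).
fn chunk_part(code_hashes: &[u64]) -> (usize, usize) {
    let mut used_code = HashSet::new();
    let mut full = 0;
    let mut referenced = 0;
    for &h in code_hashes {
        if emit_code(h, &mut used_code) {
            full += 1;
        } else {
            referenced += 1;
        }
    }
    (full, referenced)
}

fn main() {
    // 6 accounts but only 2 distinct code blobs (e.g. many instances of
    // the same contract): full code is emitted twice, the other 4
    // accounts just reference a hash.
    let hashes = [0xA, 0xB, 0xA, 0xA, 0xB, 0xA];
    assert_eq!(chunk_part(&hashes), (2, 4));
}
```

Under this pattern the warm, already-populated set is the steady state in production, so keeping `used_code` outside `b.iter()` (the original setup) arguably measures the common case, while a fresh set per iteration measures only the first-sighting path; which of the two the benchmark should target is exactly the open question in this thread.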

@ordian ordian added this to the 2.7 milestone Oct 23, 2019
* master: (21 commits)
  ropsten #6631425 foundation #8798209 (#11201)
  Update list of bootnodes for xDai chain (#11236)
  ethcore/res: add mordor testnet configuration (#11200)
  [chain specs]: activate `Istanbul` on mainnet (#11228)
  [builtin]: support `multiple prices and activations` in chain spec (#11039)
  Insert explicit warning into the panic hook (#11225)
  Snapshot restoration overhaul (#11219)
  Fix docker centos build (#11226)
  retry on gitlab system failures (#11222)
  Update bootnodes. (#11203)
  Use provided usd-per-eth value if an endpoint is specified (#11209)
  Use a lock instead of atomics for snapshot Progress (#11197)
  [informant]: `MillisecondDuration` -> `as_millis()` (#11211)
  Step duration map configuration parameter ported from the POA Network fork (#10902)
  Upgrade jsonrpc to latest (#11206)
  [export hardcoded sync]: use debug for `H256` (#11204)
  Pause pruning while snapshotting (#11178)
  Type annotation for next_key() matching of json filter options (#11192)
  Crypto primitives removed from ethkey (#11174)
  Made ecrecover implementation trait public (#11188)
  ...
@dvdplm dvdplm requested a review from ordian November 6, 2019 10:36
@ordian ordian added A7-looksgoodtestsfail 🤖 Pull request is reviewed well, but cannot be merged due to tests failing. and removed A0-pleasereview 🤓 Pull request needs code review. labels Nov 8, 2019
@dvdplm dvdplm merged commit eb565a7 into master Nov 15, 2019
@ordian ordian deleted the dp/chore/add-bench-for-to_fat_rlps branch November 15, 2019 11:34
dvdplm added a commit that referenced this pull request Nov 20, 2019
* master:
  Clarify what first_block `None` means (#11269)
  removed redundant VMType enum with one variant (#11266)
  Ensure jsonrpc threading settings are sane (#11267)
  Return Ok(None) when the registrar contract returns empty slice (#11257)
  Add a benchmark for snapshot::account::to_fat_rlps() (#11185)
  Fix misc compile warnings (#11258)
  simplify verification (#11249)