This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Add a benchmark for snapshot::account::to_fat_rlps() #11185

Merged
merged 6 commits into from
Nov 15, 2019

Conversation

dvdplm
Collaborator

@dvdplm dvdplm commented Oct 21, 2019

`to_fat_rlps()` is a hot call during snapshots. I don't think it has a perf problem per se, but it's better to have a benchmark for it.

The RLP data used in the benchmark is collected from Mainnet/Ropsten and sized to the 99th percentile of account sizes.

Example results:

to_fat_rlps, 5282 bytes, (0x1041…cf2d)
                        time:   [7.0673 us 7.1645 us 7.2607 us]
                        change: [-4.7132% -1.9970% +0.9411%] (p = 0.19 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 5908 bytes, (0x104f…f0ea)
                        time:   [9.4877 us 9.6229 us 9.7568 us]
                        change: [-30.253% -26.201% -21.615%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

to_fat_rlps, 6344 bytes, (0x2032…8275)
                        time:   [9.3738 us 9.5009 us 9.6372 us]
                        change: [-3.3891% -0.7888% +1.9781%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

to_fat_rlps, 6723 bytes, (0x2075…0dd7)
                        time:   [11.733 us 11.864 us 12.005 us]
                        change: [-8.5366% -5.4752% -2.5367%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

to_fat_rlps, 6936 bytes, (0x3042…f803)
                        time:   [935.05 ns 946.57 ns 958.71 ns]
                        change: [-4.2292% -1.3360% +1.5940%] (p = 0.39 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

The last chunk, 6936 bytes, (0x3042…f803), is from mainnet. I'm a little surprised to see such a large difference between Ropsten and mainnet, but maybe there's a reason for that?

@dvdplm dvdplm self-assigned this Oct 22, 2019
@dvdplm dvdplm added A0-pleasereview 🤓 Pull request needs code review. A2-insubstantial 👶 Pull request requires no code review (e.g., a sub-repository hash update). F4-tests 💻 Tests need fixing, improving or augmenting. labels Oct 22, 2019
@dvdplm dvdplm marked this pull request as ready for review October 22, 2019 11:11
Collaborator

@ordian ordian left a comment


Looks good, just one question.

let basic_account: BasicAccount = rlp::decode(&*account_data).expect("rlp from disk is ok");
let account_db = AccountDB::from_hash(hashdb, account_hash);
let progress = Progress::new();
let mut used_code = HashSet::new();
Collaborator


Should `used_code` and `progress` be moved inside `b.iter`? Not sure which case we're trying to measure.

Collaborator Author

@dvdplm dvdplm Oct 23, 2019


How do you mean "which case"? You mean inside `to_fat_rlps()`? It's a big method and I'm not specifically targeting a particular branch. It started more from the observation of "wow, a loop in a loop in there, and I have no idea how many iterations this typically does", and then, since the existing tests are rather artificial, I thought I'd collect some real data and see whether the call in its entirety is actually slow or not.
Turns out it's not: ~10us (and I'm using an input that is much bigger than the average). I ran a few tests to check what typical loop counts are: the outer loop runs once, sometimes twice; the inner loop around 5 times. Then I collected histograms of the number of bytes returned from here, and it's mostly small; the 16 parts of the state seem mostly homogeneous in this sense. If there are "lumps" in the data, I didn't find them.

Not sure if I have answered your question though! I don't think either needs to be in b.iter() but maybe I'm misunderstanding you.

Collaborator


Yes, I mean inside `to_fat_rlps`. The perf is different depending on the content of `used_code`, and you're passing the same reference to `used_code` into `b.iter()`, which runs the code in a loop.
I got the following difference when moving `used_code` inside `b.iter()`:

to_fat_rlps, 5282 bytes, (0x1041…cf2d)                                                                               
                        time:   [6.2453 us 6.2689 us 6.2954 us]
                        change: [+18.545% +20.243% +23.000%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

to_fat_rlps, 5908 bytes, (0x104f…f0ea)                                                                               
                        time:   [7.8668 us 7.8816 us 7.8975 us]
                        change: [+6.5079% +7.2548% +7.9737%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 6344 bytes, (0x2032…8275)
                        time:   [7.8879 us 7.9030 us 7.9183 us]
                        change: [+10.102% +10.399% +10.715%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

to_fat_rlps, 6723 bytes, (0x2075…0dd7)
                        time:   [9.8169 us 9.8328 us 9.8490 us]
                        change: [+8.6896% +9.0660% +9.4109%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

to_fat_rlps, 6936 bytes, (0x3042…f803)                                                                               
                        time:   [1.1720 us 1.1759 us 1.1808 us]
                        change: [+175.43% +177.79% +179.97%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

The biggest difference is for the last chunk.
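The effect can be sketched without the real code. Below is a minimal std-only illustration (not the actual parity-ethereum implementation; `encode_account`, the `u64` code-hash stand-in, and the byte counts are invented): a `to_fat_rlps()`-like call emits the full code only the first time a code hash is seen, so reusing one `used_code` set across `b.iter()` iterations means every iteration after the first takes the cheap path, while a fresh set per iteration pays the full cost each time.

```rust
use std::collections::HashSet;

// Stand-in for the code-emission logic in to_fat_rlps(): emit the full
// code the first time a code hash is seen, and only an 8-byte hash
// reference on later calls.
fn encode_account(code: &[u8], used_code: &mut HashSet<u64>) -> usize {
    let hash = code
        .iter()
        .fold(0u64, |h, &b| h.wrapping_mul(31).wrapping_add(b as u64));
    if used_code.insert(hash) {
        code.len() // first sighting: full code goes into the chunk
    } else {
        8 // already emitted: only the code hash is referenced
    }
}

fn main() {
    let code = vec![0u8; 5282];

    // used_code created outside the loop (as in the original bench):
    // only the first iteration takes the expensive full-code path.
    let mut shared = HashSet::new();
    let sizes: Vec<usize> = (0..5).map(|_| encode_account(&code, &mut shared)).collect();
    assert_eq!(sizes, vec![5282, 8, 8, 8, 8]);

    // Fresh used_code per iteration (set moved inside b.iter()):
    // every iteration pays the full cost, hence the apparent regression.
    let sizes: Vec<usize> = (0..5)
        .map(|_| encode_account(&code, &mut HashSet::new()))
        .collect();
    assert_eq!(sizes, vec![5282; 5]);
}
```

So the two benchmark variants measure genuinely different paths through the function, not a noise artifact.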

Collaborator Author


Interesting, good catch. Ok, let's move it inside then.

That last chunk is from mainnet btw, the others from ropsten.

Collaborator Author


Yup, I'm seeing the same effect:

     Running /Users/dvd/dev/parity/parity-ethereum/target/release/deps/to_fat_rlps-25811198c1c2ab93
to_fat_rlps, 5282 bytes, (0x1041…cf2d)
                        time:   [8.6684 us 8.8360 us 9.0185 us]
                        change: [+19.126% +22.714% +26.009%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 5908 bytes, (0x104f…f0ea)
                        time:   [10.474 us 10.588 us 10.710 us]
                        change: [+8.7095% +13.323% +17.690%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) high mild
  1 (1.00%) high severe

to_fat_rlps, 6344 bytes, (0x2032…8275)
                        time:   [10.401 us 10.536 us 10.687 us]
                        change: [+8.6806% +12.271% +15.996%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

to_fat_rlps, 6723 bytes, (0x2075…0dd7)
                        time:   [12.859 us 13.089 us 13.350 us]
                        change: [+9.0429% +12.423% +16.180%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

to_fat_rlps, 6936 bytes, (0x3042…f803)
                        time:   [2.0104 us 2.0549 us 2.1035 us]
                        change: [+113.79% +122.66% +131.16%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

Collaborator Author


Hmm, looking again I have doubts: in `chunk_state()` the `used_code` is instantiated once per part and re-used for all the millions of calls to `to_fat_rlps()`, so maybe I got it right the first time? :/
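The production pattern described here can be sketched in a few lines (std-only; `emit_code`, `chunk_part`, and the sample code hashes are hypothetical stand-ins, not the real `chunk_state()` code): one `used_code` set lives for a whole part and is shared across every account encoded in it, so the full code is emitted once per distinct code blob and all later accounts just reference a hash.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the per-account code emission in
// to_fat_rlps(): returns true when the full code is emitted, false when
// only the code hash is referenced because used_code already has it.
fn emit_code(code_hash: u64, used_code: &mut HashSet<u64>) -> bool {
    used_code.insert(code_hash)
}

// Sketch of chunk_state(): one used_code per part, shared across every
// account in that part. Returns (full emissions, hash references).
fn chunk_part(code_hashes: &[u64]) -> (usize, usize) {
    let mut used_code = HashSet::new();
    let mut full = 0;
    let mut referenced = 0;
    for &h in code_hashes {
        if emit_code(h, &mut used_code) {
            full += 1;
        } else {
            referenced += 1;
        }
    }
    (full, referenced)
}

fn main() {
    // 6 accounts but only 2 distinct code blobs (e.g. many instances of
    // the same contract): full code is emitted twice, the other 4
    // accounts just reference a hash.
    let hashes = [0xA, 0xB, 0xA, 0xA, 0xB, 0xA];
    assert_eq!(chunk_part(&hashes), (2, 4));
}
```

Under this pattern the warm, already-populated set is the steady state in production, so keeping `used_code` outside `b.iter()` (the original setup) arguably measures the common case, while a fresh set per iteration measures only the first-sighting path; which of the two the benchmark should target is exactly the open question in this thread.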

@ordian ordian added this to the 2.7 milestone Oct 23, 2019
* master: (21 commits)
  ropsten #6631425 foundation #8798209 (#11201)
  Update list of bootnodes for xDai chain (#11236)
  ethcore/res: add mordor testnet configuration (#11200)
  [chain specs]: activate `Istanbul` on mainnet (#11228)
  [builtin]: support `multiple prices and activations` in chain spec (#11039)
  Insert explicit warning into the panic hook (#11225)
  Snapshot restoration overhaul (#11219)
  Fix docker centos build (#11226)
  retry on gitlab system failures (#11222)
  Update bootnodes. (#11203)
  Use provided usd-per-eth value if an endpoint is specified (#11209)
  Use a lock instead of atomics for snapshot Progress (#11197)
  [informant]: `MillisecondDuration` -> `as_millis()` (#11211)
  Step duration map configuration parameter ported from the POA Network fork (#10902)
  Upgrade jsonrpc to latest (#11206)
  [export hardcoded sync]: use debug for `H256` (#11204)
  Pause pruning while snapshotting (#11178)
  Type annotation for next_key() matching of json filter options (#11192)
  Crypto primitives removed from ethkey (#11174)
  Made ecrecover implementation trait public (#11188)
  ...
@dvdplm dvdplm requested a review from ordian November 6, 2019 10:36
@ordian ordian added A7-looksgoodtestsfail 🤖 Pull request is reviewed well, but cannot be merged due to tests failing. and removed A0-pleasereview 🤓 Pull request needs code review. labels Nov 8, 2019
@dvdplm dvdplm merged commit eb565a7 into master Nov 15, 2019
@ordian ordian deleted the dp/chore/add-bench-for-to_fat_rlps branch November 15, 2019 11:34
dvdplm added a commit that referenced this pull request Nov 20, 2019
* master:
  Clarify what first_block `None` means (#11269)
  removed redundant VMType enum with one variant (#11266)
  Ensure jsonrpc threading settings are sane (#11267)
  Return Ok(None) when the registrar contract returns empty slice (#11257)
  Add a benchmark for snapshot::account::to_fat_rlps() (#11185)
  Fix misc compile warnings (#11258)
  simplify verification (#11249)