feat: simple bench target for measuring DB reads using ReadView #2433

netrome · 2024-11-13T15:18:54Z

Linked Issues/PRs

closes chore: Benchmark multi-get implementation #2422

Description

This PR adds a benchmark target for measuring end to end query times for transaction queries. The benchmark is a simple binary that submits a bunch of transactions, and later queries these transactions by owner several times. It is intended to be used together with benchmarking tools such as hyperfine.

netrome · 2024-11-13T15:20:15Z

Interesting observation, the current benchmark is slower with the multi-get implementation than with the plain db lookups.

On my build server I get the following results (having saved the end_to_end_query_times_reference and end_to_end_query_times_multi_get binaries from previous runs with cargo bench --bench end_to_end_query_times)

hyperfine /tmp/end_to_end_query_times_reference /tmp/end_to_end_query_times_multi_get
Benchmark 1: /tmp/end_to_end_query_times_reference
  Time (mean ± σ):      7.052 s ±  0.200 s    [User: 7.726 s, System: 0.937 s]
  Range (min … max):    6.768 s …  7.246 s    10 runs

Benchmark 2: /tmp/end_to_end_query_times_multi_get
  Time (mean ± σ):      7.172 s ±  0.119 s    [User: 7.833 s, System: 0.990 s]
  Range (min … max):    6.901 s …  7.288 s    10 runs

Summary
  '/tmp/end_to_end_query_times_reference' ran
    1.02 ± 0.03 times faster than '/tmp/end_to_end_query_times_multi_get'

netrome · 2024-11-14T21:16:08Z

Bumped the number of blocks (and therefore number of transactions to query) by a factor 10, but I still can't observe a notable difference between the multi-get implementation and the previous one.

hyperfine /tmp/end_to_end_query_times_long_reference /tmp/end_to_end_query_times_long_multi_get
Benchmark 1: /tmp/end_to_end_query_times_long_reference
  Time (mean ± σ):     68.403 s ±  0.345 s    [User: 77.285 s, System: 8.956 s]
  Range (min … max):   67.802 s … 69.064 s    10 runs

Benchmark 2: /tmp/end_to_end_query_times_long_multi_get
  Time (mean ± σ):     68.726 s ±  0.499 s    [User: 77.652 s, System: 8.834 s]
  Range (min … max):   67.866 s … 69.568 s    10 runs

Summary
  '/tmp/end_to_end_query_times_long_reference' ran
    1.00 ± 0.01 times faster than '/tmp/end_to_end_query_times_long_multi_get'

This could be due to a number of factors.

One loose theory is that most (or all) of the data is still within the RocksDB memtable, and that this makes the speed gain by batching with multi-get negligible.
Another theory is that the overhead of boxing the iterators outweighs the performance gain of the multi-get operation for these workloads.
It could also just be the case that database lookup times isn't the hot path of the current workloads. If so, we could consider writing more isolated benchmarks (though we already have the db_lookup_times), but we should also question the value of this optimization if it's not in the hot path for the graphql queries since that's where it's used right now.
Anything else not considered yet.

netrome · 2024-11-15T13:09:11Z

I've rewritten the workload right now to only measure the database interactions through the ReadDatabase type, which isolates the effect of the multi-get implementation more than the full GraphQL interaction. This just confirms what I've already observed in the end-to-end benchmarks; the multi-get implementation is slower (for this workload).

(With the implementation from 4e7d3b357cf3b7edec95ea2859df63cf14459a35)

hyperfine /tmp/batch_lookup_times_coins_reference /tmp/batch_lookup_times_coins_multi_get
Benchmark 1: /tmp/batch_lookup_times_coins_reference
  Time (mean ± σ):      6.418 s ±  0.131 s    [User: 6.258 s, System: 0.059 s]
  Range (min … max):    6.281 s …  6.647 s    10 runs

Benchmark 2: /tmp/batch_lookup_times_coins_multi_get
  Time (mean ± σ):      7.365 s ±  0.082 s    [User: 6.163 s, System: 1.105 s]
  Range (min … max):    7.239 s …  7.539 s    10 runs

netrome · 2024-11-15T20:48:21Z

Running the latest benches on my laptop only amplifies the difference:

Baseline:

hyperfine target/release/deps/end_to_end_query_times-731a676331ca9c3d
Benchmark 1: target/release/deps/end_to_end_query_times-731a676331ca9c3d
  Time (mean ± σ):     13.720 s ±  0.228 s    [User: 13.642 s, System: 0.059 s]
  Range (min … max):   13.427 s … 14.121 s    10 runs

With multi-get

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     24.500 s ±  1.737 s    [User: 23.881 s, System: 0.562 s]
  Range (min … max):   22.666 s … 27.756 s    10 runs

I've also experimented with disabling the rocksdb block cache, which didn't make any big difference for this workload.

For this workload, I can now safely say that the multi-get implementation is strictly worse. I suspect that this is due to the overhead caused by using dynamic dispatch for the iterators, but have yet to confirm this theory.

netrome · 2024-11-15T21:00:10Z

Reading up on multi-get, one performance benefit is reduced lock contention for concurrent reads - so I'll adjust the workload to see if I can measure anything interesting with many concurrent reads. https://github.com/facebook/rocksdb/wiki/MultiGet-Performance

netrome · 2024-11-15T21:32:19Z

Reading up on multi-get, one performance benefit is reduced lock contention for concurrent reads - so I'll adjust the workload to see if I can measure anything interesting with many concurrent reads. https://github.com/facebook/rocksdb/wiki/MultiGet-Performance

Multi-get is still slower. Measured at commit 5c261ee9897282ab4e5bbee6f79a030bac5d217d.

Reference:

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):      1.638 s ±  0.120 s    [User: 29.559 s, System: 0.233 s]
  Range (min … max):    1.389 s …  1.743 s    10 runs

Multi-get:

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):      1.962 s ±  0.039 s    [User: 20.858 s, System: 3.210 s]
  Range (min … max):    1.887 s …  2.019 s    10 runs

xgreenx

What batch size are you using by default? Based on the benchmarks from the @rymnc in #2142 , the multi-get is slower for small batch size. Have you tried batch size 250/500/1000?

xgreenx

Also, maybe you want to try to select random coins, not sequential.

netrome · 2024-11-18T08:19:55Z

What batch size are you using by default? Based on the benchmarks from the @rymnc in #2142 , the multi-get is slower for small batch size. Have you tried batch size 250/500/1000?

I've typically requested batches of 100k coins and transactions for most runs, but I have varied this parameter (num_queries * utxo_count_per_block) and haven't spotted any difference so far.

Also, maybe you want to try to select random coins, not sequential.

Good idea, I'll try that 👍

netrome · 2024-11-18T09:09:56Z

Tried selecting random coins (by shuffling the ID vec before each query), didn't notice any significant speed degradation in the base case.

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):      1.619 s ±  0.119 s    [User: 29.584 s, System: 0.247 s]
  Range (min … max):    1.415 s …  1.749 s    10 runs

diff relative 5c261ee9897282ab4e5bbee6f79a030bac5d217d

diff --git a/benches/benches/read_view_get_coins.rs b/benches/benches/read_view_get_coins.rs
index 0327b88937..75862a0210 100644
--- a/benches/benches/read_view_get_coins.rs
+++ b/benches/benches/read_view_get_coins.rs
@@ -24,6 +24,7 @@ use fuel_core_types::{
 };
 use rand::{
     rngs::StdRng,
+    seq::SliceRandom,
     SeedableRng,
 };

@@ -93,6 +94,9 @@ impl<Rng: rand::RngCore + rand::CryptoRng + Send> Harness<Rng> {
             transaction.commit().unwrap();
         }

+        // Shuffle so random reads aren't sequential
+        utxo_ids.shuffle(&mut self.rng);
+
         Ok(utxo_ids)
     }

@@ -105,7 +109,8 @@ impl<Rng: rand::RngCore + rand::CryptoRng + Send> Harness<Rng> {
         for _ in 0..self.params.num_queries {
             let on_chain_db = self.db.on_chain().clone();
             let off_chain_db = self.db.off_chain().clone();
-            let utxo_ids = utxo_ids.to_vec();
+            let mut utxo_ids = utxo_ids.to_vec();
+            utxo_ids.shuffle(&mut self.rng);

             let handle = tokio::spawn(async move {
                 let read_database =

netrome · 2024-11-18T21:32:04Z

Tried a new variant proposed by Green in a status meeting today. Only query a subset of all coins each time. With a database of 1M coins, querying 10k coins 1k times I get the following (at 87c4b01):

reference

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     11.798 s ±  3.070 s    [User: 39.691 s, System: 8.286 s]
  Range (min … max):   10.156 s … 20.117 s    10 runs

multi-get

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     12.922 s ±  3.347 s    [User: 44.348 s, System: 9.119 s]
  Range (min … max):   10.869 s … 20.437 s    10 runs

So multi-get still seems to be slower, but the difference seems less tangible than most recent runs. I'll see if I can find a parameter combination that changes this outcome.

netrome · 2024-11-18T22:21:25Z

Tried again with new parameters (querying 10k coins 1k times from a database with 10M coins) at c1f2605, multi-get is still slower.

Reference

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     57.364 s ±  5.088 s    [User: 612.581 s, System: 793.212 s]
  Range (min … max):   50.165 s … 62.967 s    10 runs

Multi-get

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     64.673 s ±  6.245 s    [User: 666.976 s, System: 977.544 s]
  Range (min … max):   54.355 s … 71.464 s    10 runs

netrome linked an issue Nov 13, 2024 that may be closed by this pull request

chore: Benchmark multi-get implementation #2422

Open

netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch 2 times, most recently from c930805 to 8a2a490 Compare November 14, 2024 09:28

netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch 2 times, most recently from f15f180 to 4e7d3b3 Compare November 15, 2024 13:06

netrome changed the title ~~feat: simple bench target for measuring end to end query times~~ feat: simple bench target for measuring DB reads using ReadView Nov 15, 2024

netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch from 4e7d3b3 to 7a2be1b Compare November 15, 2024 20:58

feat: simple bench target for measuring coin batch lookup times

5c261ee

netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch from 7a2be1b to 5c261ee Compare November 15, 2024 21:31

netrome mentioned this pull request Nov 15, 2024

perf: Add support for multi get operation for database queries #2396

Open

2 tasks

xgreenx reviewed Nov 16, 2024

View reviewed changes

feat: Shuffle utxo IDs

5717e59

feat: Only query a subset of all utxos each time

87c4b01

feat: Use .choose() to select random utxos + parameter tuning

c1f2605

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: simple bench target for measuring DB reads using ReadView #2433

feat: simple bench target for measuring DB reads using ReadView #2433

netrome commented Nov 13, 2024

netrome commented Nov 13, 2024

netrome commented Nov 14, 2024 •

edited

Loading

netrome commented Nov 15, 2024

netrome commented Nov 15, 2024 •

edited

Loading

netrome commented Nov 15, 2024

netrome commented Nov 15, 2024

xgreenx left a comment

xgreenx left a comment

netrome commented Nov 18, 2024

netrome commented Nov 18, 2024 •

edited

Loading

netrome commented Nov 18, 2024 •

edited

Loading

netrome commented Nov 18, 2024

feat: simple bench target for measuring DB reads using ReadView #2433

Are you sure you want to change the base?

feat: simple bench target for measuring DB reads using ReadView #2433

Conversation

netrome commented Nov 13, 2024

Linked Issues/PRs

Description

netrome commented Nov 13, 2024

netrome commented Nov 14, 2024 • edited Loading

netrome commented Nov 15, 2024

netrome commented Nov 15, 2024 • edited Loading

netrome commented Nov 15, 2024

netrome commented Nov 15, 2024

xgreenx left a comment

Choose a reason for hiding this comment

xgreenx left a comment

Choose a reason for hiding this comment

netrome commented Nov 18, 2024

netrome commented Nov 18, 2024 • edited Loading

netrome commented Nov 18, 2024 • edited Loading

netrome commented Nov 18, 2024

netrome commented Nov 14, 2024 •

edited

Loading

netrome commented Nov 15, 2024 •

edited

Loading

netrome commented Nov 18, 2024 •

edited

Loading

netrome commented Nov 18, 2024 •

edited

Loading