Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: simple bench target for measuring DB reads using ReadView #2433

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

netrome
Copy link
Contributor

@netrome netrome commented Nov 13, 2024

Linked Issues/PRs

Description

This PR adds a benchmark target for measuring end to end query times for transaction queries. The benchmark is a simple binary that submits a bunch of transactions, and later queries these transactions by owner several times. It is intended to be used together with benchmarking tools such as hyperfine.

@netrome netrome linked an issue Nov 13, 2024 that may be closed by this pull request
@netrome
Copy link
Contributor Author

netrome commented Nov 13, 2024

Interesting observation, the current benchmark is slower with the multi-get implementation than with the plain db lookups.

On my build server I get the following results (having saved the end_to_end_query_times_reference and end_to_end_query_times_multi_get binaries from previous runs with cargo bench --bench end_to_end_query_times)

hyperfine /tmp/end_to_end_query_times_reference /tmp/end_to_end_query_times_multi_get
Benchmark 1: /tmp/end_to_end_query_times_reference
  Time (mean ± σ):      7.052 s ±  0.200 s    [User: 7.726 s, System: 0.937 s]
  Range (min … max):    6.768 s …  7.246 s    10 runs

Benchmark 2: /tmp/end_to_end_query_times_multi_get
  Time (mean ± σ):      7.172 s ±  0.119 s    [User: 7.833 s, System: 0.990 s]
  Range (min … max):    6.901 s …  7.288 s    10 runs

Summary
  '/tmp/end_to_end_query_times_reference' ran
    1.02 ± 0.03 times faster than '/tmp/end_to_end_query_times_multi_get'

@netrome netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch 2 times, most recently from c930805 to 8a2a490 Compare November 14, 2024 09:28
@netrome
Copy link
Contributor Author

netrome commented Nov 14, 2024

Bumped the number of blocks (and therefore number of transactions to query) by a factor 10, but I still can't observe a notable difference between the multi-get implementation and the previous one.

hyperfine /tmp/end_to_end_query_times_long_reference /tmp/end_to_end_query_times_long_multi_get
Benchmark 1: /tmp/end_to_end_query_times_long_reference
  Time (mean ± σ):     68.403 s ±  0.345 s    [User: 77.285 s, System: 8.956 s]
  Range (min … max):   67.802 s … 69.064 s    10 runs

Benchmark 2: /tmp/end_to_end_query_times_long_multi_get
  Time (mean ± σ):     68.726 s ±  0.499 s    [User: 77.652 s, System: 8.834 s]
  Range (min … max):   67.866 s … 69.568 s    10 runs

Summary
  '/tmp/end_to_end_query_times_long_reference' ran
    1.00 ± 0.01 times faster than '/tmp/end_to_end_query_times_long_multi_get'

This could be due to a number of factors.

  1. One loose theory is that most (or all) of the data is still within the RocksDB memtable, and that this makes the speed gain by batching with multi-get negligible.
  2. Another theory is that the overhead of boxing the iterators outweighs the performance gain of the multi-get operation for these workloads.
  3. It could also just be the case that database lookup times isn't the hot path of the current workloads. If so, we could consider writing more isolated benchmarks (though we already have the db_lookup_times), but we should also question the value of this optimization if it's not in the hot path for the graphql queries since that's where it's used right now.
  4. Anything else not considered yet.

@netrome netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch 2 times, most recently from f15f180 to 4e7d3b3 Compare November 15, 2024 13:06
@netrome
Copy link
Contributor Author

netrome commented Nov 15, 2024

I've rewritten the workload right now to only measure the database interactions through the ReadDatabase type, which isolates the effect of the multi-get implementation more than the full GraphQL interaction. This just confirms what I've already observed in the end-to-end benchmarks; the multi-get implementation is slower (for this workload).

(With the implementation from 4e7d3b357cf3b7edec95ea2859df63cf14459a35)

hyperfine /tmp/batch_lookup_times_coins_reference /tmp/batch_lookup_times_coins_multi_get
Benchmark 1: /tmp/batch_lookup_times_coins_reference
  Time (mean ± σ):      6.418 s ±  0.131 s    [User: 6.258 s, System: 0.059 s]
  Range (min … max):    6.281 s …  6.647 s    10 runs

Benchmark 2: /tmp/batch_lookup_times_coins_multi_get
  Time (mean ± σ):      7.365 s ±  0.082 s    [User: 6.163 s, System: 1.105 s]
  Range (min … max):    7.239 s …  7.539 s    10 runs

@netrome netrome changed the title feat: simple bench target for measuring end to end query times feat: simple bench target for measuring DB reads using ReadView Nov 15, 2024
@netrome
Copy link
Contributor Author

netrome commented Nov 15, 2024

Running the latest benches on my laptop only amplifies the difference:

Baseline:

hyperfine target/release/deps/end_to_end_query_times-731a676331ca9c3d
Benchmark 1: target/release/deps/end_to_end_query_times-731a676331ca9c3d
  Time (mean ± σ):     13.720 s ±  0.228 s    [User: 13.642 s, System: 0.059 s]
  Range (min … max):   13.427 s … 14.121 s    10 runs

With multi-get

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     24.500 s ±  1.737 s    [User: 23.881 s, System: 0.562 s]
  Range (min … max):   22.666 s … 27.756 s    10 runs

I've also experimented with disabling the rocksdb block cache, which didn't make any big difference for this workload.

For this workload, I can now safely say that the multi-get implementation is strictly worse. I suspect that this is due to the overhead caused by using dynamic dispatch for the iterators, but have yet to confirm this theory.

@netrome netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch from 4e7d3b3 to 7a2be1b Compare November 15, 2024 20:58
@netrome
Copy link
Contributor Author

netrome commented Nov 15, 2024

Reading up on multi-get, one performance benefit is reduced lock contention for concurrent reads - so I'll adjust the workload to see if I can measure anything interesting with many concurrent reads. https://github.com/facebook/rocksdb/wiki/MultiGet-Performance

@netrome netrome force-pushed the 2422-chore-benchmark-multi-get-implementation branch from 7a2be1b to 5c261ee Compare November 15, 2024 21:31
@netrome
Copy link
Contributor Author

netrome commented Nov 15, 2024

Reading up on multi-get, one performance benefit is reduced lock contention for concurrent reads - so I'll adjust the workload to see if I can measure anything interesting with many concurrent reads. https://github.com/facebook/rocksdb/wiki/MultiGet-Performance

Multi-get is still slower. Measured at commit 5c261ee9897282ab4e5bbee6f79a030bac5d217d.

Reference:

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):      1.638 s ±  0.120 s    [User: 29.559 s, System: 0.233 s]
  Range (min … max):    1.389 s …  1.743 s    10 runs

Multi-get:

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):      1.962 s ±  0.039 s    [User: 20.858 s, System: 3.210 s]
  Range (min … max):    1.887 s …  2.019 s    10 runs

Copy link
Collaborator

@xgreenx xgreenx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What batch size are you using by default? Based on the benchmarks from the @rymnc in #2142 , the multi-get is slower for small batch size. Have you tried batch size 250/500/1000?

Copy link
Collaborator

@xgreenx xgreenx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe you want to try to select random coins, not sequential.

@netrome
Copy link
Contributor Author

netrome commented Nov 18, 2024

What batch size are you using by default? Based on the benchmarks from the @rymnc in #2142 , the multi-get is slower for small batch size. Have you tried batch size 250/500/1000?

I've typically requested batches of 100k coins and transactions for most runs, but I have varied this parameter (num_queries * utxo_count_per_block) and haven't spotted any difference so far.

Also, maybe you want to try to select random coins, not sequential.

Good idea, I'll try that 👍

@netrome
Copy link
Contributor Author

netrome commented Nov 18, 2024

Tried selecting random coins (by shuffling the ID vec before each query), didn't notice any significant speed degradation in the base case.

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):      1.619 s ±  0.119 s    [User: 29.584 s, System: 0.247 s]
  Range (min … max):    1.415 s …  1.749 s    10 runs

diff relative 5c261ee9897282ab4e5bbee6f79a030bac5d217d

diff --git a/benches/benches/read_view_get_coins.rs b/benches/benches/read_view_get_coins.rs
index 0327b88937..75862a0210 100644
--- a/benches/benches/read_view_get_coins.rs
+++ b/benches/benches/read_view_get_coins.rs
@@ -24,6 +24,7 @@ use fuel_core_types::{
 };
 use rand::{
     rngs::StdRng,
+    seq::SliceRandom,
     SeedableRng,
 };

@@ -93,6 +94,9 @@ impl<Rng: rand::RngCore + rand::CryptoRng + Send> Harness<Rng> {
             transaction.commit().unwrap();
         }

+        // Shuffle so random reads aren't sequential
+        utxo_ids.shuffle(&mut self.rng);
+
         Ok(utxo_ids)
     }

@@ -105,7 +109,8 @@ impl<Rng: rand::RngCore + rand::CryptoRng + Send> Harness<Rng> {
         for _ in 0..self.params.num_queries {
             let on_chain_db = self.db.on_chain().clone();
             let off_chain_db = self.db.off_chain().clone();
-            let utxo_ids = utxo_ids.to_vec();
+            let mut utxo_ids = utxo_ids.to_vec();
+            utxo_ids.shuffle(&mut self.rng);

             let handle = tokio::spawn(async move {
                 let read_database =

@netrome
Copy link
Contributor Author

netrome commented Nov 18, 2024

Tried a new variant proposed by Green in a status meeting today. Only query a subset of all coins each time. With a database of 1M coins, querying 10k coins 1k times I get the following (at 87c4b01):

reference

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     11.798 s ±  3.070 s    [User: 39.691 s, System: 8.286 s]
  Range (min … max):   10.156 s … 20.117 s    10 runs

multi-get

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     12.922 s ±  3.347 s    [User: 44.348 s, System: 9.119 s]
  Range (min … max):   10.869 s … 20.437 s    10 runs

So multi-get still seems to be slower, but the difference seems less tangible than most recent runs. I'll see if I can find a parameter combination that changes this outcome.

@netrome
Copy link
Contributor Author

netrome commented Nov 18, 2024

Tried again with new parameters (querying 10k coins 1k times from a database with 10M coins) at c1f2605, multi-get is still slower.

Reference

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     57.364 s ±  5.088 s    [User: 612.581 s, System: 793.212 s]
  Range (min … max):   50.165 s … 62.967 s    10 runs

Multi-get

hyperfine target/release/deps/read_view_get_coins-ff72e5522e7d0f70
Benchmark 1: target/release/deps/read_view_get_coins-ff72e5522e7d0f70
  Time (mean ± σ):     64.673 s ±  6.245 s    [User: 666.976 s, System: 977.544 s]
  Range (min … max):   54.355 s … 71.464 s    10 runs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chore: Benchmark multi-get implementation
2 participants