core: Make NDV non-zero (#1985)
kim authored Nov 14, 2024
1 parent cccadd1 commit 33c4aab
Showing 1 changed file with 13 additions and 4 deletions.
crates/core/src/db/datastore/locking_tx_datastore/tx.rs (17 changes: 13 additions, 4 deletions)
@@ -9,6 +9,7 @@ use crate::execution_context::ExecutionContext;
 use spacetimedb_primitives::{ColList, TableId};
 use spacetimedb_sats::AlgebraicValue;
 use spacetimedb_schema::schema::TableSchema;
+use std::num::NonZeroU64;
 use std::sync::Arc;
 use std::{
     ops::RangeBounds,
@@ -65,9 +66,17 @@ impl TxId {

     /// The Number of Distinct Values (NDV) for a column or list of columns,
     /// if there's an index available on `cols`.
-    pub(crate) fn num_distinct_values(&self, table_id: TableId, cols: &ColList) -> Option<u64> {
-        self.committed_state_shared_lock
-            .get_table(table_id)
-            .and_then(|t| t.indexes.get(cols).map(|index| index.num_keys() as u64))
+    ///
+    /// Returns `None` if:
+    /// - No such table as `table_id` exists.
+    /// - The table `table_id` does not have an index on exactly the `cols`.
+    /// - The table `table_id` contains zero rows (i.e. the index is empty).
+    //
+    // This method must never return 0, as it's used as the divisor in quotients.
+    // Do not change its return type to a bare `u64`.
+    pub(crate) fn num_distinct_values(&self, table_id: TableId, cols: &ColList) -> Option<NonZeroU64> {
+        let table = self.committed_state_shared_lock.get_table(table_id)?;
+        let index = table.indexes.get(cols)?;
+        NonZeroU64::new(index.num_keys() as u64)
     }
 }
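A minimal, self-contained sketch of why the commit swaps `Option<u64>` for `Option<NonZeroU64>`. The `Index`, `Table`, and `Committed` types and the `estimated_eq_rows` helper are hypothetical stand-ins, not SpacetimeDB's real datastore or query-planner code; only the two `num_distinct_values` bodies mirror the old and new versions in the diff. It shows that the old shape can yield `Some(0)` for an empty index, while the new shape folds that case into `None`, and that the standard library's `u64 / NonZeroU64` division needs no zero check.

```rust
use std::collections::{BTreeSet, HashMap};
use std::num::NonZeroU64;

// Hypothetical stand-ins for SpacetimeDB's TableId, ColList, table and index
// types; only the shape needed for this sketch is modeled.
type TableId = u32;
type ColList = Vec<u16>;

struct Index {
    keys: BTreeSet<u64>, // distinct key values stored in the index
}

impl Index {
    fn num_keys(&self) -> usize {
        self.keys.len()
    }
}

struct Table {
    indexes: HashMap<ColList, Index>,
}

struct Committed {
    tables: HashMap<TableId, Table>,
}

impl Committed {
    fn get_table(&self, table_id: TableId) -> Option<&Table> {
        self.tables.get(&table_id)
    }

    /// Old shape (mirrors the removed lines): an empty index gives `Some(0)`.
    fn num_distinct_values_old(&self, table_id: TableId, cols: &ColList) -> Option<u64> {
        self.get_table(table_id)
            .and_then(|t| t.indexes.get(cols).map(|index| index.num_keys() as u64))
    }

    /// New shape (mirrors the added lines): an empty index gives `None`,
    /// so any returned value is guaranteed to be non-zero.
    fn num_distinct_values(&self, table_id: TableId, cols: &ColList) -> Option<NonZeroU64> {
        let table = self.get_table(table_id)?;
        let index = table.indexes.get(cols)?;
        NonZeroU64::new(index.num_keys() as u64)
    }
}

/// Hypothetical cost-model helper: expected rows matched by an equality
/// predicate, assuming values are spread uniformly over the NDV.
/// `u64 / NonZeroU64` cannot divide by zero, so no check is needed.
fn estimated_eq_rows(row_count: u64, ndv: Option<NonZeroU64>) -> u64 {
    match ndv {
        Some(ndv) => row_count / ndv,
        // Hypothetical fallback when no usable NDV statistic exists.
        None => row_count,
    }
}

fn main() {
    let mut indexes = HashMap::new();
    indexes.insert(vec![0], Index { keys: (0..10).collect() });
    indexes.insert(vec![1], Index { keys: BTreeSet::new() }); // empty index
    let mut tables = HashMap::new();
    tables.insert(1, Table { indexes });
    let db = Committed { tables };

    // The old version would hand a zero divisor to the caller.
    assert_eq!(db.num_distinct_values_old(1, &vec![1]), Some(0));
    // The new version reports the empty index as "no statistic".
    assert_eq!(db.num_distinct_values(1, &vec![1]), None);
    assert_eq!(db.num_distinct_values(1, &vec![0]), NonZeroU64::new(10));
    assert_eq!(estimated_eq_rows(100, db.num_distinct_values(1, &vec![0])), 10);
    println!("ok");
}
```

The `None` arm's fallback (treat every row as a match) is an assumption made for the sketch; a real cost model may choose a different default.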

2 comments on commit 33c4aab


github-actions bot commented on 33c4aab Nov 14, 2024


Criterion benchmark results

Criterion benchmark report

YOU SHOULD PROBABLY IGNORE THESE RESULTS.

Criterion is a wall-time-based benchmarking system that is extremely noisy when run on CI. We collect these results for longitudinal analysis, but they are not reliable for comparing individual PRs.

Go look at the callgrind report instead.

empty

| db | on disk | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|---|
| sqlite | 💿 | 414.9±1.39ns | 424.5±2.44ns | - | - |
| sqlite | 🧠 | 401.9±1.31ns | 417.2±2.26ns | - | - |
| stdb_raw | 💿 | 779.0±1.22ns | 778.2±1.12ns | - | - |
| stdb_raw | 🧠 | 776.1±1.65ns | 773.5±2.56ns | - | - |

insert_1

| db | on disk | schema | indices | preload | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|---|---|---|---|

insert_bulk

| db | on disk | schema | indices | preload | count | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|---|---|---|---|---|
| sqlite | 💿 | u32_u64_str | btree_each_column | 2048 | 256 | 602.7±59.03µs | 585.0±0.63µs | 1659 tx/sec | 1709 tx/sec |
| sqlite | 💿 | u32_u64_str | unique_0 | 2048 | 256 | 149.4±0.29µs | 151.3±0.50µs | 6.5 Ktx/sec | 6.5 Ktx/sec |
| sqlite | 💿 | u32_u64_u64 | btree_each_column | 2048 | 256 | 464.1±0.64µs | 467.2±0.55µs | 2.1 Ktx/sec | 2.1 Ktx/sec |
| sqlite | 💿 | u32_u64_u64 | unique_0 | 2048 | 256 | 137.5±0.43µs | 150.8±32.69µs | 7.1 Ktx/sec | 6.5 Ktx/sec |
| sqlite | 🧠 | u32_u64_str | btree_each_column | 2048 | 256 | 447.4±1.44µs | 448.7±0.84µs | 2.2 Ktx/sec | 2.2 Ktx/sec |
| sqlite | 🧠 | u32_u64_str | unique_0 | 2048 | 256 | 121.1±0.71µs | 124.8±0.71µs | 8.1 Ktx/sec | 7.8 Ktx/sec |
| sqlite | 🧠 | u32_u64_u64 | btree_each_column | 2048 | 256 | 365.1±1.71µs | 367.1±1.01µs | 2.7 Ktx/sec | 2.7 Ktx/sec |
| sqlite | 🧠 | u32_u64_u64 | unique_0 | 2048 | 256 | 101.0±0.37µs | 108.5±1.21µs | 9.7 Ktx/sec | 9.0 Ktx/sec |
| stdb_raw | 💿 | u32_u64_str | btree_each_column | 2048 | 256 | 410.2±64.92µs | 601.1±23.14µs | 2.4 Ktx/sec | 1663 tx/sec |
| stdb_raw | 💿 | u32_u64_str | unique_0 | 2048 | 256 | 489.9±11.24µs | 478.3±18.87µs | 2041 tx/sec | 2.0 Ktx/sec |
| stdb_raw | 💿 | u32_u64_u64 | btree_each_column | 2048 | 256 | 393.5±7.49µs | 365.8±9.22µs | 2.5 Ktx/sec | 2.7 Ktx/sec |
| stdb_raw | 💿 | u32_u64_u64 | unique_0 | 2048 | 256 | 347.9±21.57µs | 346.3±5.92µs | 2.8 Ktx/sec | 2.8 Ktx/sec |
| stdb_raw | 🧠 | u32_u64_str | btree_each_column | 2048 | 256 | 316.0±0.23µs | 293.5±0.31µs | 3.1 Ktx/sec | 3.3 Ktx/sec |
| stdb_raw | 🧠 | u32_u64_str | unique_0 | 2048 | 256 | 244.2±0.27µs | 228.1±0.94µs | 4.0 Ktx/sec | 4.3 Ktx/sec |
| stdb_raw | 🧠 | u32_u64_u64 | btree_each_column | 2048 | 256 | 252.0±0.10µs | 232.0±0.17µs | 3.9 Ktx/sec | 4.2 Ktx/sec |
| stdb_raw | 🧠 | u32_u64_u64 | unique_0 | 2048 | 256 | 226.0±0.31µs | 210.0±0.29µs | 4.3 Ktx/sec | 4.7 Ktx/sec |

iterate

| db | on disk | schema | indices | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|---|---|---|
| sqlite | 💿 | u32_u64_str | unique_0 | 23.0±0.17µs | 23.6±0.35µs | 42.5 Ktx/sec | 41.3 Ktx/sec |
| sqlite | 💿 | u32_u64_u64 | unique_0 | 21.7±0.16µs | 21.6±0.07µs | 45.0 Ktx/sec | 45.2 Ktx/sec |
| sqlite | 🧠 | u32_u64_str | unique_0 | 20.3±0.18µs | 21.3±0.06µs | 48.1 Ktx/sec | 45.9 Ktx/sec |
| sqlite | 🧠 | u32_u64_u64 | unique_0 | 19.3±0.09µs | 19.0±0.07µs | 50.7 Ktx/sec | 51.5 Ktx/sec |
| stdb_raw | 💿 | u32_u64_str | unique_0 | 4.9±0.00µs | 4.9±0.00µs | 199.6 Ktx/sec | 199.5 Ktx/sec |
| stdb_raw | 💿 | u32_u64_u64 | unique_0 | 4.8±0.00µs | 4.8±0.00µs | 204.1 Ktx/sec | 203.8 Ktx/sec |
| stdb_raw | 🧠 | u32_u64_str | unique_0 | 4.9±0.00µs | 4.9±0.00µs | 200.4 Ktx/sec | 199.7 Ktx/sec |
| stdb_raw | 🧠 | u32_u64_u64 | unique_0 | 4.8±0.00µs | 4.8±0.00µs | 204.3 Ktx/sec | 203.9 Ktx/sec |

find_unique

| db | on disk | key type | preload | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|---|---|---|

filter

| db | on disk | key type | index strategy | load | count | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|---|---|---|---|---|
| sqlite | 💿 | string | index | 2048 | 256 | 70.7±0.17µs | 69.0±0.20µs | 13.8 Ktx/sec | 14.1 Ktx/sec |
| sqlite | 💿 | u64 | index | 2048 | 256 | 66.9±0.11µs | 64.6±0.13µs | 14.6 Ktx/sec | 15.1 Ktx/sec |
| sqlite | 🧠 | string | index | 2048 | 256 | 67.2±0.14µs | 65.2±0.15µs | 14.5 Ktx/sec | 15.0 Ktx/sec |
| sqlite | 🧠 | u64 | index | 2048 | 256 | 61.2±0.18µs | 57.7±0.11µs | 16.0 Ktx/sec | 16.9 Ktx/sec |
| stdb_raw | 💿 | string | index | 2048 | 256 | 4.9±0.01µs | 4.9±0.00µs | 197.7 Ktx/sec | 198.5 Ktx/sec |
| stdb_raw | 💿 | u64 | index | 2048 | 256 | 4.9±0.01µs | 4.8±0.00µs | 201.0 Ktx/sec | 201.4 Ktx/sec |
| stdb_raw | 🧠 | string | index | 2048 | 256 | 4.9±0.01µs | 4.9±0.00µs | 197.4 Ktx/sec | 198.5 Ktx/sec |
| stdb_raw | 🧠 | u64 | index | 2048 | 256 | 4.9±0.00µs | 4.8±0.00µs | 201.2 Ktx/sec | 201.6 Ktx/sec |

serialize

| schema | format | count | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|---|---|
| u32_u64_str | bflatn_to_bsatn_fast_path | 100 | 3.6±0.01µs | 3.6±0.01µs | 26.7 Mtx/sec | 26.6 Mtx/sec |
| u32_u64_str | bflatn_to_bsatn_slow_path | 100 | 2.9±0.01µs | 2.9±0.00µs | 32.4 Mtx/sec | 33.0 Mtx/sec |
| u32_u64_str | bsatn | 100 | 15.9±0.01ns | 15.3±0.05ns | 5.9 Gtx/sec | 6.1 Gtx/sec |
| u32_u64_str | bsatn | 100 | 2.2±0.00µs | 2.3±0.01µs | 44.3 Mtx/sec | 41.0 Mtx/sec |
| u32_u64_str | json | 100 | 5.3±0.02µs | 5.2±0.03µs | 18.1 Mtx/sec | 18.4 Mtx/sec |
| u32_u64_str | json | 100 | 8.8±0.05µs | 8.9±0.03µs | 10.8 Mtx/sec | 10.7 Mtx/sec |
| u32_u64_str | product_value | 100 | 1017.1±0.56ns | 1018.6±3.16ns | 93.8 Mtx/sec | 93.6 Mtx/sec |
| u32_u64_u64 | bflatn_to_bsatn_fast_path | 100 | 983.0±0.74ns | 949.9±2.18ns | 97.0 Mtx/sec | 100.4 Mtx/sec |
| u32_u64_u64 | bflatn_to_bsatn_slow_path | 100 | 2.4±0.00µs | 2.4±0.00µs | 39.1 Mtx/sec | 39.7 Mtx/sec |
| u32_u64_u64 | bsatn | 100 | 15.1±0.33ns | 6.8±0.02ns | 6.2 Gtx/sec | 13.8 Gtx/sec |
| u32_u64_u64 | bsatn | 100 | 1523.3±3.82ns | 1816.0±27.18ns | 62.6 Mtx/sec | 52.5 Mtx/sec |
| u32_u64_u64 | json | 100 | 3.8±0.01µs | 3.7±0.00µs | 25.4 Mtx/sec | 25.8 Mtx/sec |
| u32_u64_u64 | json | 100 | 5.6±0.07µs | 5.9±0.03µs | 16.9 Mtx/sec | 16.1 Mtx/sec |
| u32_u64_u64 | product_value | 100 | 1014.6±0.62ns | 1016.7±0.59ns | 94.0 Mtx/sec | 93.8 Mtx/sec |
| u64_u64_u32 | bflatn_to_bsatn_fast_path | 100 | 749.6±1.15ns | 715.5±9.36ns | 127.2 Mtx/sec | 133.3 Mtx/sec |
| u64_u64_u32 | bflatn_to_bsatn_slow_path | 100 | 2.4±0.00µs | 2.4±0.01µs | 39.1 Mtx/sec | 39.6 Mtx/sec |
| u64_u64_u32 | bsatn | 100 | 1536.8±66.79ns | 1786.5±50.92ns | 62.1 Mtx/sec | 53.4 Mtx/sec |
| u64_u64_u32 | bsatn | 100 | 645.1±2.37ns | 644.9±2.08ns | 147.8 Mtx/sec | 147.9 Mtx/sec |
| u64_u64_u32 | json | 100 | 3.7±0.01µs | 3.7±0.04µs | 25.9 Mtx/sec | 25.9 Mtx/sec |
| u64_u64_u32 | json | 100 | 6.0±0.13µs | 6.2±0.23µs | 16.0 Mtx/sec | 15.4 Mtx/sec |
| u64_u64_u32 | product_value | 100 | 1014.2±0.47ns | 1015.8±0.64ns | 94.0 Mtx/sec | 93.9 Mtx/sec |

stdb_module_large_arguments

| arg size | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|
| 64KiB | 102.1±8.33µs | 105.6±7.10µs | - | - |

stdb_module_print_bulk

| line count | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|
| 1 | 54.0±5.85µs | 54.2±7.85µs | - | - |
| 100 | 597.0±7.96µs | 595.5±4.65µs | - | - |
| 1000 | 3.6±0.57ms | 3.9±0.81ms | - | - |

remaining

| name | new latency | old latency | new throughput | old throughput |
|---|---|---|---|---|
| special/db_game/circles/load=10 | 42.1±7.76ms | 47.0±3.98ms | - | - |
| special/db_game/circles/load=100 | 35.5±0.94ms | 50.6±7.48ms | - | - |
| special/db_game/ia_loop/load=500 | 147.4±1.38ms | 143.1±2.01ms | - | - |
| special/db_game/ia_loop/load=5000 | 5.3±0.03s | 5.3±0.03s | - | - |
| sqlite/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 | 55.5±0.13µs | 57.9±0.56µs | 17.6 Ktx/sec | 16.9 Ktx/sec |
| sqlite/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 | 49.7±0.03µs | 48.8±0.14µs | 19.7 Ktx/sec | 20.0 Ktx/sec |
| sqlite/🧠/update_bulk/u32_u64_str/unique_0/load=2048/count=256 | 40.7±0.26µs | 42.6±0.56µs | 24.0 Ktx/sec | 22.9 Ktx/sec |
| sqlite/🧠/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 | 38.4±0.09µs | 37.4±0.20µs | 25.5 Ktx/sec | 26.1 Ktx/sec |
| stdb_module/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 | 1220.1±7.65µs | 1225.8±18.57µs | 819 tx/sec | 815 tx/sec |
| stdb_module/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 | 1004.3±8.69µs | 958.8±7.33µs | 995 tx/sec | 1043 tx/sec |
| stdb_raw/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 | 650.6±16.27µs | 632.5±10.25µs | 1537 tx/sec | 1580 tx/sec |
| stdb_raw/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 | 495.5±5.54µs | 419.9±7.34µs | 2018 tx/sec | 2.3 Ktx/sec |
| stdb_raw/🧠/update_bulk/u32_u64_str/unique_0/load=2048/count=256 | 377.8±0.45µs | 365.6±0.64µs | 2.6 Ktx/sec | 2.7 Ktx/sec |
| stdb_raw/🧠/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 | 341.6±0.16µs | 331.2±0.47µs | 2.9 Ktx/sec | 2.9 Ktx/sec |


github-actions bot commented on 33c4aab Nov 14, 2024


Callgrind benchmark results

Callgrind Benchmark Report

These benchmarks were run using callgrind, an instruction-level profiler. They allow comparisons between SQLite (sqlite), SpacetimeDB running through a module (stdb_module), and the underlying SpacetimeDB data storage engine (stdb_raw). Callgrind emulates a CPU to collect the estimates below.

Measurement changes larger than five percent are in bold.
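The Δrw and Δcycles columns appear to be the relative change from the old to the new measurement; for example, the on-disk insert bulk row for stdb_raw with unique_0 goes from 836612 to 869161 estimated cycles and is reported as 3.89%. The sketch below assumes that formula, (new - old) / old * 100, together with the five-percent bolding rule; the function names `pct_change` and `is_bold` are hypothetical and not taken from the benchmark tooling.

```rust
/// Relative change in percent from `old` to `new` (assumed formula for the
/// Δrw and Δcycles columns). `old` is assumed to be non-zero.
fn pct_change(old: u64, new: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// True when a change exceeds the report's five-percent threshold for bolding.
fn is_bold(old: u64, new: u64) -> bool {
    pct_change(old, new).abs() > 5.0
}

fn main() {
    // On-disk insert bulk, stdb_raw, unique_0: estimated cycles (old, new).
    let cycles = pct_change(836_612, 869_161);
    // In-memory update bulk, stdb_raw, count 1024: total reads + writes (old, new).
    let rw = pct_change(19_981_315, 20_413_765);
    println!("Δcycles = {:.2}%, Δrw = {:.2}%", cycles, rw);
    // Neither change crosses the threshold, so neither row would be bold.
    assert!(!is_bold(836_612, 869_161));
    assert!(!is_bold(19_981_315, 20_413_765));
}
```

Applied to the reported numbers, this formula reproduces the printed percentages, and no change in either report crosses five percent.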

In-memory benchmarks

callgrind: empty transaction

| db | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|
| stdb_raw | 6397 | 6397 | 0.00% | 6497 | 6497 | 0.00% |
| sqlite | 5589 | 5589 | 0.00% | 6007 | 6007 | 0.00% |

callgrind: filter

| db | schema | indices | count | preload | _column | data_type | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | no_index | 64 | 128 | 1 | u64 | 76592 | 76592 | 0.00% | 77088 | 77116 | -0.04% |
| stdb_raw | u32_u64_str | no_index | 64 | 128 | 2 | string | 118834 | 118834 | 0.00% | 119624 | 119632 | -0.01% |
| stdb_raw | u32_u64_str | btree_each_column | 64 | 128 | 2 | string | 25083 | 25081 | 0.01% | 25693 | 25831 | -0.53% |
| stdb_raw | u32_u64_str | btree_each_column | 64 | 128 | 1 | u64 | 24049 | 24049 | 0.00% | 24529 | 24613 | -0.34% |
| sqlite | u32_u64_str | no_index | 64 | 128 | 2 | string | 144695 | 144695 | 0.00% | 146227 | 146219 | 0.01% |
| sqlite | u32_u64_str | no_index | 64 | 128 | 1 | u64 | 124089 | 124044 | 0.04% | 125383 | 125334 | 0.04% |
| sqlite | u32_u64_str | btree_each_column | 64 | 128 | 1 | u64 | 131361 | 131361 | 0.00% | 132787 | 132919 | -0.10% |
| sqlite | u32_u64_str | btree_each_column | 64 | 128 | 2 | string | 134494 | 134494 | 0.00% | 136210 | 136190 | 0.01% |

callgrind: insert bulk

| db | schema | indices | count | preload | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | unique_0 | 64 | 128 | 872911 | 872325 | 0.07% | 890811 | 889545 | 0.14% |
| stdb_raw | u32_u64_str | btree_each_column | 64 | 128 | 1023705 | 1020180 | 0.35% | 1053915 | 1052658 | 0.12% |
| sqlite | u32_u64_str | unique_0 | 64 | 128 | 398320 | 398320 | 0.00% | 413928 | 415254 | -0.32% |
| sqlite | u32_u64_str | btree_each_column | 64 | 128 | 983637 | 983637 | 0.00% | 1020707 | 1019611 | 0.11% |

callgrind: iterate

| db | schema | indices | count | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | unique_0 | 1024 | 153721 | 153721 | 0.00% | 153833 | 153833 | 0.00% |
| stdb_raw | u32_u64_str | unique_0 | 64 | 16746 | 16746 | 0.00% | 16846 | 16846 | 0.00% |
| sqlite | u32_u64_str | unique_0 | 1024 | 1067255 | 1067255 | 0.00% | 1070715 | 1070627 | 0.01% |
| sqlite | u32_u64_str | unique_0 | 64 | 76201 | 76201 | 0.00% | 77255 | 77199 | 0.07% |

callgrind: serialize_product_value

| count | format | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|
| 64 | json | 47528 | 47528 | 0.00% | 50214 | 50214 | 0.00% |
| 64 | bsatn | 25509 | 25509 | 0.00% | 27787 | 27821 | -0.12% |
| 16 | bsatn | 8200 | 8200 | 0.00% | 9594 | 9628 | -0.35% |
| 16 | json | 12188 | 12188 | 0.00% | 14126 | 14126 | 0.00% |

callgrind: update bulk

| db | schema | indices | count | preload | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | unique_0 | 1024 | 1024 | 20413765 | 19981315 | 2.16% | 20922237 | 20486849 | 2.13% |
| stdb_raw | u32_u64_str | unique_0 | 64 | 128 | 1278714 | 1278415 | 0.02% | 1313878 | 1315457 | -0.12% |
| sqlite | u32_u64_str | unique_0 | 1024 | 1024 | 1802182 | 1802182 | 0.00% | 1811438 | 1811446 | -0.00% |
| sqlite | u32_u64_str | unique_0 | 64 | 128 | 128528 | 128528 | 0.00% | 131340 | 131352 | -0.01% |

On-disk benchmarks

callgrind: empty transaction

| db | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|
| stdb_raw | 6402 | 6402 | 0.00% | 6498 | 6498 | 0.00% |
| sqlite | 5621 | 5621 | 0.00% | 6069 | 6061 | 0.13% |

callgrind: filter

| db | schema | indices | count | preload | _column | data_type | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | no_index | 64 | 128 | 1 | u64 | 76597 | 76597 | 0.00% | 77041 | 77085 | -0.06% |
| stdb_raw | u32_u64_str | no_index | 64 | 128 | 2 | string | 119928 | 118839 | 0.92% | 120686 | 119677 | 0.84% |
| stdb_raw | u32_u64_str | btree_each_column | 64 | 128 | 2 | string | 25086 | 25086 | 0.00% | 25704 | 25800 | -0.37% |
| stdb_raw | u32_u64_str | btree_each_column | 64 | 128 | 1 | u64 | 24054 | 24054 | 0.00% | 24506 | 24594 | -0.36% |
| sqlite | u32_u64_str | no_index | 64 | 128 | 1 | u64 | 125965 | 125965 | 0.00% | 127559 | 127487 | 0.06% |
| sqlite | u32_u64_str | no_index | 64 | 128 | 2 | string | 146616 | 146616 | 0.00% | 148400 | 148444 | -0.03% |
| sqlite | u32_u64_str | btree_each_column | 64 | 128 | 2 | string | 136616 | 136616 | 0.00% | 138654 | 138722 | -0.05% |
| sqlite | u32_u64_str | btree_each_column | 64 | 128 | 1 | u64 | 133457 | 133457 | 0.00% | 135237 | 135385 | -0.11% |

callgrind: insert bulk

| db | schema | indices | count | preload | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | unique_0 | 64 | 128 | 821603 | 820924 | 0.08% | 869161 | 836612 | 3.89% |
| stdb_raw | u32_u64_str | btree_each_column | 64 | 128 | 971929 | 967924 | 0.41% | 1030629 | 998342 | 3.23% |
| sqlite | u32_u64_str | unique_0 | 64 | 128 | 415857 | 415857 | 0.00% | 430639 | 432023 | -0.32% |
| sqlite | u32_u64_str | btree_each_column | 64 | 128 | 1021898 | 1021898 | 0.00% | 1058644 | 1058142 | 0.05% |

callgrind: iterate

| db | schema | indices | count | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | unique_0 | 1024 | 153726 | 153726 | 0.00% | 153822 | 153822 | 0.00% |
| stdb_raw | u32_u64_str | unique_0 | 64 | 16751 | 16751 | 0.00% | 16887 | 16847 | 0.24% |
| sqlite | u32_u64_str | unique_0 | 1024 | 1070323 | 1070323 | 0.00% | 1074165 | 1074145 | 0.00% |
| sqlite | u32_u64_str | unique_0 | 64 | 77973 | 77973 | 0.00% | 79295 | 79267 | 0.04% |

callgrind: serialize_product_value

| count | format | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|
| 64 | json | 47528 | 47528 | 0.00% | 50214 | 50214 | 0.00% |
| 64 | bsatn | 25509 | 25509 | 0.00% | 27787 | 27821 | -0.12% |
| 16 | bsatn | 8200 | 8200 | 0.00% | 9594 | 9628 | -0.35% |
| 16 | json | 12188 | 12188 | 0.00% | 14126 | 14126 | 0.00% |

callgrind: update bulk

| db | schema | indices | count | preload | total reads + writes | old total reads + writes | Δrw | estimated cycles | old estimated cycles | Δcycles |
|---|---|---|---|---|---|---|---|---|---|---|
| stdb_raw | u32_u64_str | unique_0 | 1024 | 1024 | 18904012 | 18901673 | 0.01% | 19440154 | 19473275 | -0.17% |
| stdb_raw | u32_u64_str | unique_0 | 64 | 128 | 1231776 | 1231044 | 0.06% | 1295984 | 1296962 | -0.08% |
| sqlite | u32_u64_str | unique_0 | 1024 | 1024 | 1809743 | 1809743 | 0.00% | 1818307 | 1818411 | -0.01% |
| sqlite | u32_u64_str | unique_0 | 64 | 128 | 132654 | 132654 | 0.00% | 135574 | 135694 | -0.09% |
