Refactor commit encoding / decoding #495

Merged
merged 6 commits into from
Nov 6, 2023

Conversation

kim
Contributor

@kim kim commented Oct 30, 2023

Description of Changes

Use sats::{BufReader, BufWriter} for decoding / encoding of Commit
and associated types. This makes decode fallible (which is quite
desirable, instead of panicking).

As the DecodeError from sats is fairly sparse, also add some context
about where exactly decoding failed.

Also add documentation and property test.
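As a rough illustration of what "fallible decode with context" can look like, here is a minimal std-only sketch. It does not use the `sats` reader or writer types; `Record`, `DecodeErrorKind`, `take`, the field names, and the little-endian layout are all hypothetical and chosen only for the example.

```rust
use std::fmt;

/// Hypothetical error type: what went wrong plus where in the record it happened.
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    /// Name of the field being decoded when the failure occurred.
    pub context: &'static str,
    pub kind: DecodeErrorKind,
}

#[derive(Debug, PartialEq)]
pub enum DecodeErrorKind {
    /// The input ended before the field was complete.
    UnexpectedEof { needed: usize, remaining: usize },
    /// A tag byte had a value the (assumed) format does not define.
    InvalidTag(u8),
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to decode `{}`: {:?}", self.context, self.kind)
    }
}

impl std::error::Error for DecodeError {}

/// Take `n` bytes off the front of `input`, or report which field ran short.
fn take<'a>(input: &mut &'a [u8], n: usize, context: &'static str) -> Result<&'a [u8], DecodeError> {
    let buf: &'a [u8] = *input;
    if buf.len() < n {
        return Err(DecodeError {
            context,
            kind: DecodeErrorKind::UnexpectedEof { needed: n, remaining: buf.len() },
        });
    }
    let (head, tail) = buf.split_at(n);
    *input = tail;
    Ok(head)
}

/// Simplified stand-in for a log record; not the real `Write` layout.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub is_insert: bool,
    pub set_id: u32,
}

impl Record {
    /// Returns an error instead of panicking on truncated or malformed input.
    pub fn decode(input: &mut &[u8]) -> Result<Self, DecodeError> {
        let tag = take(input, 1, "Record::operation")?[0];
        let is_insert = match tag {
            0 => false,
            1 => true,
            other => {
                return Err(DecodeError {
                    context: "Record::operation",
                    kind: DecodeErrorKind::InvalidTag(other),
                })
            }
        };
        let raw = take(input, 4, "Record::set_id")?;
        // Byte order is an assumption made for this sketch.
        let set_id = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
        Ok(Record { is_insert, set_id })
    }
}
```

With this shape, a truncated buffer yields an error naming the field that could not be read, rather than an index-out-of-bounds panic.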

API and ABI

  • This is a breaking change to the module ABI
  • This is a breaking change to the module API
  • This is a breaking change to the ClientAPI
  • This is a breaking change to the SDK API

If the API is breaking, please state below what will break

Expected complexity level and risk

1

kim added 2 commits October 27, 2023 12:27
Use `sats::{BufReader, BufWriter}` for decoding / encoding of `Commit`
and associated types. This makes `decode` fallible (which is quite
desirable, instead of panicking).

As the `DecodeError` from sats is fairly sparse, also add some context
about where exactly decoding failed.

Lastly, add some documentation.
Contributor

@kulakowski kulakowski left a comment

Great.

I left a few minor sorts of comments about comments and names.

I'd also like to verify this isn't a performance regression. I don't think this code is a bottleneck for us anymore, but it sure used to be.

Is the restart smoke test failure related?

crates/core/src/db/messages/commit.rs

pub fn datakey() -> impl Strategy<Value = DataKey> {
prop_oneof![
prop::collection::vec(any::<u8>(), 0..31).prop_map(DataKey::from_data),
Contributor

I'd comment on the reason for the two ranges.

crates/core/src/db/messages/write.rs
pub struct Write {
pub operation: Operation,
pub set_id: u32,
pub set_id: u32, // aka table id
#[cfg_attr(test, proptest(strategy = "arbitrary::datakey()"))]
Contributor

I'd also comment about directing this to a hand-written strategy, or in general anywhere that the derive macro doesn't do the right thing on its own.

Contributor Author

Is it fine to have those on the mod arbitrary, intentionally kept close to the type definition? See ea62cf8

#[cfg(test)]
use proptest_derive::Arbitrary;

/// A commit is one record in the write-ahead log.
Contributor

A drive-by thought: similarly to how we have a bunch of code that says set_id but is commented "this is the table id", I would love to settle on saying "write ahead log" everywhere, including data type names etc. Or some other name. But let's pick one!

Contributor Author

Yeah, I'd love to settle on WAL / write ahead log, maybe even collapsing CommitLog and MessageLog into a single type. Possibly not for this PR, though?

Contributor

Yeah I'd love to have us try to take things that way. I agree on the naming, I think the commit log and message log names are implementation details and that "write ahead log" is the name that reflects the semantics the datastore needs to care about. But yeah not for this PR.

Contributor

Just adding my two cents here. I agree on coalescing on the name. I avoided WAL because historically that's referred to a log that is eventually compressed/deleted, but I think on balance WAL is probably the best name. We can include a comment about the fact that the WAL should never be deleted.

We should also note that it will include information which is not normally in a WAL, including reducer call event info and that it forms a Merkle DAG between commits.

Contributor

The term MessageLog was borrowed from Kafka.

Contributor Author

Noted to include all of these in the upcoming formalization / design changes doc.

@Centril Centril self-requested a review October 30, 2023 15:49
@kim
Contributor Author

kim commented Oct 30, 2023

> I'd also like to verify this isn't a performance regression.

Hm, would this entail landing a separate patch with a benchmark before this, so we can track the difference?

@kulakowski
Contributor

> > I'd also like to verify this isn't a performance regression.
>
> Hm, would this entail landing a separate patch with a benchmark before this, so we can track the difference?

I don't think so. I meant regression at the level of benchmarks / game performance, not in any new functionality or anything. Like if worldgen goes from N seconds to 1.1N seconds, I'd want to look at this again, but if it stays at N seconds, send it.
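The "N to 1.1N" rule of thumb can be written down as a small ratio check. This is only a sketch: the function names are hypothetical, and treating 10% as a hard cutoff is an assumption; the comment above describes an end-to-end workload (worldgen), not individual microbenchmarks.

```rust
/// Ratio of new to old latency; values above 1.0 mean the new code is slower.
pub fn latency_ratio(old_ns: f64, new_ns: f64) -> f64 {
    new_ns / old_ns
}

/// True when the new latency is at least `1.0 + tolerance` times the old one
/// (e.g. `tolerance = 0.10` for the "N to 1.1N" threshold; an assumed cutoff).
pub fn is_regression(old_ns: f64, new_ns: f64, tolerance: f64) -> bool {
    latency_ratio(old_ns, new_ns) >= 1.0 + tolerance
}
```

For instance, applied to two rows of the benchmark report in this thread, the `stdb_raw` on-disk "find unique key" latency (1422.6 ns old, 884.5 ns new) would not count as a regression under this check, while the `location` `product_value` serialize row (574.7 ns old, 847.0 ns new) would.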

@kim
Contributor Author

kim commented Oct 30, 2023

benchmarks please

@kim
Contributor Author

kim commented Oct 30, 2023

Let's try this :)

@github-actions

github-actions bot commented Oct 30, 2023

Benchmark results

Benchmark Report

Legend:

  • load: number of rows pre-loaded into the database
  • count: number of rows touched by the transaction
  • index types:
    • unique: a single index on the id column
    • non_unique: no indexes
    • multi_index: non-unique index on every column
  • schemas:
    • person(id: u32, name: String, age: u64)
    • location(id: u32, x: u64, y: u64)

All throughputs are single-threaded.

Empty transaction

db on disk new latency old latency new throughput old throughput
sqlite 💿 - 425.5±2.12ns - -
sqlite 🧠 - 420.1±0.55ns - -
stdb_module 💿 16.2±0.35µs 18.7±1.35µs - -
stdb_module 🧠 16.4±0.50µs 17.5±0.81µs - -
stdb_raw 💿 200.3±1.40ns 734.9±2.04ns - -
stdb_raw 🧠 190.3±0.15ns 731.6±1.38ns - -

Single-row insertions

db on disk schema index type load new latency old latency new throughput old throughput
sqlite 💿 location multi_index 0 - 14.9±0.69µs - 65.4 Ktx/sec
sqlite 💿 location multi_index 1000 - 15.8±0.08µs - 61.7 Ktx/sec
sqlite 💿 location non_unique 0 - 7.3±0.57µs - 134.5 Ktx/sec
sqlite 💿 location non_unique 1000 - 7.0±0.02µs - 140.3 Ktx/sec
sqlite 💿 location unique 0 - 7.2±0.02µs - 135.3 Ktx/sec
sqlite 💿 location unique 1000 - 7.0±0.04µs - 139.1 Ktx/sec
sqlite 💿 person multi_index 0 - 14.7±0.74µs - 66.2 Ktx/sec
sqlite 💿 person multi_index 1000 - 16.1±0.11µs - 60.6 Ktx/sec
sqlite 💿 person non_unique 0 - 7.3±0.04µs - 134.1 Ktx/sec
sqlite 💿 person non_unique 1000 - 7.2±0.05µs - 135.0 Ktx/sec
sqlite 💿 person unique 0 - 7.4±0.55µs - 132.7 Ktx/sec
sqlite 💿 person unique 1000 - 7.3±0.05µs - 134.4 Ktx/sec
sqlite 🧠 location multi_index 0 - 4.0±0.01µs - 245.0 Ktx/sec
sqlite 🧠 location multi_index 1000 - 5.3±0.07µs - 185.2 Ktx/sec
sqlite 🧠 location non_unique 0 - 1847.2±4.33ns - 528.7 Ktx/sec
sqlite 🧠 location non_unique 1000 - 1911.9±8.02ns - 510.8 Ktx/sec
sqlite 🧠 location unique 0 - 1817.3±3.44ns - 537.4 Ktx/sec
sqlite 🧠 location unique 1000 - 1941.0±9.04ns - 503.1 Ktx/sec
sqlite 🧠 person multi_index 0 - 3.7±0.01µs - 266.2 Ktx/sec
sqlite 🧠 person multi_index 1000 - 5.4±0.03µs - 180.2 Ktx/sec
sqlite 🧠 person non_unique 0 - 1908.0±7.49ns - 511.8 Ktx/sec
sqlite 🧠 person non_unique 1000 - 1977.5±11.82ns - 493.8 Ktx/sec
sqlite 🧠 person unique 0 - 1897.2±3.67ns - 514.8 Ktx/sec
sqlite 🧠 person unique 1000 - 2.0±0.01µs - 483.5 Ktx/sec
stdb_module 💿 location multi_index 0 48.4±4.93µs 50.4±3.90µs 20.2 Ktx/sec 19.4 Ktx/sec
stdb_module 💿 location multi_index 1000 108.6±6.77µs 155.5±12.70µs 9.0 Ktx/sec 6.3 Ktx/sec
stdb_module 💿 location non_unique 0 38.7±4.23µs 44.7±4.63µs 25.2 Ktx/sec 21.9 Ktx/sec
stdb_module 💿 location non_unique 1000 172.8±41.00µs 136.3±45.32µs 5.7 Ktx/sec 7.2 Ktx/sec
stdb_module 💿 location unique 0 44.9±6.47µs 46.0±3.16µs 21.7 Ktx/sec 21.2 Ktx/sec
stdb_module 💿 location unique 1000 189.5±70.74µs 141.5±30.18µs 5.2 Ktx/sec 6.9 Ktx/sec
stdb_module 💿 person multi_index 0 60.6±5.22µs 62.5±4.76µs 16.1 Ktx/sec 15.6 Ktx/sec
stdb_module 💿 person multi_index 1000 151.8±45.31µs 126.0±18.75µs 6.4 Ktx/sec 7.8 Ktx/sec
stdb_module 💿 person non_unique 0 39.7±3.79µs 43.6±6.39µs 24.6 Ktx/sec 22.4 Ktx/sec
stdb_module 💿 person non_unique 1000 185.6±35.77µs 258.5±95.07µs 5.3 Ktx/sec 3.8 Ktx/sec
stdb_module 💿 person unique 0 50.4±5.19µs 49.9±5.66µs 19.4 Ktx/sec 19.6 Ktx/sec
stdb_module 💿 person unique 1000 246.1±88.83µs 125.0±7.30µs 4.0 Ktx/sec 7.8 Ktx/sec
stdb_module 🧠 location multi_index 0 30.6±3.10µs 36.7±3.54µs 31.9 Ktx/sec 26.6 Ktx/sec
stdb_module 🧠 location multi_index 1000 113.7±35.59µs 200.4±46.97µs 8.6 Ktx/sec 4.9 Ktx/sec
stdb_module 🧠 location non_unique 0 26.8±1.59µs 27.2±1.72µs 36.5 Ktx/sec 35.9 Ktx/sec
stdb_module 🧠 location non_unique 1000 171.8±7.37µs 248.0±8.25µs 5.7 Ktx/sec 3.9 Ktx/sec
stdb_module 🧠 location unique 0 28.4±2.33µs 31.5±3.13µs 34.4 Ktx/sec 31.0 Ktx/sec
stdb_module 🧠 location unique 1000 182.3±11.24µs 104.3±0.72µs 5.4 Ktx/sec 9.4 Ktx/sec
stdb_module 🧠 person multi_index 0 36.5±3.21µs 44.5±5.38µs 26.8 Ktx/sec 22.0 Ktx/sec
stdb_module 🧠 person multi_index 1000 139.0±6.71µs 319.9±13.08µs 7.0 Ktx/sec 3.1 Ktx/sec
stdb_module 🧠 person non_unique 0 27.3±1.61µs 31.8±2.84µs 35.7 Ktx/sec 30.7 Ktx/sec
stdb_module 🧠 person non_unique 1000 323.5±14.34µs 295.8±2.12µs 3.0 Ktx/sec 3.3 Ktx/sec
stdb_module 🧠 person unique 0 30.3±2.48µs 37.2±3.61µs 32.2 Ktx/sec 26.2 Ktx/sec
stdb_module 🧠 person unique 1000 243.5±16.41µs 186.4±12.01µs 4.0 Ktx/sec 5.2 Ktx/sec
stdb_raw 💿 location multi_index 0 6.6±0.01µs 7.1±0.41µs 148.2 Ktx/sec 136.8 Ktx/sec
stdb_raw 💿 location multi_index 1000 32.8±236.91µs 10.1±2.46µs 29.8 Ktx/sec 96.8 Ktx/sec
stdb_raw 💿 location non_unique 0 4.2±0.01µs 4.7±0.02µs 232.1 Ktx/sec 207.8 Ktx/sec
stdb_raw 💿 location non_unique 1000 5.6±0.14µs 6.2±0.35µs 174.8 Ktx/sec 157.5 Ktx/sec
stdb_raw 💿 location unique 0 5.5±0.06µs 6.1±0.02µs 177.3 Ktx/sec 160.5 Ktx/sec
stdb_raw 💿 location unique 1000 7.8±0.16µs 27.7±191.28µs 125.9 Ktx/sec 35.3 Ktx/sec
stdb_raw 💿 person multi_index 0 10.3±0.03µs 10.9±0.01µs 95.2 Ktx/sec 89.8 Ktx/sec
stdb_raw 💿 person multi_index 1000 13.3±0.16µs 63.6±493.29µs 73.3 Ktx/sec 15.4 Ktx/sec
stdb_raw 💿 person non_unique 0 4.8±0.20µs 5.3±0.01µs 203.3 Ktx/sec 185.1 Ktx/sec
stdb_raw 💿 person non_unique 1000 18.8±125.20µs 6.9±0.10µs 51.9 Ktx/sec 141.3 Ktx/sec
stdb_raw 💿 person unique 0 7.2±0.65µs 7.7±0.02µs 136.5 Ktx/sec 127.5 Ktx/sec
stdb_raw 💿 person unique 1000 30.9±211.95µs 10.4±0.11µs 31.6 Ktx/sec 93.8 Ktx/sec
stdb_raw 🧠 location multi_index 0 3.7±0.01µs 4.2±0.01µs 267.5 Ktx/sec 232.8 Ktx/sec
stdb_raw 🧠 location multi_index 1000 5.0±0.04µs 5.8±0.03µs 194.1 Ktx/sec 168.7 Ktx/sec
stdb_raw 🧠 location non_unique 0 1403.7±4.02ns 1939.5±5.03ns 695.7 Ktx/sec 503.5 Ktx/sec
stdb_raw 🧠 location non_unique 1000 1821.4±18.78ns 2.5±0.03µs 536.2 Ktx/sec 393.2 Ktx/sec
stdb_raw 🧠 location unique 0 2.7±0.01µs 3.2±0.00µs 367.3 Ktx/sec 308.5 Ktx/sec
stdb_raw 🧠 location unique 1000 3.7±0.02µs 4.4±0.02µs 265.9 Ktx/sec 223.8 Ktx/sec
stdb_raw 🧠 person multi_index 0 7.3±0.08µs 7.8±0.01µs 134.1 Ktx/sec 125.8 Ktx/sec
stdb_raw 🧠 person multi_index 1000 9.2±0.04µs 10.0±0.09µs 105.7 Ktx/sec 97.6 Ktx/sec
stdb_raw 🧠 person non_unique 0 1974.3±5.97ns 2.4±0.01µs 494.6 Ktx/sec 399.8 Ktx/sec
stdb_raw 🧠 person non_unique 1000 2.5±0.01µs 3.1±0.02µs 384.5 Ktx/sec 310.1 Ktx/sec
stdb_raw 🧠 person unique 0 4.2±0.01µs 4.7±0.05µs 232.5 Ktx/sec 207.7 Ktx/sec
stdb_raw 🧠 person unique 1000 5.5±0.05µs 6.2±0.04µs 177.7 Ktx/sec 157.8 Ktx/sec

Multi-row insertions

db on disk schema index type load count new latency old latency new throughput old throughput
sqlite 💿 location multi_index 0 100 - 133.2±6.53µs - 7.3 Ktx/sec
sqlite 💿 location multi_index 1000 100 - 202.6±1.13µs - 4.8 Ktx/sec
sqlite 💿 location non_unique 0 100 - 50.9±0.38µs - 19.2 Ktx/sec
sqlite 💿 location non_unique 1000 100 - 52.6±0.24µs - 18.6 Ktx/sec
sqlite 💿 location unique 0 100 - 54.0±1.88µs - 18.1 Ktx/sec
sqlite 💿 location unique 1000 100 - 57.6±0.24µs - 17.0 Ktx/sec
sqlite 💿 person multi_index 0 100 - 118.6±2.00µs - 8.2 Ktx/sec
sqlite 💿 person multi_index 1000 100 - 246.9±94.84µs - 4.0 Ktx/sec
sqlite 💿 person non_unique 0 100 - 49.5±2.80µs - 19.7 Ktx/sec
sqlite 💿 person non_unique 1000 100 - 59.9±0.37µs - 16.3 Ktx/sec
sqlite 💿 person unique 0 100 - 50.5±0.49µs - 19.3 Ktx/sec
sqlite 💿 person unique 1000 100 - 56.3±0.28µs - 17.3 Ktx/sec
sqlite 🧠 location multi_index 0 100 - 120.1±0.42µs - 8.1 Ktx/sec
sqlite 🧠 location multi_index 1000 100 - 169.4±0.28µs - 5.8 Ktx/sec
sqlite 🧠 location non_unique 0 100 - 43.8±0.33µs - 22.3 Ktx/sec
sqlite 🧠 location non_unique 1000 100 - 44.4±0.37µs - 22.0 Ktx/sec
sqlite 🧠 location unique 0 100 - 47.4±0.36µs - 20.6 Ktx/sec
sqlite 🧠 location unique 1000 100 - 49.6±0.26µs - 19.7 Ktx/sec
sqlite 🧠 person multi_index 0 100 - 107.7±0.44µs - 9.1 Ktx/sec
sqlite 🧠 person multi_index 1000 100 - 189.7±0.61µs - 5.1 Ktx/sec
sqlite 🧠 person non_unique 0 100 - 42.0±0.31µs - 23.3 Ktx/sec
sqlite 🧠 person non_unique 1000 100 - 46.4±0.24µs - 21.0 Ktx/sec
sqlite 🧠 person unique 0 100 - 44.6±0.35µs - 21.9 Ktx/sec
sqlite 🧠 person unique 1000 100 - 48.4±0.44µs - 20.2 Ktx/sec
stdb_module 💿 location multi_index 0 100 891.3±83.58µs 959.4±69.44µs 1121 tx/sec 1042 tx/sec
stdb_module 💿 location multi_index 1000 100 1117.7±163.42µs 994.2±123.30µs 894 tx/sec 1005 tx/sec
stdb_module 💿 location non_unique 0 100 462.1±89.99µs 436.4±81.89µs 2.1 Ktx/sec 2.2 Ktx/sec
stdb_module 💿 location non_unique 1000 100 640.7±37.98µs 548.0±47.50µs 1560 tx/sec 1824 tx/sec
stdb_module 💿 location unique 0 100 716.1±74.79µs 783.4±2.21µs 1396 tx/sec 1276 tx/sec
stdb_module 💿 location unique 1000 100 937.1±123.46µs 568.5±1.88µs 1067 tx/sec 1758 tx/sec
stdb_module 💿 person multi_index 0 100 1088.4±154.95µs 1005.2±170.26µs 918 tx/sec 994 tx/sec
stdb_module 💿 person multi_index 1000 100 1236.5±125.61µs 1294.6±18.48µs 808 tx/sec 772 tx/sec
stdb_module 💿 person non_unique 0 100 553.3±52.23µs 650.6±72.48µs 1807 tx/sec 1537 tx/sec
stdb_module 💿 person non_unique 1000 100 991.3±7.71µs 889.0±44.83µs 1008 tx/sec 1124 tx/sec
stdb_module 💿 person unique 0 100 647.1±56.51µs 629.9±11.81µs 1545 tx/sec 1587 tx/sec
stdb_module 💿 person unique 1000 100 998.9±44.09µs 703.0±49.76µs 1001 tx/sec 1422 tx/sec
stdb_module 🧠 location multi_index 0 100 639.2±113.82µs 559.2±162.85µs 1564 tx/sec 1788 tx/sec
stdb_module 🧠 location multi_index 1000 100 710.1±108.92µs 650.2±13.83µs 1408 tx/sec 1537 tx/sec
stdb_module 🧠 location non_unique 0 100 356.2±5.39µs 332.0±20.80µs 2.7 Ktx/sec 2.9 Ktx/sec
stdb_module 🧠 location non_unique 1000 100 315.5±7.64µs 357.5±3.22µs 3.1 Ktx/sec 2.7 Ktx/sec
stdb_module 🧠 location unique 0 100 555.6±1.47µs 369.4±49.31µs 1799 tx/sec 2.6 Ktx/sec
stdb_module 🧠 location unique 1000 100 653.6±3.67µs 454.6±91.74µs 1529 tx/sec 2.1 Ktx/sec
stdb_module 🧠 person multi_index 0 100 895.7±1.41µs 843.3±1.79µs 1116 tx/sec 1185 tx/sec
stdb_module 🧠 person multi_index 1000 100 937.7±23.42µs 1131.0±14.42µs 1066 tx/sec 884 tx/sec
stdb_module 🧠 person non_unique 0 100 491.7±22.04µs 407.4±16.38µs 2033 tx/sec 2.4 Ktx/sec
stdb_module 🧠 person non_unique 1000 100 488.5±52.12µs 601.3±33.84µs 2047 tx/sec 1663 tx/sec
stdb_module 🧠 person unique 0 100 574.8±1.36µs 554.3±3.94µs 1739 tx/sec 1804 tx/sec
stdb_module 🧠 person unique 1000 100 668.3±58.71µs 827.5±37.15µs 1496 tx/sec 1208 tx/sec
stdb_raw 💿 location multi_index 0 100 384.3±1.31µs 380.3±0.55µs 2.5 Ktx/sec 2.6 Ktx/sec
stdb_raw 💿 location multi_index 1000 100 412.4±2.26µs 406.2±1.50µs 2.4 Ktx/sec 2.4 Ktx/sec
stdb_raw 💿 location non_unique 0 100 157.2±5.85µs 158.7±0.35µs 6.2 Ktx/sec 6.2 Ktx/sec
stdb_raw 💿 location non_unique 1000 100 168.1±83.07µs 170.4±97.09µs 5.8 Ktx/sec 5.7 Ktx/sec
stdb_raw 💿 location unique 0 100 283.0±0.33µs 282.0±0.25µs 3.5 Ktx/sec 3.5 Ktx/sec
stdb_raw 💿 location unique 1000 100 304.8±1.28µs 322.0±200.71µs 3.2 Ktx/sec 3.0 Ktx/sec
stdb_raw 💿 person multi_index 0 100 703.9±2.83µs 711.1±18.40µs 1420 tx/sec 1406 tx/sec
stdb_raw 💿 person multi_index 1000 100 777.5±423.71µs 791.8±522.65µs 1286 tx/sec 1262 tx/sec
stdb_raw 💿 person non_unique 0 100 213.2±0.41µs 216.0±0.14µs 4.6 Ktx/sec 4.5 Ktx/sec
stdb_raw 💿 person non_unique 1000 100 216.2±0.62µs 237.9±185.07µs 4.5 Ktx/sec 4.1 Ktx/sec
stdb_raw 💿 person unique 0 100 424.2±0.42µs 426.7±0.47µs 2.3 Ktx/sec 2.3 Ktx/sec
stdb_raw 💿 person unique 1000 100 446.2±1.54µs 471.3±243.28µs 2.2 Ktx/sec 2.1 Ktx/sec
stdb_raw 🧠 location multi_index 0 100 301.2±0.50µs 294.9±0.37µs 3.2 Ktx/sec 3.3 Ktx/sec
stdb_raw 🧠 location multi_index 1000 100 329.0±1.94µs 320.9±0.40µs 3.0 Ktx/sec 3.0 Ktx/sec
stdb_raw 🧠 location non_unique 0 100 75.2±0.10µs 73.8±0.11µs 13.0 Ktx/sec 13.2 Ktx/sec
stdb_raw 🧠 location non_unique 1000 100 76.7±0.14µs 76.1±0.15µs 12.7 Ktx/sec 12.8 Ktx/sec
stdb_raw 🧠 location unique 0 100 201.5±0.49µs 197.2±0.26µs 4.8 Ktx/sec 5.0 Ktx/sec
stdb_raw 🧠 location unique 1000 100 221.6±0.51µs 216.6±0.33µs 4.4 Ktx/sec 4.5 Ktx/sec
stdb_raw 🧠 person multi_index 0 100 615.5±0.95µs 615.0±0.82µs 1624 tx/sec 1626 tx/sec
stdb_raw 🧠 person multi_index 1000 100 646.5±1.09µs 644.3±0.77µs 1546 tx/sec 1552 tx/sec
stdb_raw 🧠 person non_unique 0 100 126.8±0.14µs 126.6±0.11µs 7.7 Ktx/sec 7.7 Ktx/sec
stdb_raw 🧠 person non_unique 1000 100 129.1±0.22µs 129.7±0.40µs 7.6 Ktx/sec 7.5 Ktx/sec
stdb_raw 🧠 person unique 0 100 337.6±0.39µs 336.3±0.21µs 2.9 Ktx/sec 2.9 Ktx/sec
stdb_raw 🧠 person unique 1000 100 358.2±1.84µs 357.0±2.01µs 2.7 Ktx/sec 2.7 Ktx/sec

Full table iterate

db on disk schema index type new latency old latency new throughput old throughput
sqlite 💿 location unique - 9.0±0.14µs - 108.3 Ktx/sec
sqlite 💿 person unique - 9.5±0.11µs - 103.2 Ktx/sec
sqlite 🧠 location unique - 7.8±0.11µs - 125.9 Ktx/sec
sqlite 🧠 person unique - 8.4±0.09µs - 116.7 Ktx/sec
stdb_module 💿 location unique 46.8±3.02µs 49.2±4.53µs 20.9 Ktx/sec 19.8 Ktx/sec
stdb_module 💿 person unique 57.0±10.28µs 53.4±10.70µs 17.1 Ktx/sec 18.3 Ktx/sec
stdb_module 🧠 location unique 47.2±4.34µs 48.8±5.30µs 20.7 Ktx/sec 20.0 Ktx/sec
stdb_module 🧠 person unique 60.9±9.26µs 62.6±10.45µs 16.0 Ktx/sec 15.6 Ktx/sec
stdb_raw 💿 location unique 9.2±0.06µs 9.3±0.15µs 106.2 Ktx/sec 105.5 Ktx/sec
stdb_raw 💿 person unique 9.2±0.08µs 9.3±0.15µs 106.1 Ktx/sec 105.3 Ktx/sec
stdb_raw 🧠 location unique 9.2±0.05µs 9.5±0.12µs 106.3 Ktx/sec 102.9 Ktx/sec
stdb_raw 🧠 person unique 9.2±0.03µs 9.3±0.19µs 106.2 Ktx/sec 105.1 Ktx/sec

Find unique key

db on disk key type load new latency old latency new throughput old throughput
sqlite 💿 u32 1000 - 2.3±0.01µs - 420.5 Ktx/sec
sqlite 🧠 u32 1000 - 1134.0±4.65ns - 861.2 Ktx/sec
stdb_module 💿 u32 1000 19.7±0.83µs 23.3±2.28µs 49.6 Ktx/sec 41.8 Ktx/sec
stdb_module 🧠 u32 1000 19.7±0.95µs 22.1±1.12µs 49.5 Ktx/sec 44.1 Ktx/sec
stdb_raw 💿 u32 1000 884.5±8.24ns 1422.6±6.41ns 1104.1 Ktx/sec 686.5 Ktx/sec
stdb_raw 🧠 u32 1000 883.1±4.51ns 1419.3±3.90ns 1105.8 Ktx/sec 688.0 Ktx/sec

Filter

db on disk key type index strategy load count new latency old latency new throughput old throughput
sqlite 💿 string indexed 1000 10 - 5.5±0.02µs - 176.2 Ktx/sec
sqlite 💿 string non_indexed 1000 10 - 50.8±0.40µs - 19.2 Ktx/sec
sqlite 💿 u64 indexed 1000 10 - 5.4±0.03µs - 181.0 Ktx/sec
sqlite 💿 u64 non_indexed 1000 10 - 32.9±0.07µs - 29.7 Ktx/sec
sqlite 🧠 string indexed 1000 10 - 4.2±0.02µs - 233.9 Ktx/sec
sqlite 🧠 string non_indexed 1000 10 - 48.1±0.59µs - 20.3 Ktx/sec
sqlite 🧠 u64 indexed 1000 10 - 4.0±0.03µs - 241.6 Ktx/sec
sqlite 🧠 u64 non_indexed 1000 10 - 31.7±0.06µs - 30.8 Ktx/sec
stdb_module 💿 string indexed 1000 10 29.8±2.41µs 34.0±2.23µs 32.7 Ktx/sec 28.7 Ktx/sec
stdb_module 💿 string non_indexed 1000 10 174.8±1.77µs 181.1±1.11µs 5.6 Ktx/sec 5.4 Ktx/sec
stdb_module 💿 u64 indexed 1000 10 24.6±1.88µs 29.7±1.93µs 39.6 Ktx/sec 32.9 Ktx/sec
stdb_module 💿 u64 non_indexed 1000 10 147.2±1.55µs 153.4±1.00µs 6.6 Ktx/sec 6.4 Ktx/sec
stdb_module 🧠 string indexed 1000 10 30.4±2.06µs 34.7±2.23µs 32.1 Ktx/sec 28.2 Ktx/sec
stdb_module 🧠 string non_indexed 1000 10 168.6±0.96µs 179.5±2.19µs 5.8 Ktx/sec 5.4 Ktx/sec
stdb_module 🧠 u64 indexed 1000 10 24.8±2.01µs 29.6±2.80µs 39.4 Ktx/sec 33.0 Ktx/sec
stdb_module 🧠 u64 non_indexed 1000 10 145.8±3.22µs 154.4±5.68µs 6.7 Ktx/sec 6.3 Ktx/sec
stdb_raw 💿 string indexed 1000 10 3.5±0.01µs 3.8±0.02µs 282.4 Ktx/sec 253.7 Ktx/sec
stdb_raw 💿 string non_indexed 1000 10 146.4±0.59µs 152.0±0.59µs 6.7 Ktx/sec 6.4 Ktx/sec
stdb_raw 💿 u64 indexed 1000 10 3.3±0.03µs 3.7±0.01µs 291.9 Ktx/sec 261.7 Ktx/sec
stdb_raw 💿 u64 non_indexed 1000 10 121.4±1.72µs 132.0±0.13µs 8.0 Ktx/sec 7.4 Ktx/sec
stdb_raw 🧠 string indexed 1000 10 3.5±0.01µs 3.8±0.03µs 283.0 Ktx/sec 254.2 Ktx/sec
stdb_raw 🧠 string non_indexed 1000 10 146.8±0.54µs 152.2±0.40µs 6.7 Ktx/sec 6.4 Ktx/sec
stdb_raw 🧠 u64 indexed 1000 10 3.3±0.01µs 3.7±0.01µs 292.9 Ktx/sec 261.7 Ktx/sec
stdb_raw 🧠 u64 non_indexed 1000 10 121.8±0.17µs 132.5±0.27µs 8.0 Ktx/sec 7.4 Ktx/sec

Serialize

schema format count new latency old latency new throughput old throughput
location bsatn 100 1655.8±26.70ns 1618.9±31.29ns 57.6 Mtx/sec 58.9 Mtx/sec
location json 100 3.3±0.06µs 3.2±0.01µs 28.7 Mtx/sec 30.0 Mtx/sec
location product_value 100 847.0±0.46ns 574.7±0.33ns 112.6 Mtx/sec 166.0 Mtx/sec
person bsatn 100 3.1±0.02µs 3.0±0.02µs 31.2 Mtx/sec 31.3 Mtx/sec
person json 100 4.9±0.04µs 5.0±0.03µs 19.5 Mtx/sec 19.3 Mtx/sec
person product_value 100 1123.6±0.58ns 1006.5±3.63ns 84.9 Mtx/sec 94.8 Mtx/sec

Module: invoke with large arguments

arg size new latency old latency new throughput old throughput
64KiB 75.3±6.44µs 79.1±4.72µs - -

Module: print bulk

line count new latency old latency new throughput old throughput
1 20.4±0.72µs 22.4±1.12µs - -
100 203.9±12.87µs 201.4±3.97µs - -
1000 1841.8±54.55µs 1875.7±77.42µs - -

Remaining benchmarks

name new latency old latency new throughput old throughput

@kim
Contributor Author

kim commented Oct 30, 2023

> Is the restart smoke test failure related?

Looks like it was a flake.

Comment on lines 114 to 116
let mut count = 0;

if self.parent_commit_hash.is_none() {
count += 1;
} else {
count += 1;
count += self.parent_commit_hash.unwrap().data.len();
count += 1; // tag for option
Contributor

Nit: collapse these
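One way the two branches could be collapsed, shown as a sketch rather than the change that was actually made: the option always costs one tag byte, and the hash bytes are added only when a parent exists. The `Hash` and `Commit` definitions here are hypothetical stand-ins (the 32-byte `data` array is an assumption), and `parent_hash_encoded_len` is a made-up helper name.

```rust
/// Hypothetical stand-in for the hash type; the thread only shows that it has a `data` field.
pub struct Hash {
    pub data: [u8; 32],
}

/// Reduced to the one field relevant to the review comment.
pub struct Commit {
    pub parent_commit_hash: Option<Hash>,
}

impl Commit {
    /// Encoded size of `parent_commit_hash`: one tag byte for the option,
    /// plus the hash bytes when a parent is present.
    pub fn parent_hash_encoded_len(&self) -> usize {
        1 + self.parent_commit_hash.as_ref().map_or(0, |h| h.data.len())
    }
}
```

Using `as_ref()` avoids moving the hash out of `&self`, so the sketch does not depend on `Hash` being `Copy`.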

Comment on lines +32 to +33
// [`DataKey`] is defined in `lib`, so we can't have an [`Arbitrary`] impl
// for it just yet due to orphan rules.
Contributor

What I'd do in this situation is to move the impl to that crate and expose the impl under a proptest feature that can be enabled in dev-dependencies.

Contributor Author

Sure. I tried to keep changes local for this one, but happy to send a follow-up defining Arbitrary for Hash and DataKey. Perhaps that'd even increase probability for folks to write property tests ;)

Contributor

Sounds great 🚀

Contributor

@cloutiertyler cloutiertyler left a comment

I think there's a bug in encoded_len for Write.

- // 1 for flags, 4 for set_id
- let mut count = 1 + 4;
+ let mut count = self.operation.encoded_len();
+ count += 4; // set_id
Contributor

This seems wrong. Shouldn't it be 5 based on the operation encoding?

Contributor Author

It’s operation.len + 4 = 1 + 4 = 5, like before, no?
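The arithmetic in this exchange can be checked mechanically by comparing `encoded_len` against the number of bytes `encode` actually produces. The sketch below is a simplified illustration: the data key is left out, the `Operation` variants and their tag values are hypothetical, and the little-endian byte order is an assumption.

```rust
/// Hypothetical one-byte operation tag.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Operation {
    Delete = 0,
    Insert = 1,
}

impl Operation {
    /// The operation occupies a single byte in this sketch.
    pub fn encoded_len(&self) -> usize {
        1
    }
}

/// Simplified `Write`: the data key is omitted to keep the arithmetic visible.
pub struct Write {
    pub operation: Operation,
    pub set_id: u32, // aka table id
}

impl Write {
    /// Same shape as the snippet under review: operation length plus 4 bytes for `set_id`.
    pub fn encoded_len(&self) -> usize {
        let mut count = self.operation.encoded_len();
        count += 4; // set_id
        count
    }

    /// Appends the operation tag followed by `set_id`.
    pub fn encode(&self, out: &mut Vec<u8>) {
        out.push(self.operation as u8);
        out.extend_from_slice(&self.set_id.to_le_bytes()); // byte order is an assumption
    }
}
```

A test asserting `out.len() == w.encoded_len()` after encoding catches any drift between the two methods, which is the kind of check that settles a question like this one.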

@@ -91,21 +131,36 @@ impl Commit {
count
}

pub fn encode(&self, bytes: &mut Vec<u8>) {
bytes.reserve(self.encoded_len());
Contributor

It's a little unfortunate that we're losing this optimization, but maybe we can improve things by not using a Vec at all but a buffer pool or something in the future.

Contributor Author

Good point. I meant to make sure the caller takes care of this, will add.
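A sketch of what "the caller takes care of this" might look like: the caller reserves `encoded_len()` bytes up front so that encoding does not reallocate. The `Encode` trait, `encode_presized`, and the `Blob` payload are hypothetical names used only for illustration; they are not taken from the codebase.

```rust
/// Hypothetical trait capturing the two methods discussed in this thread.
pub trait Encode {
    fn encoded_len(&self) -> usize;
    fn encode(&self, out: &mut Vec<u8>);
}

/// Caller-side helper: reserve the exact space needed once, then encode.
pub fn encode_presized<T: Encode>(value: &T, out: &mut Vec<u8>) {
    out.reserve(value.encoded_len());
    value.encode(out);
}

/// Toy payload used only to exercise the helper: a length prefix plus raw bytes.
pub struct Blob(pub Vec<u8>);

impl Encode for Blob {
    fn encoded_len(&self) -> usize {
        4 + self.0.len()
    }

    fn encode(&self, out: &mut Vec<u8>) {
        // 4-byte little-endian length prefix (an assumed layout), then the bytes.
        out.extend_from_slice(&(self.0.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.0);
    }
}
```

Because `Vec::reserve` guarantees room for at least the requested number of additional bytes, the vector's capacity stays unchanged while `encode` runs, provided `encoded_len` is accurate.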

@kim kim requested a review from cloutiertyler November 1, 2023 18:35
@kim kim enabled auto-merge (squash) November 6, 2023 09:19
@kim kim dismissed cloutiertyler’s stale review November 6, 2023 09:23

It was not a bug, and I added a test to ensure it's not.

@kim kim merged commit 9a263b0 into master Nov 6, 2023
5 checks passed
@kim kim deleted the kim/decode-commit branch November 6, 2023 09:23
kulakowski pushed a commit that referenced this pull request Nov 7, 2023
* core: Refactor commit encoding / decoding

Use `sats::{BufReader, BufWriter}` for decoding / encoding of `Commit`
and associated types. This makes `decode` fallible (which is quite
desirable, instead of panicking).

As the `DecodeError` from sats is fairly sparse, also add some context
about where exactly decoding failed.

Lastly, add some documentation and (property) tests.

4 participants