Refactor commit encoding / decoding #495

kim · 2023-10-30T14:31:49Z

Description of Changes

Use sats::{BufReader, BufWriter} for decoding / encoding of Commit
and associated types. This makes decode fallible (which is quite
desirable, instead of panicking).

As the DecodeError from sats is fairly sparse, also add some context
about where exactly decoding failed.

Also add documentation and property test.
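
For readers unfamiliar with the pattern, here is a minimal sketch of fallible decoding with added context. It does not use the actual `sats` API: `ReadError`, `DecodeError`, `take`, `ctx`, `Header` and the little-endian layout are all hypothetical stand-ins chosen for illustration.

```rust
use std::fmt;

/// Hypothetical stand-in for a sparse low-level read error.
#[derive(Debug, PartialEq)]
enum ReadError {
    BufferTooShort { needed: usize, remaining: usize },
}

/// Hypothetical wrapper recording which field was being decoded.
#[derive(Debug, PartialEq)]
struct DecodeError {
    context: &'static str,
    source: ReadError,
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to decode {}: {:?}", self.context, self.source)
    }
}

/// Read `N` bytes from the front of `buf` and advance it.
/// Returns an error instead of panicking when the input is too short.
fn take<const N: usize>(buf: &mut &[u8]) -> Result<[u8; N], ReadError> {
    let data = *buf;
    if data.len() < N {
        return Err(ReadError::BufferTooShort { needed: N, remaining: data.len() });
    }
    let (head, tail) = data.split_at(N);
    let mut out = [0u8; N];
    out.copy_from_slice(head);
    *buf = tail;
    Ok(out)
}

/// Attach a field name to a low-level error.
fn ctx(context: &'static str) -> impl FnOnce(ReadError) -> DecodeError {
    move |source| DecodeError { context, source }
}

/// Simplified record header (assumed layout, not the real `Commit`).
#[derive(Debug, PartialEq)]
struct Header {
    commit_offset: u64,
    num_transactions: u32,
}

fn decode_header(buf: &mut &[u8]) -> Result<Header, DecodeError> {
    let offset = take::<8>(buf).map_err(ctx("commit offset"))?;
    let count = take::<4>(buf).map_err(ctx("transaction count"))?;
    Ok(Header {
        commit_offset: u64::from_le_bytes(offset),
        num_transactions: u32::from_le_bytes(count),
    })
}
```

With this shape, truncated input yields an error naming the field that could not be read, rather than a panic.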

API and ABI

This is a breaking change to the module ABI
This is a breaking change to the module API
This is a breaking change to the ClientAPI
This is a breaking change to the SDK API

If the API is breaking, please state below what will break

Expected complexity level and risk

1

crates/core/src/db/messages/commit.rs

kulakowski

Great.

I left a few minor sorts of comments about comments and names.

I'd also like to verify this isn't a performance regression. I don't think this code is a bottleneck for us anymore, but it sure used to be.

Is the restart smoke test failure related?

crates/core/src/db/messages/commit.rs

kulakowski · 2023-10-30T15:25:50Z

crates/core/src/db/messages/write.rs

+
+    pub fn datakey() -> impl Strategy<Value = DataKey> {
+        prop_oneof![
+            prop::collection::vec(any::<u8>(), 0..31).prop_map(DataKey::from_data),


I'd comment on the reason for the two ranges.

kulakowski · 2023-10-30T15:30:50Z

crates/core/src/db/messages/write.rs

 pub struct Write {
    pub operation: Operation,
-    pub set_id: u32,
+    pub set_id: u32, // aka table id
+    #[cfg_attr(test, proptest(strategy = "arbitrary::datakey()"))]


I'd also comment about directing this to a hand-written strategy, or in general anywhere that the derive macro doesn't do the right thing on its own.

Is it fine to have those on the mod arbitrary, intentionally kept close to the type definition? See ea62cf8

kulakowski · 2023-10-30T15:35:33Z

crates/core/src/db/messages/commit.rs

+#[cfg(test)]
+use proptest_derive::Arbitrary;
+
+/// A commit is one record in the write-ahead log.


A drive by thought: similarly to how we have a bunch of code that says set_id but commented "this is the table id", I would love to settle on saying "write ahead log" everywhere, including data type names etc. Or some other name. But let's pick one!

Yeah, I'd love to settle on WAL / write ahead log, maybe even collapsing CommitLog and MessageLog into a single type. Possibly not for this PR, though?

Yeah I'd love to have us try to take things that way. I agree on the naming, I think the commit log and message log names are implementation details and that "write ahead log" is the name that reflects the semantics the datastore needs to care about. But yeah not for this PR.

Just adding my two cents here. I agree on coalescing on the name. I avoided WAL because historically that's referred to a log that is eventually compressed/deleted, but I think on balance WAL is probably the best name. We can include a comment about the fact that the WAL should never be deleted.

We should also note that it will include information which is not normally in a WAL, including reducer call event info and that it forms a Merkle DAG between commits.

The term MessageLog was borrowed from Kafka.

Noted to include all of these in the upcoming formalization / design changes doc.

kim · 2023-10-30T16:21:35Z

I'd also like to verify this isn't a performance regression.

Hm, would this entail landing a separate patch with a benchmark before this, so we can track the difference?

kulakowski · 2023-10-30T16:27:26Z

I'd also like to verify this isn't a performance regression.

Hm, would this entail landing a separate patch with a benchmark before this, so we can track the difference?

I don't think so. I meant a regression at the level of benchmarks / game performance, not in any new functionality or anything. Like if worldgen goes from N seconds to 1.1N seconds, I'd want to look at this again, but if it stays at N seconds, send it.

kim · 2023-10-30T16:29:21Z

benchmarks please

kim · 2023-10-30T16:29:30Z

Let's try this :)

github-actions · 2023-10-30T16:29:35Z

Benchmark results

Benchmark Report

Legend:

load: number of rows pre-loaded into the database
count: number of rows touched by the transaction
index types:
- unique: a single index on the id column
- non_unique: no indexes
- multi_index: non-unique index on every column
schemas:
- person(id: u32, name: String, age: u64)
- location(id: u32, x: u64, y: u64)

All throughputs are single-threaded.

Empty transaction

db	on disk	new latency	old latency	new throughput	old throughput
sqlite	💿	-	425.5±2.12ns	-	-
sqlite	🧠	-	420.1±0.55ns	-	-
stdb_module	💿	16.2±0.35µs	18.7±1.35µs	-	-
stdb_module	🧠	16.4±0.50µs	17.5±0.81µs	-	-
stdb_raw	💿	200.3±1.40ns	734.9±2.04ns	-	-
stdb_raw	🧠	190.3±0.15ns	731.6±1.38ns	-	-

Single-row insertions

db	on disk	schema	index type	load	new latency	old latency	new throughput	old throughput
sqlite	💿	location	multi_index	0	-	14.9±0.69µs	-	65.4 Ktx/sec
sqlite	💿	location	multi_index	1000	-	15.8±0.08µs	-	61.7 Ktx/sec
sqlite	💿	location	non_unique	0	-	7.3±0.57µs	-	134.5 Ktx/sec
sqlite	💿	location	non_unique	1000	-	7.0±0.02µs	-	140.3 Ktx/sec
sqlite	💿	location	unique	0	-	7.2±0.02µs	-	135.3 Ktx/sec
sqlite	💿	location	unique	1000	-	7.0±0.04µs	-	139.1 Ktx/sec
sqlite	💿	person	multi_index	0	-	14.7±0.74µs	-	66.2 Ktx/sec
sqlite	💿	person	multi_index	1000	-	16.1±0.11µs	-	60.6 Ktx/sec
sqlite	💿	person	non_unique	0	-	7.3±0.04µs	-	134.1 Ktx/sec
sqlite	💿	person	non_unique	1000	-	7.2±0.05µs	-	135.0 Ktx/sec
sqlite	💿	person	unique	0	-	7.4±0.55µs	-	132.7 Ktx/sec
sqlite	💿	person	unique	1000	-	7.3±0.05µs	-	134.4 Ktx/sec
sqlite	🧠	location	multi_index	0	-	4.0±0.01µs	-	245.0 Ktx/sec
sqlite	🧠	location	multi_index	1000	-	5.3±0.07µs	-	185.2 Ktx/sec
sqlite	🧠	location	non_unique	0	-	1847.2±4.33ns	-	528.7 Ktx/sec
sqlite	🧠	location	non_unique	1000	-	1911.9±8.02ns	-	510.8 Ktx/sec
sqlite	🧠	location	unique	0	-	1817.3±3.44ns	-	537.4 Ktx/sec
sqlite	🧠	location	unique	1000	-	1941.0±9.04ns	-	503.1 Ktx/sec
sqlite	🧠	person	multi_index	0	-	3.7±0.01µs	-	266.2 Ktx/sec
sqlite	🧠	person	multi_index	1000	-	5.4±0.03µs	-	180.2 Ktx/sec
sqlite	🧠	person	non_unique	0	-	1908.0±7.49ns	-	511.8 Ktx/sec
sqlite	🧠	person	non_unique	1000	-	1977.5±11.82ns	-	493.8 Ktx/sec
sqlite	🧠	person	unique	0	-	1897.2±3.67ns	-	514.8 Ktx/sec
sqlite	🧠	person	unique	1000	-	2.0±0.01µs	-	483.5 Ktx/sec
stdb_module	💿	location	multi_index	0	48.4±4.93µs	50.4±3.90µs	20.2 Ktx/sec	19.4 Ktx/sec
stdb_module	💿	location	multi_index	1000	108.6±6.77µs	155.5±12.70µs	9.0 Ktx/sec	6.3 Ktx/sec
stdb_module	💿	location	non_unique	0	38.7±4.23µs	44.7±4.63µs	25.2 Ktx/sec	21.9 Ktx/sec
stdb_module	💿	location	non_unique	1000	172.8±41.00µs	136.3±45.32µs	5.7 Ktx/sec	7.2 Ktx/sec
stdb_module	💿	location	unique	0	44.9±6.47µs	46.0±3.16µs	21.7 Ktx/sec	21.2 Ktx/sec
stdb_module	💿	location	unique	1000	189.5±70.74µs	141.5±30.18µs	5.2 Ktx/sec	6.9 Ktx/sec
stdb_module	💿	person	multi_index	0	60.6±5.22µs	62.5±4.76µs	16.1 Ktx/sec	15.6 Ktx/sec
stdb_module	💿	person	multi_index	1000	151.8±45.31µs	126.0±18.75µs	6.4 Ktx/sec	7.8 Ktx/sec
stdb_module	💿	person	non_unique	0	39.7±3.79µs	43.6±6.39µs	24.6 Ktx/sec	22.4 Ktx/sec
stdb_module	💿	person	non_unique	1000	185.6±35.77µs	258.5±95.07µs	5.3 Ktx/sec	3.8 Ktx/sec
stdb_module	💿	person	unique	0	50.4±5.19µs	49.9±5.66µs	19.4 Ktx/sec	19.6 Ktx/sec
stdb_module	💿	person	unique	1000	246.1±88.83µs	125.0±7.30µs	4.0 Ktx/sec	7.8 Ktx/sec
stdb_module	🧠	location	multi_index	0	30.6±3.10µs	36.7±3.54µs	31.9 Ktx/sec	26.6 Ktx/sec
stdb_module	🧠	location	multi_index	1000	113.7±35.59µs	200.4±46.97µs	8.6 Ktx/sec	4.9 Ktx/sec
stdb_module	🧠	location	non_unique	0	26.8±1.59µs	27.2±1.72µs	36.5 Ktx/sec	35.9 Ktx/sec
stdb_module	🧠	location	non_unique	1000	171.8±7.37µs	248.0±8.25µs	5.7 Ktx/sec	3.9 Ktx/sec
stdb_module	🧠	location	unique	0	28.4±2.33µs	31.5±3.13µs	34.4 Ktx/sec	31.0 Ktx/sec
stdb_module	🧠	location	unique	1000	182.3±11.24µs	104.3±0.72µs	5.4 Ktx/sec	9.4 Ktx/sec
stdb_module	🧠	person	multi_index	0	36.5±3.21µs	44.5±5.38µs	26.8 Ktx/sec	22.0 Ktx/sec
stdb_module	🧠	person	multi_index	1000	139.0±6.71µs	319.9±13.08µs	7.0 Ktx/sec	3.1 Ktx/sec
stdb_module	🧠	person	non_unique	0	27.3±1.61µs	31.8±2.84µs	35.7 Ktx/sec	30.7 Ktx/sec
stdb_module	🧠	person	non_unique	1000	323.5±14.34µs	295.8±2.12µs	3.0 Ktx/sec	3.3 Ktx/sec
stdb_module	🧠	person	unique	0	30.3±2.48µs	37.2±3.61µs	32.2 Ktx/sec	26.2 Ktx/sec
stdb_module	🧠	person	unique	1000	243.5±16.41µs	186.4±12.01µs	4.0 Ktx/sec	5.2 Ktx/sec
stdb_raw	💿	location	multi_index	0	6.6±0.01µs	7.1±0.41µs	148.2 Ktx/sec	136.8 Ktx/sec
stdb_raw	💿	location	multi_index	1000	32.8±236.91µs	10.1±2.46µs	29.8 Ktx/sec	96.8 Ktx/sec
stdb_raw	💿	location	non_unique	0	4.2±0.01µs	4.7±0.02µs	232.1 Ktx/sec	207.8 Ktx/sec
stdb_raw	💿	location	non_unique	1000	5.6±0.14µs	6.2±0.35µs	174.8 Ktx/sec	157.5 Ktx/sec
stdb_raw	💿	location	unique	0	5.5±0.06µs	6.1±0.02µs	177.3 Ktx/sec	160.5 Ktx/sec
stdb_raw	💿	location	unique	1000	7.8±0.16µs	27.7±191.28µs	125.9 Ktx/sec	35.3 Ktx/sec
stdb_raw	💿	person	multi_index	0	10.3±0.03µs	10.9±0.01µs	95.2 Ktx/sec	89.8 Ktx/sec
stdb_raw	💿	person	multi_index	1000	13.3±0.16µs	63.6±493.29µs	73.3 Ktx/sec	15.4 Ktx/sec
stdb_raw	💿	person	non_unique	0	4.8±0.20µs	5.3±0.01µs	203.3 Ktx/sec	185.1 Ktx/sec
stdb_raw	💿	person	non_unique	1000	18.8±125.20µs	6.9±0.10µs	51.9 Ktx/sec	141.3 Ktx/sec
stdb_raw	💿	person	unique	0	7.2±0.65µs	7.7±0.02µs	136.5 Ktx/sec	127.5 Ktx/sec
stdb_raw	💿	person	unique	1000	30.9±211.95µs	10.4±0.11µs	31.6 Ktx/sec	93.8 Ktx/sec
stdb_raw	🧠	location	multi_index	0	3.7±0.01µs	4.2±0.01µs	267.5 Ktx/sec	232.8 Ktx/sec
stdb_raw	🧠	location	multi_index	1000	5.0±0.04µs	5.8±0.03µs	194.1 Ktx/sec	168.7 Ktx/sec
stdb_raw	🧠	location	non_unique	0	1403.7±4.02ns	1939.5±5.03ns	695.7 Ktx/sec	503.5 Ktx/sec
stdb_raw	🧠	location	non_unique	1000	1821.4±18.78ns	2.5±0.03µs	536.2 Ktx/sec	393.2 Ktx/sec
stdb_raw	🧠	location	unique	0	2.7±0.01µs	3.2±0.00µs	367.3 Ktx/sec	308.5 Ktx/sec
stdb_raw	🧠	location	unique	1000	3.7±0.02µs	4.4±0.02µs	265.9 Ktx/sec	223.8 Ktx/sec
stdb_raw	🧠	person	multi_index	0	7.3±0.08µs	7.8±0.01µs	134.1 Ktx/sec	125.8 Ktx/sec
stdb_raw	🧠	person	multi_index	1000	9.2±0.04µs	10.0±0.09µs	105.7 Ktx/sec	97.6 Ktx/sec
stdb_raw	🧠	person	non_unique	0	1974.3±5.97ns	2.4±0.01µs	494.6 Ktx/sec	399.8 Ktx/sec
stdb_raw	🧠	person	non_unique	1000	2.5±0.01µs	3.1±0.02µs	384.5 Ktx/sec	310.1 Ktx/sec
stdb_raw	🧠	person	unique	0	4.2±0.01µs	4.7±0.05µs	232.5 Ktx/sec	207.7 Ktx/sec
stdb_raw	🧠	person	unique	1000	5.5±0.05µs	6.2±0.04µs	177.7 Ktx/sec	157.8 Ktx/sec

Multi-row insertions

db	on disk	schema	index type	load	count	new latency	old latency	new throughput	old throughput
sqlite	💿	location	multi_index	0	100	-	133.2±6.53µs	-	7.3 Ktx/sec
sqlite	💿	location	multi_index	1000	100	-	202.6±1.13µs	-	4.8 Ktx/sec
sqlite	💿	location	non_unique	0	100	-	50.9±0.38µs	-	19.2 Ktx/sec
sqlite	💿	location	non_unique	1000	100	-	52.6±0.24µs	-	18.6 Ktx/sec
sqlite	💿	location	unique	0	100	-	54.0±1.88µs	-	18.1 Ktx/sec
sqlite	💿	location	unique	1000	100	-	57.6±0.24µs	-	17.0 Ktx/sec
sqlite	💿	person	multi_index	0	100	-	118.6±2.00µs	-	8.2 Ktx/sec
sqlite	💿	person	multi_index	1000	100	-	246.9±94.84µs	-	4.0 Ktx/sec
sqlite	💿	person	non_unique	0	100	-	49.5±2.80µs	-	19.7 Ktx/sec
sqlite	💿	person	non_unique	1000	100	-	59.9±0.37µs	-	16.3 Ktx/sec
sqlite	💿	person	unique	0	100	-	50.5±0.49µs	-	19.3 Ktx/sec
sqlite	💿	person	unique	1000	100	-	56.3±0.28µs	-	17.3 Ktx/sec
sqlite	🧠	location	multi_index	0	100	-	120.1±0.42µs	-	8.1 Ktx/sec
sqlite	🧠	location	multi_index	1000	100	-	169.4±0.28µs	-	5.8 Ktx/sec
sqlite	🧠	location	non_unique	0	100	-	43.8±0.33µs	-	22.3 Ktx/sec
sqlite	🧠	location	non_unique	1000	100	-	44.4±0.37µs	-	22.0 Ktx/sec
sqlite	🧠	location	unique	0	100	-	47.4±0.36µs	-	20.6 Ktx/sec
sqlite	🧠	location	unique	1000	100	-	49.6±0.26µs	-	19.7 Ktx/sec
sqlite	🧠	person	multi_index	0	100	-	107.7±0.44µs	-	9.1 Ktx/sec
sqlite	🧠	person	multi_index	1000	100	-	189.7±0.61µs	-	5.1 Ktx/sec
sqlite	🧠	person	non_unique	0	100	-	42.0±0.31µs	-	23.3 Ktx/sec
sqlite	🧠	person	non_unique	1000	100	-	46.4±0.24µs	-	21.0 Ktx/sec
sqlite	🧠	person	unique	0	100	-	44.6±0.35µs	-	21.9 Ktx/sec
sqlite	🧠	person	unique	1000	100	-	48.4±0.44µs	-	20.2 Ktx/sec
stdb_module	💿	location	multi_index	0	100	891.3±83.58µs	959.4±69.44µs	1121 tx/sec	1042 tx/sec
stdb_module	💿	location	multi_index	1000	100	1117.7±163.42µs	994.2±123.30µs	894 tx/sec	1005 tx/sec
stdb_module	💿	location	non_unique	0	100	462.1±89.99µs	436.4±81.89µs	2.1 Ktx/sec	2.2 Ktx/sec
stdb_module	💿	location	non_unique	1000	100	640.7±37.98µs	548.0±47.50µs	1560 tx/sec	1824 tx/sec
stdb_module	💿	location	unique	0	100	716.1±74.79µs	783.4±2.21µs	1396 tx/sec	1276 tx/sec
stdb_module	💿	location	unique	1000	100	937.1±123.46µs	568.5±1.88µs	1067 tx/sec	1758 tx/sec
stdb_module	💿	person	multi_index	0	100	1088.4±154.95µs	1005.2±170.26µs	918 tx/sec	994 tx/sec
stdb_module	💿	person	multi_index	1000	100	1236.5±125.61µs	1294.6±18.48µs	808 tx/sec	772 tx/sec
stdb_module	💿	person	non_unique	0	100	553.3±52.23µs	650.6±72.48µs	1807 tx/sec	1537 tx/sec
stdb_module	💿	person	non_unique	1000	100	991.3±7.71µs	889.0±44.83µs	1008 tx/sec	1124 tx/sec
stdb_module	💿	person	unique	0	100	647.1±56.51µs	629.9±11.81µs	1545 tx/sec	1587 tx/sec
stdb_module	💿	person	unique	1000	100	998.9±44.09µs	703.0±49.76µs	1001 tx/sec	1422 tx/sec
stdb_module	🧠	location	multi_index	0	100	639.2±113.82µs	559.2±162.85µs	1564 tx/sec	1788 tx/sec
stdb_module	🧠	location	multi_index	1000	100	710.1±108.92µs	650.2±13.83µs	1408 tx/sec	1537 tx/sec
stdb_module	🧠	location	non_unique	0	100	356.2±5.39µs	332.0±20.80µs	2.7 Ktx/sec	2.9 Ktx/sec
stdb_module	🧠	location	non_unique	1000	100	315.5±7.64µs	357.5±3.22µs	3.1 Ktx/sec	2.7 Ktx/sec
stdb_module	🧠	location	unique	0	100	555.6±1.47µs	369.4±49.31µs	1799 tx/sec	2.6 Ktx/sec
stdb_module	🧠	location	unique	1000	100	653.6±3.67µs	454.6±91.74µs	1529 tx/sec	2.1 Ktx/sec
stdb_module	🧠	person	multi_index	0	100	895.7±1.41µs	843.3±1.79µs	1116 tx/sec	1185 tx/sec
stdb_module	🧠	person	multi_index	1000	100	937.7±23.42µs	1131.0±14.42µs	1066 tx/sec	884 tx/sec
stdb_module	🧠	person	non_unique	0	100	491.7±22.04µs	407.4±16.38µs	2033 tx/sec	2.4 Ktx/sec
stdb_module	🧠	person	non_unique	1000	100	488.5±52.12µs	601.3±33.84µs	2047 tx/sec	1663 tx/sec
stdb_module	🧠	person	unique	0	100	574.8±1.36µs	554.3±3.94µs	1739 tx/sec	1804 tx/sec
stdb_module	🧠	person	unique	1000	100	668.3±58.71µs	827.5±37.15µs	1496 tx/sec	1208 tx/sec
stdb_raw	💿	location	multi_index	0	100	384.3±1.31µs	380.3±0.55µs	2.5 Ktx/sec	2.6 Ktx/sec
stdb_raw	💿	location	multi_index	1000	100	412.4±2.26µs	406.2±1.50µs	2.4 Ktx/sec	2.4 Ktx/sec
stdb_raw	💿	location	non_unique	0	100	157.2±5.85µs	158.7±0.35µs	6.2 Ktx/sec	6.2 Ktx/sec
stdb_raw	💿	location	non_unique	1000	100	168.1±83.07µs	170.4±97.09µs	5.8 Ktx/sec	5.7 Ktx/sec
stdb_raw	💿	location	unique	0	100	283.0±0.33µs	282.0±0.25µs	3.5 Ktx/sec	3.5 Ktx/sec
stdb_raw	💿	location	unique	1000	100	304.8±1.28µs	322.0±200.71µs	3.2 Ktx/sec	3.0 Ktx/sec
stdb_raw	💿	person	multi_index	0	100	703.9±2.83µs	711.1±18.40µs	1420 tx/sec	1406 tx/sec
stdb_raw	💿	person	multi_index	1000	100	777.5±423.71µs	791.8±522.65µs	1286 tx/sec	1262 tx/sec
stdb_raw	💿	person	non_unique	0	100	213.2±0.41µs	216.0±0.14µs	4.6 Ktx/sec	4.5 Ktx/sec
stdb_raw	💿	person	non_unique	1000	100	216.2±0.62µs	237.9±185.07µs	4.5 Ktx/sec	4.1 Ktx/sec
stdb_raw	💿	person	unique	0	100	424.2±0.42µs	426.7±0.47µs	2.3 Ktx/sec	2.3 Ktx/sec
stdb_raw	💿	person	unique	1000	100	446.2±1.54µs	471.3±243.28µs	2.2 Ktx/sec	2.1 Ktx/sec
stdb_raw	🧠	location	multi_index	0	100	301.2±0.50µs	294.9±0.37µs	3.2 Ktx/sec	3.3 Ktx/sec
stdb_raw	🧠	location	multi_index	1000	100	329.0±1.94µs	320.9±0.40µs	3.0 Ktx/sec	3.0 Ktx/sec
stdb_raw	🧠	location	non_unique	0	100	75.2±0.10µs	73.8±0.11µs	13.0 Ktx/sec	13.2 Ktx/sec
stdb_raw	🧠	location	non_unique	1000	100	76.7±0.14µs	76.1±0.15µs	12.7 Ktx/sec	12.8 Ktx/sec
stdb_raw	🧠	location	unique	0	100	201.5±0.49µs	197.2±0.26µs	4.8 Ktx/sec	5.0 Ktx/sec
stdb_raw	🧠	location	unique	1000	100	221.6±0.51µs	216.6±0.33µs	4.4 Ktx/sec	4.5 Ktx/sec
stdb_raw	🧠	person	multi_index	0	100	615.5±0.95µs	615.0±0.82µs	1624 tx/sec	1626 tx/sec
stdb_raw	🧠	person	multi_index	1000	100	646.5±1.09µs	644.3±0.77µs	1546 tx/sec	1552 tx/sec
stdb_raw	🧠	person	non_unique	0	100	126.8±0.14µs	126.6±0.11µs	7.7 Ktx/sec	7.7 Ktx/sec
stdb_raw	🧠	person	non_unique	1000	100	129.1±0.22µs	129.7±0.40µs	7.6 Ktx/sec	7.5 Ktx/sec
stdb_raw	🧠	person	unique	0	100	337.6±0.39µs	336.3±0.21µs	2.9 Ktx/sec	2.9 Ktx/sec
stdb_raw	🧠	person	unique	1000	100	358.2±1.84µs	357.0±2.01µs	2.7 Ktx/sec	2.7 Ktx/sec

Full table iterate

db	on disk	schema	index type	new latency	old latency	new throughput	old throughput
sqlite	💿	location	unique	-	9.0±0.14µs	-	108.3 Ktx/sec
sqlite	💿	person	unique	-	9.5±0.11µs	-	103.2 Ktx/sec
sqlite	🧠	location	unique	-	7.8±0.11µs	-	125.9 Ktx/sec
sqlite	🧠	person	unique	-	8.4±0.09µs	-	116.7 Ktx/sec
stdb_module	💿	location	unique	46.8±3.02µs	49.2±4.53µs	20.9 Ktx/sec	19.8 Ktx/sec
stdb_module	💿	person	unique	57.0±10.28µs	53.4±10.70µs	17.1 Ktx/sec	18.3 Ktx/sec
stdb_module	🧠	location	unique	47.2±4.34µs	48.8±5.30µs	20.7 Ktx/sec	20.0 Ktx/sec
stdb_module	🧠	person	unique	60.9±9.26µs	62.6±10.45µs	16.0 Ktx/sec	15.6 Ktx/sec
stdb_raw	💿	location	unique	9.2±0.06µs	9.3±0.15µs	106.2 Ktx/sec	105.5 Ktx/sec
stdb_raw	💿	person	unique	9.2±0.08µs	9.3±0.15µs	106.1 Ktx/sec	105.3 Ktx/sec
stdb_raw	🧠	location	unique	9.2±0.05µs	9.5±0.12µs	106.3 Ktx/sec	102.9 Ktx/sec
stdb_raw	🧠	person	unique	9.2±0.03µs	9.3±0.19µs	106.2 Ktx/sec	105.1 Ktx/sec

Find unique key

db	on disk	key type	load	new latency	old latency	new throughput	old throughput
sqlite	💿	u32	1000	-	2.3±0.01µs	-	420.5 Ktx/sec
sqlite	🧠	u32	1000	-	1134.0±4.65ns	-	861.2 Ktx/sec
stdb_module	💿	u32	1000	19.7±0.83µs	23.3±2.28µs	49.6 Ktx/sec	41.8 Ktx/sec
stdb_module	🧠	u32	1000	19.7±0.95µs	22.1±1.12µs	49.5 Ktx/sec	44.1 Ktx/sec
stdb_raw	💿	u32	1000	884.5±8.24ns	1422.6±6.41ns	1104.1 Ktx/sec	686.5 Ktx/sec
stdb_raw	🧠	u32	1000	883.1±4.51ns	1419.3±3.90ns	1105.8 Ktx/sec	688.0 Ktx/sec

Filter

db	on disk	key type	index strategy	load	count	new latency	old latency	new throughput	old throughput
sqlite	💿	string	indexed	1000	10	-	5.5±0.02µs	-	176.2 Ktx/sec
sqlite	💿	string	non_indexed	1000	10	-	50.8±0.40µs	-	19.2 Ktx/sec
sqlite	💿	u64	indexed	1000	10	-	5.4±0.03µs	-	181.0 Ktx/sec
sqlite	💿	u64	non_indexed	1000	10	-	32.9±0.07µs	-	29.7 Ktx/sec
sqlite	🧠	string	indexed	1000	10	-	4.2±0.02µs	-	233.9 Ktx/sec
sqlite	🧠	string	non_indexed	1000	10	-	48.1±0.59µs	-	20.3 Ktx/sec
sqlite	🧠	u64	indexed	1000	10	-	4.0±0.03µs	-	241.6 Ktx/sec
sqlite	🧠	u64	non_indexed	1000	10	-	31.7±0.06µs	-	30.8 Ktx/sec
stdb_module	💿	string	indexed	1000	10	29.8±2.41µs	34.0±2.23µs	32.7 Ktx/sec	28.7 Ktx/sec
stdb_module	💿	string	non_indexed	1000	10	174.8±1.77µs	181.1±1.11µs	5.6 Ktx/sec	5.4 Ktx/sec
stdb_module	💿	u64	indexed	1000	10	24.6±1.88µs	29.7±1.93µs	39.6 Ktx/sec	32.9 Ktx/sec
stdb_module	💿	u64	non_indexed	1000	10	147.2±1.55µs	153.4±1.00µs	6.6 Ktx/sec	6.4 Ktx/sec
stdb_module	🧠	string	indexed	1000	10	30.4±2.06µs	34.7±2.23µs	32.1 Ktx/sec	28.2 Ktx/sec
stdb_module	🧠	string	non_indexed	1000	10	168.6±0.96µs	179.5±2.19µs	5.8 Ktx/sec	5.4 Ktx/sec
stdb_module	🧠	u64	indexed	1000	10	24.8±2.01µs	29.6±2.80µs	39.4 Ktx/sec	33.0 Ktx/sec
stdb_module	🧠	u64	non_indexed	1000	10	145.8±3.22µs	154.4±5.68µs	6.7 Ktx/sec	6.3 Ktx/sec
stdb_raw	💿	string	indexed	1000	10	3.5±0.01µs	3.8±0.02µs	282.4 Ktx/sec	253.7 Ktx/sec
stdb_raw	💿	string	non_indexed	1000	10	146.4±0.59µs	152.0±0.59µs	6.7 Ktx/sec	6.4 Ktx/sec
stdb_raw	💿	u64	indexed	1000	10	3.3±0.03µs	3.7±0.01µs	291.9 Ktx/sec	261.7 Ktx/sec
stdb_raw	💿	u64	non_indexed	1000	10	121.4±1.72µs	132.0±0.13µs	8.0 Ktx/sec	7.4 Ktx/sec
stdb_raw	🧠	string	indexed	1000	10	3.5±0.01µs	3.8±0.03µs	283.0 Ktx/sec	254.2 Ktx/sec
stdb_raw	🧠	string	non_indexed	1000	10	146.8±0.54µs	152.2±0.40µs	6.7 Ktx/sec	6.4 Ktx/sec
stdb_raw	🧠	u64	indexed	1000	10	3.3±0.01µs	3.7±0.01µs	292.9 Ktx/sec	261.7 Ktx/sec
stdb_raw	🧠	u64	non_indexed	1000	10	121.8±0.17µs	132.5±0.27µs	8.0 Ktx/sec	7.4 Ktx/sec

Serialize

schema	format	count	new latency	old latency	new throughput	old throughput
location	bsatn	100	1655.8±26.70ns	1618.9±31.29ns	57.6 Mtx/sec	58.9 Mtx/sec
location	json	100	3.3±0.06µs	3.2±0.01µs	28.7 Mtx/sec	30.0 Mtx/sec
location	product_value	100	847.0±0.46ns	574.7±0.33ns	112.6 Mtx/sec	166.0 Mtx/sec
person	bsatn	100	3.1±0.02µs	3.0±0.02µs	31.2 Mtx/sec	31.3 Mtx/sec
person	json	100	4.9±0.04µs	5.0±0.03µs	19.5 Mtx/sec	19.3 Mtx/sec
person	product_value	100	1123.6±0.58ns	1006.5±3.63ns	84.9 Mtx/sec	94.8 Mtx/sec

Module: invoke with large arguments

arg size	new latency	old latency	new throughput	old throughput
64KiB	75.3±6.44µs	79.1±4.72µs	-	-

Module: print bulk

line count	new latency	old latency	new throughput	old throughput
1	20.4±0.72µs	22.4±1.12µs	-	-
100	203.9±12.87µs	201.4±3.97µs	-	-
1000	1841.8±54.55µs	1875.7±77.42µs	-	-

Remaining benchmarks

name	new latency	old latency	new throughput	old throughput

kim · 2023-10-30T16:34:25Z

Is the restart smoke test failure related?

Looks like it was a flake.

Centril · 2023-10-30T17:06:09Z

crates/core/src/db/messages/commit.rs

        let mut count = 0;

-        if self.parent_commit_hash.is_none() {
-            count += 1;
-        } else {
-            count += 1;
-            count += self.parent_commit_hash.unwrap().data.len();
+        count += 1; // tag for option


Nit: collapse these
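
One way the collapsed form could look, as a sketch only: the tag byte is counted unconditionally and the hash bytes only when a parent is present. `Hash` here is a hypothetical stand-in (assumed to hold 32 bytes), and `parent_hash_encoded_len` is an illustrative free function, not the method in the PR.

```rust
/// Hypothetical stand-in for the hash type; 32 bytes is an assumption.
struct Hash {
    data: [u8; 32],
}

/// Encoded size of an optional parent hash: one tag byte for the
/// `Option`, plus the hash bytes when a parent is present.
fn parent_hash_encoded_len(parent: Option<&Hash>) -> usize {
    1 + parent.map_or(0, |hash| hash.data.len())
}
```

This also avoids the `unwrap()` in the removed branch.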

Centril · 2023-10-30T17:20:31Z

crates/core/src/db/messages/write.rs

+    // [`DataKey`] is defined in `lib`, so we can't have an [`Arbitrary`] impl
+    // for it just yet due to orphan rules.


What I'd do in this situation is to move the impl to that crate and expose the impl under a proptest feature that can be enabled in dev-dependencies.

Sure. I tried to keep changes local for this one, but happy to send a follow-up defining Arbitrary for Hash and DataKey. Perhaps that'd even increase probability for folks to write property tests ;)

Sounds great 🚀

cloutiertyler

I think there's a bug in encoded_len for Write.

cloutiertyler · 2023-10-30T23:04:13Z

crates/core/src/db/messages/write.rs

-        // 1 for flags, 4 for set_id
-        let mut count = 1 + 4;
+        let mut count = self.operation.encoded_len();
+        count += 4; // set_id


This seems wrong. Shouldn't it be 5 based on the operation encoding?

It’s operation.len + 4 = 1+4 = 5, like before no?
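
To make the arithmetic concrete, here is a simplified sketch in which the fixed part of a `Write` is 1 byte for the operation plus 4 bytes for `set_id`, and `encoded_len` can be checked against the bytes `encode` actually produces. The wire layout and the `Vec<u8>` key are assumptions for illustration; the real `DataKey` has its own encoding.

```rust
#[derive(Clone, Copy)]
enum Operation {
    Delete = 0,
    Insert = 1,
}

impl Operation {
    /// Assumed to occupy a single byte on the wire.
    fn encoded_len(&self) -> usize {
        1
    }
}

struct Write {
    operation: Operation,
    set_id: u32, // aka table id
    /// Simplified placeholder for the real `DataKey`.
    data_key: Vec<u8>,
}

impl Write {
    fn encoded_len(&self) -> usize {
        let mut count = self.operation.encoded_len();
        count += 4; // set_id
        count += self.data_key.len();
        count
    }

    /// Assumed layout: operation byte, little-endian set_id, raw key bytes.
    fn encode(&self, out: &mut Vec<u8>) {
        out.push(self.operation as u8);
        out.extend_from_slice(&self.set_id.to_le_bytes());
        out.extend_from_slice(&self.data_key);
    }
}
```

Asserting `out.len() == write.encoded_len()` after encoding, over many generated values, is the kind of check a property test can automate.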

cloutiertyler · 2023-10-30T23:05:13Z

crates/core/src/db/messages/commit.rs

@@ -91,21 +131,36 @@ impl Commit {
        count
    }

-    pub fn encode(&self, bytes: &mut Vec<u8>) {
-        bytes.reserve(self.encoded_len());


It's a little unfortunate that we're losing this optimization, but maybe we can improve things by not using a Vec at all but a buffer pool or something in the future.

Good point. I meant to make sure the caller takes care of this, will add.
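
A sketch of what "the caller takes care of this" could mean: when `encode` writes into a generic sink and can no longer call `reserve`, the caller can size the buffer up front with `encoded_len`. `Record` and `encode_to_vec` are hypothetical names, and `std::io::Write` stands in for the `sats` writer trait.

```rust
use std::io::{self, Write as _};

/// Hypothetical length-prefixed record used only for illustration.
struct Record {
    payload: Vec<u8>,
}

impl Record {
    fn encoded_len(&self) -> usize {
        4 + self.payload.len() // u32 length prefix + payload bytes
    }

    /// Writes into any sink, so it cannot reserve capacity itself.
    fn encode<W: io::Write>(&self, out: &mut W) -> io::Result<()> {
        out.write_all(&(self.payload.len() as u32).to_le_bytes())?;
        out.write_all(&self.payload)
    }
}

/// The caller knows it is writing to a `Vec`, so it allocates once.
fn encode_to_vec(record: &Record) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(record.encoded_len());
    record.encode(&mut buf)?;
    Ok(buf)
}
```

Because the capacity matches `encoded_len`, encoding should not need to grow the buffer, provided `encoded_len` is accurate.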

It was not a bug, and I added a test to assure it's not.

* core: Refactor commit encoding / decoding

Use `sats::{BufReader, BufWriter}` for decoding / encoding of `Commit` and associated types. This makes `decode` fallible (which is quite desirable, instead of panicking). As the `DecodeError` from sats is fairly sparse, also add some context about where exactly decoding failed. Lastly, add some documentation and (property) tests.

kim added 2 commits October 27, 2023 12:27

core: Add rountrip property test for commit encoding

b7db37d

kim added the release-0.8 label Oct 30, 2023

kim requested review from kulakowski and cloutiertyler October 30, 2023 14:31

kim commented Oct 30, 2023

kulakowski approved these changes Oct 30, 2023

Centril self-requested a review October 30, 2023 15:49

Add commentary

ea62cf8

kim mentioned this pull request Oct 30, 2023

core: Track the number of bytes read when iterating over the WAL #496

Merged

4 tasks

Centril mentioned this pull request Oct 30, 2023

Misc refactoring preparing for multi col indices #497

Merged

4 tasks

Centril reviewed Oct 30, 2023

cloutiertyler previously requested changes Oct 30, 2023

kim added 3 commits November 1, 2023 14:09

Add prop test to assure encoded len is the encoded len

2ff8b4f

Also comment on the scope of proptest_config.

Use encoded_len to allocate buffer of the right capacity

e05f58e

Address review nit

6b66feb

kim mentioned this pull request Nov 1, 2023

lib: Provide Arbitrary impls for Hash and DataKey #517

Merged

4 tasks

kim requested a review from cloutiertyler November 1, 2023 18:35

kim enabled auto-merge (squash) November 6, 2023 09:19

kim merged commit 9a263b0 into master Nov 6, 2023
5 checks passed

kim deleted the kim/decode-commit branch November 6, 2023 09:23

jdetter mentioned this pull request Nov 15, 2023

release/v0.7.4 beta attempt4 #566

Closed
