A pure-Rust two-level dynamic order-statistic b-tree.
This crate implements a compact set data structure that preserves its elements' sorted order and allows lookups of entries by value or sorted order position.
Under the feature concurrent
you can find a version of the BTree that can be fearlessly shared between
threads.
Both the concurrent and single-threaded versions are meant to be drop-in replacements for the stdlib BTree. This is mostly true for the latter but not for the former, yet.
This was heavily inspired by indexmap
, and
python's sortedcontainers
.
It differs from both in that:
indexmap
is a hashmap that provides numerical lookups, but does not maintain order in case of removals, whileindexset
is a b-tree that irrespective of which mutating operation is run, always maintains order.sortecontainers
is similar in spirit, but utilizes a different routine for balancing the tree, and relies on a heap for numerical lookups.
indexset
provides the following features:
- As fast to iterate as a vec.
- Zero indirection.
- Lookups by position and range.
- Minimal amount of allocations.
select
(lookups by position) andrank
operations in near constant time.
BTreeSet
and BTreeMap
derive their performance much from how they are constructed, which is:
A two-level B-Tree with a fenwick tree as a low-cost index for numerical lookups
Each node is a leaf, and each leaf is a vec with a fixed capacity of size B
, with 1024
being the default.
The following hold:
- Iteration is very fast since it is done by inheriting vec's iter struct.
- Lookups only need two binary searches. One over
n/B
nodes and another overB
elements:O(log(n/B) + log(B)) = O(log(n))
. - Insertions are constant time
O(B)
in the best case andO(B^2)
in the worst. Removals areO(log(n))
.
The following numbers were obtained on a M3 macbook pro:
Command: cargo bench --bench stdlib --all-features
- Inserting 100k random usize
stdlib::collections::BTreeSet.insert(i)
: 8.9msindexset::BTreeSet.insert(i)
: 13.1msindexset::concurrent::set::BTreeSet.insert(i)
: 14.0ms
- Checking that each 100k random usize integers exist
stdlib::collections::BTreeSet.contains(i)
: 7.02msindexset::BTreeSet.contains(i)
: 5.22msindexset::concurrent::set::BTreeSet.contains(i)
: 5.27ms
- Getting all 100k i-th elements
stdlib::collections::BTreeSet.iter.nth(i)
: 13.28s yes, seconds!indexset::BTreeSet.get_index(i)
: 3.93ms
- Iterating over all 100k elements and then collecting it into a vec
Vec::from_iter(stdlib::collections::BTreeSet.iter())
: 227.28usVec::from_iter(indexset::BTreeSet.iter())
: 123.21.us
Getting the i-th element is 3400x faster than stdlib's btree, contains
is 25% faster, and iterating is twice
as fast, at the cost of insertions being 30% slower.
If your use case of std::collections::BTreeSet
and BTreeMap
i read-heavy, or if you really need to index by
sorted-order position, it might be worth checking out indexset
instead.
Command: cargo bench --bench concurrent --all-features
We benchmark concurrent operations with 40 threads, each conducting 100000 operations at the same time that vary from a ratio of 1% writes/reads to 50% writes/reads.
In this benchmark threads have high locality and tend to focus on specific parts of the data.
- 1% writes/99% reads:
indexset::concurrent::set::BTreeSet
: 170.5msscc::TreeIndex
: 128.7mscrossbeam_skiplist::SkipSet
: 161.3ms
- 10% writes/90% reads:
indexset::concurrent::set::BTreeSet
: 175.9msscc::TreeIndex
: 183.9mscrossbeam_skiplist::SkipSet
: 217.4ms
- 30% writes/70% reads:
indexset::concurrent::set::BTreeSet
: 190.9msscc::TreeIndex
: 313.2mscrossbeam_skiplist::SkipSet
: 261.8ms
- 50% writes/50% reads:
indexset::concurrent::set::BTreeSet
: 220.75msscc::TreeIndex
: 561.70mscrossbeam_skiplist::SkipSet
: 334.40ms
BTreeMap
is less polished thanBTreeSet
. This crate has been optimised for a leanerBTreeSet
.Concurrent
BTreeMap
andBtreeSet
do not supportserde
serialization and deserialization nor are they order-statistic trees.
This library is called indexset
, because the base data structure is BTreeSet
. BTreeMap
is a BTreeSet
with
a Pair<K, V>
item type.
See CHANGELOG.md.