This crate provides a fast implementation of ordered sets and maps using finite state machines. In particular, it makes use of finite state transducers to map keys to values as the machine is executed. Using finite state machines as data structures enables us to store keys in a compact format that is also easily searchable. For example, this crate leverages memory maps to make range queries very fast.
Check out my blog post Index 1,600,000,000 Keys with Automata and Rust for extensive background, examples and experiments.
Dual-licensed under MIT or the UNLICENSE.
The regex-automata crate provides implementations of the fst::Automaton trait when its transducer feature is enabled. This permits using DFAs compiled by regex-automata to search finite state transducers produced by this crate.
Simply add a corresponding entry to your Cargo.toml dependency list:

    [dependencies]
    fst = "0.4"
This example demonstrates building a set in memory and executing a fuzzy query against it. You'll need fst = "0.4" with the levenshtein feature enabled in your Cargo.toml.
    use fst::{IntoStreamer, Set};
    use fst::automaton::Levenshtein;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // A convenient way to create sets in memory.
        let keys = vec!["fa", "fo", "fob", "focus", "foo", "food", "foul"];
        let set = Set::from_iter(keys)?;

        // Build our fuzzy query.
        let lev = Levenshtein::new("foo", 1)?;

        // Apply our fuzzy query to the set we built.
        let stream = set.search(lev).into_stream();

        let keys = stream.into_strs()?;
        assert_eq!(keys, vec!["fo", "fob", "foo", "food"]);
        Ok(())
    }
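To see why the query for "foo" at distance 1 returns exactly those four keys, the following std-only sketch recomputes the result by brute force. It is not how this crate executes the query (the crate runs a Levenshtein automaton against the transducer); it only checks the expected output. The functions edit_distance and fuzzy_filter are hypothetical helpers written for this illustration, and distance is assumed to be counted in Unicode scalar values.

```rust
/// Classic dynamic-programming Levenshtein distance, counted in chars.
/// Hypothetical helper for illustration; not part of the fst crate.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of `a`
    // and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: deleting i chars turns the prefix of `a` into "".
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical helper: keep the keys within `max` edits of `query`,
/// preserving the input order.
fn fuzzy_filter<'a>(keys: &[&'a str], query: &str, max: usize) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| edit_distance(k, query) <= max)
        .collect()
}

fn main() {
    let keys = ["fa", "fo", "fob", "focus", "foo", "food", "foul"];
    let matches = fuzzy_filter(&keys, "foo", 1);
    // Same keys as the fst example: "fa" and "foul" need two edits and
    // "focus" needs three, so they are excluded.
    assert_eq!(matches, vec!["fo", "fob", "foo", "food"]);
    println!("{:?}", matches);
}
```

This brute-force check visits every key, whereas the automaton-based search only explores the parts of the transducer that can still lead to a match.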
Check out the documentation for a lot more examples!
levenshtein - Disabled by default. This adds the Levenshtein automaton to the automaton sub-module. This includes an additional dependency on utf8-ranges.
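As a sketch, enabling this feature would typically use Cargo's standard inline-table syntax for dependency features. The version shown is the one quoted earlier in this README; adjust it to the release you actually depend on.

```toml
[dependencies]
# Opt in to the Levenshtein automaton, which is disabled by default.
fst = { version = "0.4", features = ["levenshtein"] }
```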