This Rust crate contains functions for fuzzy string matching.
It exports two functions. The similarity function returns the similarity of two strings, and the find_words_iter function returns an iterator of matches for a smaller string (needle) in a larger string (haystack).
The similarity of strings is computed based on their trigrams, meaning their 3-character substrings: https://en.wikipedia.org/wiki/Trigram.
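To make the trigram idea concrete, here is a minimal sketch of a trigram-based similarity score. It is not this crate's implementation. It assumes pg_trgm-style rules (lowercase the input, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams). The names trigrams and sketch_similarity are hypothetical and exist only for illustration.

```rust
use std::collections::HashSet;

/// Collect the distinct trigrams of `s`.
/// Assumption: each word is lowercased and padded with two spaces in front
/// and one space behind, as pg_trgm is documented to do.
fn trigrams(s: &str) -> HashSet<String> {
    let lowered = s.to_lowercase();
    let mut set = HashSet::new();
    let words = lowered
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty());
    for word in words {
        let padded: Vec<char> = format!("  {} ", word).chars().collect();
        // Every window of 3 consecutive characters is one trigram.
        for window in padded.windows(3) {
            set.insert(window.iter().collect::<String>());
        }
    }
    set
}

/// Hypothetical helper: shared trigrams divided by distinct trigrams overall.
fn sketch_similarity(a: &str, b: &str) -> f32 {
    let ta = trigrams(a);
    let tb = trigrams(b);
    let shared = ta.intersection(&tb).count();
    let total = ta.len() + tb.len() - shared;
    if total == 0 {
        // Neither input contains a word, so there is nothing to compare.
        return 0.0;
    }
    shared as f32 / total as f32
}

fn main() {
    println!("{}", sketch_similarity("color", "colour"));
}
```

Under these assumptions, "color" yields 6 trigrams and "colour" yields 7, of which 4 are shared, so the score is 4/9, or roughly 0.444.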
Here is how to run the examples:
$ cargo run --example similarity color colour
...
0.44444445
$ cargo run --example find_words_iter
bufalo
buffalow
Bungalo
biffalo
buffaloo
huffalo
snuffalo
fluffalo
Add this to your Cargo.toml:
[dependencies]
trigram = "0.2.2"
and call it like this:
use trigram::similarity;
fn main() {
    println!("{}", similarity(&"rustacean", &"crustacean"));
}
The similarity function in this crate is a reverse-engineered approximation of the similarity function in the PostgreSQL pg_trgm extension: https://www.postgresql.org/docs/9.1/pgtrgm.html. It gives exactly the same answers in many cases but may disagree in others, although no such case is currently known. If you find a case where the answers don't match, please file an issue about it!
A good introduction to the Postgres version of this is given on Stack Overflow: https://stackoverflow.com/a/43161051/484529.