-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/70 preview in fuzzy search #74
base: main
Are you sure you want to change the base?
Conversation
…chner/litt into feat/70-preview-in-fuzzy-search
- get_preview now also returns the found match and an empty string if no match was found this way, zathura can search for the matched string or not at all (in case of no match found) - Adds tests for mystifiziert, but preview-fuzzy-search does not find Mystifizierung (dist=4) and mystifizierende (dist=5)
index/src/index.rs
Outdated
.split_whitespace() | ||
.map(|s| s.to_string()) | ||
.collect() | ||
fn split_text_into_words(body: &str) -> Result<PageIndex> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function creates and returns a PageIndex
, this should probably be reflected in its name. I would also prefer another name for the argument, as it would work on any kind of text, not just the body of a pdf page.
fn get_fuzzy_match( | ||
&self, | ||
term: &str, | ||
distance: &u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get it correctly, this is the maximum allowed distance for a fuzzy match, right? If so, let's rename the argument to make the code more comprehensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would stick with distance, since it is the standard term related to levenstheindistance, what do you think?
search/src/search.rs
Outdated
} else { | ||
let mut cur: (String, u32, u32) = ("".to_string(), 0, 0); | ||
let mut min_dist: usize = usize::MAX; | ||
for (word, matches) in pindex { | ||
let dist: usize = levenshtein(term, word); | ||
let dist: usize = levenshtein(term, &word); | ||
let dist = if word.contains(term) { 1 } else { dist }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we overwrite the value in dist
with 1
in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that f.e. "Soledad" should be found when searching for "Sole". However, if you have a good reason for not doing so I'd also be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did however change the syntax to a onliner
… extension on mutable path Co-authored-by: SimonThormeyer <49559340+SimonThormeyer@users.noreply.github.com>
…existing extension on mutable path" This reverts commit 45b667a.
No description provided.