Feat/70 preview in fuzzy search #74

Open
wants to merge 13 commits into
base: main

Conversation

georgbuechner
Owner

No description provided.

- get_preview now also returns the matched string, or an empty string if no match was found.
This way, zathura can search for the matched string, or not search at all when there is no match.
- Adds tests for mystifiziert. However, preview-fuzzy-search does not find Mystifizierung (dist=4) or
mystifizierende (dist=5).
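
A minimal sketch of how a caller might act on the "empty string means no match" convention described above. The helper name `zathura_args` is hypothetical and not part of this PR. The `--find` option is taken from zathura's man page as I remember it, so treat it as an assumption and check it against the installed version.

```rust
/// Hypothetical helper: builds the argument list for opening `path` in zathura.
/// `matched` is the string returned alongside the preview; an empty string
/// means that no match was found, so no search is requested.
fn zathura_args(path: &str, matched: &str) -> Vec<String> {
    let mut args = Vec::new();
    if !matched.is_empty() {
        // Assumption: zathura accepts `--find=<string>` to search on open.
        args.push(format!("--find={}", matched));
    }
    args.push(path.to_string());
    args
}
```

Returning an `Option<String>` would express "no match" more directly than an empty string, but the sketch follows the convention stated in the commit message.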
index/src/index.rs (outdated)
.split_whitespace()
.map(|s| s.to_string())
.collect()
fn split_text_into_words(body: &str) -> Result<PageIndex> {
Collaborator

@SimonThormeyer SimonThormeyer Sep 23, 2024

If the function creates and returns a PageIndex, its name should probably reflect that. I would also prefer a different name for the argument, since the function works on any kind of text, not just the body of a PDF page.
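
To illustrate the naming suggestion, here is a rough sketch. The project's actual `PageIndex` type is not shown in this thread, so `WordIndex` below is a hypothetical stand-in (word mapped to its positions), and `build_word_index` and `text` are only example names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the project's `PageIndex`:
/// maps each word to the positions (word offsets) where it occurs.
type WordIndex = HashMap<String, Vec<usize>>;

/// Builds a word index from any text, not only the body of a PDF page.
/// The name states what is returned; the argument name stays generic.
fn build_word_index(text: &str) -> WordIndex {
    let mut index = WordIndex::new();
    for (position, word) in text.split_whitespace().enumerate() {
        index.entry(word.to_string()).or_default().push(position);
    }
    index
}
```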

fn get_fuzzy_match(
&self,
term: &str,
distance: &u8,
Collaborator

If I get it correctly, this is the maximum allowed distance for a fuzzy match, right? If so, let's rename the argument to make the code more comprehensible.

Owner Author

I would stick with distance, since it is the standard term in connection with Levenshtein distance. What do you think?
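
For comparison, a small sketch of what the two roles look like when they are named separately: each candidate has a computed distance, and `max_distance` is the upper bound that decides whether a candidate counts as a match. The function `closest_within` and its `(word, distance)` input are assumptions made for illustration, not the signature used in this PR. The sketch also takes the limit as a plain `u8` instead of `&u8`, since `u8` is `Copy`.

```rust
/// Hypothetical helper: returns the candidate with the smallest distance,
/// provided that distance does not exceed `max_distance`.
/// `candidates` holds (word, distance) pairs computed elsewhere.
fn closest_within<'a>(candidates: &[(&'a str, usize)], max_distance: u8) -> Option<&'a str> {
    candidates
        .iter()
        // Keep only candidates within the allowed limit.
        .filter(|(_, dist)| *dist <= usize::from(max_distance))
        // Of those, pick the closest one (the first one wins on ties).
        .min_by_key(|(_, dist)| *dist)
        .map(|(word, _)| *word)
}
```

With the distances reported in the description (Mystifizierung at 4, mystifizierende at 5), a hypothetical limit of 3 yields no match, while a limit of 4 returns Mystifizierung.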

} else {
let mut cur: (String, u32, u32) = ("".to_string(), 0, 0);
let mut min_dist: usize = usize::MAX;
for (word, matches) in pindex {
let dist: usize = levenshtein(term, word);
let dist: usize = levenshtein(term, &word);
let dist = if word.contains(term) { 1 } else { dist };
Collaborator

Why do we overwrite the value in dist with 1 in this case?

Owner Author

The idea is that, for example, "Soledad" should be found when searching for "Sole". However, if you have a good reason not to do this, I'd be fine with that too.

Owner Author

I did, however, change the syntax to a one-liner.
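
A self-contained sketch of the effect of the override discussed in this thread. The `levenshtein` function here is a plain textbook implementation written for the example, not the one used in the PR, and `ranking_distance` is a hypothetical name for the one-liner.

```rust
/// Textbook dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let insertion = cur[j] + 1;
            let deletion = prev[j + 1] + 1;
            cur.push(substitution.min(insertion).min(deletion));
        }
        prev = cur;
    }
    prev[b_chars.len()]
}

/// Hypothetical name for the one-liner: a word that contains the term is
/// ranked as if its distance were 1, regardless of its real edit distance.
fn ranking_distance(term: &str, word: &str) -> usize {
    if word.contains(term) { 1 } else { levenshtein(term, word) }
}
```

The plain distance between "Sole" and "Soledad" is 3 (three insertions), so the override moves it to 1 and ahead of unrelated words at distance 2 or 3. The check is case-sensitive, and a word identical to the term is also ranked at 1 by this expression, which only matters if exact matches are not handled before this branch.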

georgbuechner and others added 5 commits October 6, 2024 01:39
… extension on mutable path

Co-authored-by: SimonThormeyer <49559340+SimonThormeyer@users.noreply.github.com>
…existing extension on mutable path"

This reverts commit 45b667a.
2 participants