Add support for Lucene as search syntax #8206
Nitpick: the static checker found an unused import.
I think overall it would make sense to rework the search mechanism to use a Lucene index to improve speed. A few thoughts on that:
I sense a damn f**** cool possible new (or renewed) feature for JabRef here: sort by Lucene score and/or highlight or fade out entries by score, like the floating results back in the ancient Swing days.
I agree. If done well, this could be a very neat feature, and the speed-up would be icing on the cake.
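The two ideas above (a Lucene index for speed, and ranking entries by score) can be illustrated with a toy in-memory inverted index. This is a minimal sketch in plain Java, assuming a simple lower-case/split-on-punctuation tokenizer and a simplified TF-IDF score; Lucene's real index structures and its default BM25 scoring are far more elaborate, and none of the names below are Lucene or JabRef APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index with term frequencies plus a simplified TF-IDF ranking.
// Conceptual sketch only: not Lucene's index format, API, or BM25 scoring.
final class ToyScoredIndex {

    record Hit(int entryId, double score) {}

    // term -> (entryId -> number of occurrences of the term in that entry)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int entryCount = 0;

    // Stand-in analyzer: lower-case and split on runs of non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing cost is paid once per entry; lookups then avoid scanning every entry.
    // Assumes each entryId is added only once.
    void add(int entryId, String text) {
        entryCount++;
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new HashMap<>()).merge(entryId, 1, Integer::sum);
        }
    }

    // Returns entries containing at least one query term (OR semantics, the default
    // of Lucene's classic query parser), sorted by descending score.
    List<Hit> search(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> hits = postings.get(term);
            if (hits == null) {
                continue;
            }
            // Rarer terms get a higher weight.
            double idf = Math.log(1.0 + (double) entryCount / hits.size());
            hits.forEach((id, tf) -> scores.merge(id, tf * idf, Double::sum));
        }
        List<Hit> result = new ArrayList<>();
        scores.forEach((id, score) -> result.add(new Hit(id, score)));
        // Highest score first; ties broken by entry ID so the order is deterministic.
        result.sort((a, b) -> {
            int byScore = Double.compare(b.score(), a.score());
            return byScore != 0 ? byScore : Integer.compare(a.entryId(), b.entryId());
        });
        return result;
    }
}
```

With the score available, entries scoring low relative to the top hit could be faded in the entry table instead of being filtered out, which is roughly the "floating results" idea.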
I'm also in favor of completely removing our existing search infrastructure and migrating fully to Lucene.
Not sure what's more efficient: first only accept Lucene syntax in the search query and throw out the old stuff in a second PR, or rewrite everything in one go.
src/main/java/org/jabref/model/search/rules/ContainsBasedSearchRule.java
Current blocker: Lucene does not support
How to proceed?
I would completely drop our search syntax and only use Lucene's. If users really want an equals search instead of a contains search, we can add this later (for example, by adding a field alias with no tokenizer).
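The difference between a substring-style "contains" search, Lucene's token matching on a tokenized field, and an "equals" search on an untokenized field alias can be sketched in plain Java. This is an illustration only: the class and method names are hypothetical, the lower-case/split tokenizer stands in for a Lucene analyzer, and in Lucene itself the untokenized alias would correspond to something like also indexing the value as a `StringField` next to the tokenized `TextField`.

```java
import java.util.Locale;

// Hypothetical illustration of three match semantics for a single field value.
final class MatchModes {

    // Substring-style "contains": case-insensitive search anywhere in the raw value.
    static boolean substringMatch(String fieldValue, String query) {
        return fieldValue.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    // Token match on a tokenized (analyzed) field: some token must equal the query term.
    static boolean tokenMatch(String fieldValue, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : fieldValue.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(needle)) {
                return true;
            }
        }
        return false;
    }

    // "Equals" on an untokenized (keyword) field: the whole stored value must match exactly.
    static boolean keywordMatch(String fieldValue, String value) {
        return fieldValue.equals(value);
    }
}
```

Note that `luc` matches `Lucene in Action` as a substring but not as a token; with plain Lucene terms, users would need a wildcard such as `luc*` for that kind of prefix match.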
Note to self: read the Lucene documentation thoroughly (http://www.lucenetutorial.com/lucene-query-syntax.html). What counts as a match?
Implementing it like that is OK with me. The "only" concern I have is that it does not match the UX people know from web search engines (Google, Bing, ...).
I think the most efficient way would be to do a fuzzy search anyhow and then do the ... After writing this, I did a short Google search, and it appears that exact matches are possible in Lucene using quotes: https://stackoverflow.com/questions/37495639/how-to-match-exact-text-in-lucene-search
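Quoting in Lucene produces a phrase query: the quoted text is tokenized, and the terms must occur next to each other and in order. A plain-Java sketch of that behavior follows; the names are hypothetical, and Lucene itself works with term positions stored in the index and also supports proximity phrases such as `"jakarta apache"~10`. A quoted phrase is still analyzed, so with a typical lower-casing analyzer it is an exact *phrase* match rather than exact string equality.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class PhraseMatch {

    // Stand-in analyzer: lower-case and split on runs of non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Unquoted terms (AND semantics assumed for this comparison): each term occurs somewhere.
    static boolean allTermsMatch(String text, String query) {
        return tokenize(text).containsAll(tokenize(query));
    }

    // Quoted phrase: the query tokens must appear as one contiguous, ordered run.
    static boolean phraseMatch(String text, String phrase) {
        List<String> target = tokenize(phrase);
        return !target.isEmpty() && Collections.indexOfSubList(tokenize(text), target) >= 0;
    }
}
```

For the title `Deep learning for machine translation`, the unquoted terms `machine learning` all occur, but the phrase `"machine learning"` does not, while `"machine translation"` does.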
Minor comment: I rebased this PR on main because we merged #8636.
# Conflicts:
#	src/main/java/org/jabref/logic/search/rules/ContainsBasedSearchRule.java
#	src/main/java/org/jabref/logic/search/rules/GrammarBasedSearchRule.java
#	src/main/java/org/jabref/logic/search/rules/RegexBasedSearchRule.java
public void findsCaseInSensitive(String query) {
    assertTrue(luceneBasedSearchRuleCaseInsensitive.validateSearchStrings(query));
    assertTrue(luceneBasedSearchRuleCaseInsensitive.applyRule(query, bibEntry));
}
🚫 [reviewdog] <com.puppycrawl.tools.checkstyle.checks.regexp.RegexpMultilineCheck> reported by reviewdog 🐶
Blank line at end of block should be removed
void simpleFieldedLuceneQueryReturnsLuceneBasedSearchRule() {
    SearchRule searchRule = SearchRules.getSearchRuleByQuery("title:test", EnumSet.noneOf(SearchRules.SearchFlags.class));
    assertInstanceOf(LuceneBasedSearchRule.class, searchRule);
}
🚫 [reviewdog] <com.puppycrawl.tools.checkstyle.checks.regexp.RegexpMultilineCheck> reported by reviewdog 🐶
Blank line at end of block should be removed
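The test above expects a fielded query such as `title:test` to be handled by the Lucene-based rule. What field scoping means for a single entry can be sketched as follows. This is a hypothetical illustration, not JabRef's `SearchRules` or Lucene's `QueryParser`: it models an entry as a `Map` from lower-case field name to value and handles only one `field:term` or one bare term.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldedQuery {

    // A single Lucene-style fielded term, e.g. "title:test".
    private static final Pattern FIELDED = Pattern.compile("([\\p{L}\\p{N}_-]+):(\\S+)");

    // True if the tokenized, lower-cased value contains the term as a whole token.
    private static boolean hasToken(String value, String term) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                     .contains(term.toLowerCase(Locale.ROOT));
    }

    // Fielded term: look only in the named field. Bare term: look in every field.
    static boolean matches(Map<String, String> entry, String query) {
        String q = query.trim();
        Matcher m = FIELDED.matcher(q);
        if (m.matches()) {
            String value = entry.get(m.group(1).toLowerCase(Locale.ROOT));
            return value != null && hasToken(value, m.group(2));
        }
        return entry.values().stream().anyMatch(v -> hasToken(v, q));
    }
}
```

With an entry whose title is `A test of Lucene` and whose author is `Smith`, `title:test` and the bare term `smith` match, while `author:test` and `title:smith` do not.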
I would object. Based on https://www.lucenetutorial.com/lucene-query-syntax.html, I would follow the suggestion of #8206 (comment) and interpret the Lucene syntax in a "more relaxed" way.
Question: getting rid of the current search syntax will break groups that are configured to search based on
right? Is there a way to make the syntax backwards compatible?
I never used that feature, so I had to look it up first.
Superseded by #8963. This PR can be used to compare the Lucene engine with our custom search and check for similarities and differences.
Fixes #1975
As far as I tested, the search syntax of groups can be reused (`=` signs also work with Lucene). Searching with regular expressions is different in Lucene: one has to use `/.../` to indicate a regular expression. Thus, groups can now also use regular expressions.

Left TODOs:
- [ ] Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
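The `/.../` regular-expression syntax mentioned above can be sketched in plain Java: recognize a term wrapped in slashes and apply it to each token. The names are hypothetical, and `java.util.regex` is only a stand-in for Lucene's own regular-expression dialect, which is not identical. As in Lucene, the pattern has to match a whole token rather than occur anywhere in the text, so `/joh?n/` does not match `Johnson`.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class SlashRegexTerm {

    // A term written as /.../ is treated as a regular expression.
    static boolean isRegexTerm(String term) {
        return term.length() >= 2 && term.startsWith("/") && term.endsWith("/");
    }

    // Tokens are lower-cased, as a typical analyzer would do; regex patterns are applied as written.
    static boolean matches(String fieldValue, String term) {
        String[] tokens = fieldValue.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        if (!isRegexTerm(term)) {
            // Plain term: exact token comparison.
            String needle = term.toLowerCase(Locale.ROOT);
            for (String token : tokens) {
                if (token.equals(needle)) {
                    return true;
                }
            }
            return false;
        }
        Pattern pattern = Pattern.compile(term.substring(1, term.length() - 1));
        for (String token : tokens) {
            // matches() requires the whole token to match, mirroring Lucene's implicit anchoring.
            if (pattern.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }
}
```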