-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix not escaping special characters in search pattern #5938
Conversation
fixes JabRef#5892 * add method to get search pattern for searched words with escaped javascript regexp special characters (for search without regular expressions) * in preview viewer use search pattern with escaped javascript regexp special characters
/* Returns a regular expression pattern in the form (w1)|(w2)| ... wi are escaped if no regular expression search is enabled | ||
* @param escapeSpecialCharsForJS whether to escape characters in wi for javascript regexp (escaping all special characters) or for java (using \Q and \E) | ||
*/ | ||
private Optional<Pattern> joinWordsToPattern(boolean escapeSpecialCharsForJS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better is to use an enum here, eg.. EscapMode, with java and javascript as values:
https://www.teamten.com/lawrence/programming/prefer-enums-over-booleans.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! In general the fix looks good to me, just some minor code improvements!
@@ -18,6 +18,9 @@ | |||
|
|||
public class SearchQuery implements SearchMatcher { | |||
|
|||
// regexp pattern for escaping special characters in javascript regex | |||
public static final String JAVASCRIPT_ESCAPED_CHARS_PATTERN = "[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use Pattern.compile to gain some performace improvements here
@@ -133,7 +148,7 @@ public boolean isRegularExpression() { | |||
// compile the words to a regular expression in the form (w1)|(w2)|(w3) | |||
StringJoiner joiner = new StringJoiner(")|(", "(", ")"); | |||
for (String word : words) { | |||
joiner.add(regularExpression ? word : Pattern.quote(word)); | |||
joiner.add(regularExpression ? word : (escapeSpecialCharsForJS ? word.replaceAll(JAVASCRIPT_ESCAPED_CHARS_PATTERN, "\\\\$0") : Pattern.quote(word))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract this to a regular if-else, it's easier to understand on the first look than this chained conditions
* use enum to specify special characters escape mode * use compiled regex pattern instead of string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good to me!
For external contributions we have the policy, that a second dev looks over your code before it's merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got two comments, everything else looks good to me.
A test would be great...
if (regularExpression) { | ||
joiner.add(word); | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't checkstyle complaining about putting this in two lines? Has no effect, but looks somewhat odd...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a mistake, I'll change it. Also looks odd to me. Checkstyle wasn't complaining.
} | ||
|
||
// Returns a regular expression pattern in the form (w1)|(w2)| ... wi are escaped for javascript if no regular expression search is enabled | ||
public Optional<Pattern> getJsPatternForWords() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, although "Js" is quite obvious, I would reword that to getJavaScriptPatternForWords, since you are creating a new Method just for the sole purpose to make its use more obvious in it's name. So why not go all the way?
public static final Pattern JAVASCRIPT_ESCAPED_CHARS_PATTERN = Pattern.compile("[\\.\\*\\+\\?\\^\\$\\{\\}\\(\\)\\|\\[\\]\\\\/]"); | ||
|
||
/** | ||
* Metod for escaping special characters in regular expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method
Ok, I will write a test. |
//first word contain all javascript special regex characters that should be escaped individually in text based search | ||
String queryText = "([{\\\\^$|]})?*+./ word1 word2."; | ||
SearchQuery textQueryWithSpecialChars = new SearchQuery(queryText, false, false); | ||
String pattern = "(\\(\\[\\{\\\\\\^\\$\\|\\]\\}\\)\\?\\*\\+\\.\\/)|(word1)|(word2\\.)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created tests but I'm not sure about readbility of this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this looks totally crazy with those escaping, but it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for creating the tests as well!
I think this is really great as well! Thank you! |
* upstream/master: followup fix Fixfetcher (#5948) Bump byte-buddy-parent from 1.10.7 to 1.10.8 (#5952) Added MenuButtons to IntegrityCheckDialog (#5955) Reimplement custom entry types dialog (#5799) Bump unirest-java from 3.4.03 to 3.5.00 (#5953) MySQL: Allow public key retrieval (#5909) Restructure and improve docs for setting up IntelliJ (#5960) Change syntax for Oracle multi-row insert SQL statement (#5837) Bump classgraph from 4.8.62 to 4.8.64 (#5954) Squashed 'src/main/resources/csl-styles/' changes from c531528..9e81857 Fix not escaping special characters in search pattern (#5938)
fixes #5892
The error when searching for "DOI 10.1210/endrev/bnz006" (or any phrases containing special javascript regular expression characters) is caused by not escaping special characters before passing search pattern to javascript script used for highlighting words in preview view.