-
-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
boolean search with NOT operator broken #246
Comments
Ok, I will take a look into it and try to fix this. I guess the best approach would be to throw an exception if the expression cannot be converted to RPN(Reverse Polish Notation), this way we wouldn't have to deal with missing parentheses etc. Also, putting this RPN logic outside would be a good idea. Another thing would be escaping the query, so we don't have to use tricks with spaces " -" for the NOT operator. |
Thank you for the message. I never knew that such thing as RPN existed. It's a little bit clearer to me what Expression class does and why. Regarding spaces, maybe it is best to first break query into tokens and only then apply boolean search? As far as I understand tokenizer is only applied for basic searches currently. |
If the search query is tokenized first, then we lose the boolean operators, parentheses etc. So the parsing and query construction needs to be done right away |
Looking at the code even further I think another issue is that NOT operator is not combined with any other result here Line 202 in 028c0ad
|
Ah yes, this indeed needs more thought. At least default tokenizer should not remove dashes anymore. "or" is normal word. We could safely assume a space is always AND in terms of boolean search. This leaves only parentheses... |
Boolean search with at least NOT operator is completely broken for a very long time now. All boolean search unit tests which are using NOT operator are also failing.
I have pinpointed that the issue is with this old commit 9773bd9 which I suppose assumes that boolean search always uses OR operator. Not sure what this commit fixes, but IMHO this is wrong. NOT should be combined with AND.
On top of that I also have found that search fails with PHP error with two NOT conditions.
Unfortunately Expression conversion code and Boolean search algorithm itself looks too complicated for me so I cannot fix it myself.
The text was updated successfully, but these errors were encountered: