Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

boolean search with NOT operator broken #246

Open
ViliusS opened this issue Apr 11, 2021 · 5 comments
Open

boolean search with NOT operator broken #246

ViliusS opened this issue Apr 11, 2021 · 5 comments

Comments

@ViliusS
Copy link
Contributor

ViliusS commented Apr 11, 2021

Boolean search with at least NOT operator is completely broken for a very long time now. All boolean search unit tests which are using NOT operator are also failing.

I have pinpointed that the issue is with this old commit 9773bd9 which I suppose assumes that boolean search always uses OR operator. Not sure what this commit fixes, but IMHO this is wrong. NOT should be combined with AND.

On top of that I also have found that search fails with PHP error with two NOT conditions.

Unfortunately Expression conversion code and Boolean search algorithm itself looks too complicated for me so I cannot fix it myself.

@ViliusS ViliusS changed the title boolean search broken boolean search with NOT operator broken Apr 11, 2021
@nticaric
Copy link
Contributor

nticaric commented Apr 12, 2021

Ok, I will take a look into it and try to fix this.

I guess the best approach would be to throw an exception if the expression cannot be converted to RPN(Reverse Polish Notation), this way we wouldn't have to deal with missing parentheses etc. Also, putting this RPN logic outside would be a good idea.

Another thing would be escaping the query, so we don't have to use tricks with spaces " -" for the NOT operator.

@ViliusS
Copy link
Contributor Author

ViliusS commented Apr 12, 2021

Thank you for the message. I never knew that such thing as RPN existed. It's a little bit clearer to me what Expression class does and why.

Regarding spaces, maybe it is best to first break query into tokens and only then apply boolean search? As far as I understand tokenizer is only applied for basic searches currently.

@nticaric
Copy link
Contributor

If the search query is tokenized first, then we lose the boolean operators, parentheses etc. So the parsing and query construction needs to be done right away

@ViliusS
Copy link
Contributor Author

ViliusS commented Apr 12, 2021

Looking at the code even further I think another issue is that NOT operator is not combined with any other result here

if ($token == '~') {
It just returns results without combining them with anything in OR or AND stacks.

@ViliusS
Copy link
Contributor Author

ViliusS commented Apr 12, 2021

If the search query is tokenized first, then we lose the boolean operators, parentheses etc. So the parsing and query construction needs to be done right away

Ah yes, this indeed needs more thought. At least default tokenizer should not remove dashes anymore. "or" is normal word. We could safely assume a space is always AND in terms of boolean search. This leaves only parentheses...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants