Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix precedence for unknown operation and boost #89

Conversation

JSCU-CNI
Copy link
Contributor

@JSCU-CNI JSCU-CNI commented Jan 30, 2023

The precedence for the unknown (implied) operation is always lower than for AND and OR operations, as evidenced by how the current Apache Lucene library handles this. A simple TreeVisitor on a PrecedenceQueryParser shows the following:

Default operator: OR
Entered query:    1 2 AND 3 4
Parsed query:     1 (+2 +3) 4
--------
BooleanQuery(MUST): 1 (+2 +3) 4
  BooleanQuery(SHOULD): 1 (+2 +3) 4
    TermQuery: 4
    TermQuery: 1
    BooleanQuery(MUST): +2 +3
      TermQuery: 2
      TermQuery: 3

Default operator: OR
Entered query:    1 2 OR 3 4
Parsed query:     1 (2 3) 4
--------
BooleanQuery(MUST): 1 (2 3) 4
  BooleanQuery(SHOULD): 1 (2 3) 4
    TermQuery: 4
    TermQuery: 1
    BooleanQuery(MUST): 2 3
      BooleanQuery(SHOULD): 2 3
        TermQuery: 3
        TermQuery: 2

Default operator: OR
Entered query:    1 2 OR 3 4 AND 5 OR 6 7 8
Parsed query:     1 (2 3) ((+4 +5) 6) 7 8
--------
BooleanQuery(MUST): 1 (2 3) ((+4 +5) 6) 7 8
  BooleanQuery(SHOULD): 1 (2 3) ((+4 +5) 6) 7 8
    TermQuery: 8
    TermQuery: 1
    TermQuery: 7
    BooleanQuery(MUST): 2 3
      BooleanQuery(SHOULD): 2 3
        TermQuery: 3
        TermQuery: 2
    BooleanQuery(MUST): (+4 +5) 6
      BooleanQuery(SHOULD): (+4 +5) 6
        TermQuery: 6
        BooleanQuery(MUST): +4 +5
          TermQuery: 4
          TermQuery: 5

The results are similar when the default operator is changed to AND.

This commit adds a fictitious token for the invisible implied operation, giving it the lowest precedence. This should ensure that a b AND c d correctly translates to a (b AND c) d, contrasting the current (a (b AND (c d)).

To make this work, the list of precedence items had to shrink quite a bit, as there were several tokens in there that do not require disambiguation, and did cause issues with the fact that there's no lookahead token for the implied operation.

In the cleanup of this list, we also noticed that precedence of the boost operator was configured incorrectly, so that was fixed as well. This ensures that +2^1 becomes +(2^1).

@JSCU-CNI JSCU-CNI force-pushed the improvement/precedence-unknown-operation branch from 3fc083a to 0fc17ad Compare January 31, 2023 07:58
The precedence for the unknown (implied) operation is always lower than
for AND and OR operations, as evidenced by how the current Apache Lucene
library handles this.

This commit adds a fictitious token for the invisible implied operation,
giving it the lowest precedence. This should ensure that `a b AND c d`
correctly translates to `a (b AND c) d`, contrasting the current
`(a (b AND (c d))`.

To make this work, the list of precedence items had to shrink quite a
bit, as there were several tokens in there that do not require
disambiguation, and did cause issues with the fact that there's no
lookahead token for the implied operation.

In the cleanup of this list, we also noticed that precedence of the
boost operator was configured incorrectly, so that was fixed as well.
This ensures that +2^1 becomes +(2^1).
@JSCU-CNI JSCU-CNI force-pushed the improvement/precedence-unknown-operation branch from 0fc17ad to e5b9581 Compare January 31, 2023 08:02
Copy link
Member

@alexgarel alexgarel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a perfect PR 👏

Thanks @JSCU-CNI

I imagine this was creating a subtle bug somewhere in your software, hope it was not too hard to hunt down !

@alexgarel alexgarel merged commit 01092be into jurismarches:master Feb 7, 2023
@alexgarel
Copy link
Member

@mmoriniere hum, sorry I merged a bit rapidly, I might have asked you first.

But I think it's ok as all tests passes.

Maybe it could be worth a minor release.

@mmoriniere
Copy link
Contributor

@alexgarel It's OK, and I made a minor release.

https://pypi.org/project/luqum/0.12.1/

@JSCU-CNI JSCU-CNI deleted the improvement/precedence-unknown-operation branch February 17, 2023 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants