BlendedTermQuery should ignore fields that don't exist in the index #41125

Merged
merged 2 commits into from
Apr 16, 2019

@jimczi jimczi commented Apr 11, 2019

Today the blended term query detects whether a term exists in a field by looking at the term statistics in the index. However, the value that indicates that a term has no occurrence in a field has changed in Lucene.
A non-existing term now returns a doc frequency and a total term frequency of 0.
Because of this discrepancy, the blended term query picks 0 as the minimum frequency for a term even if other fields have documents for this term. This confuses the term queries that the blending creates, since some of them contain a custom state that indicates a frequency of 0 even though the term has some occurrences in the field. For these terms an exception is thrown, because the term query always checks that the term state's frequency is greater than 0 if there are documents associated with it.
This change fixes the bug by ignoring terms with a doc freq of 0 when the blended term query picks the minimum term frequency among the requested fields.
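
To illustrate the idea only, here is a minimal sketch of picking the minimum frequency while skipping fields in which the term does not occur. It is not the actual `BlendedTermQuery` code: the class name, the method name, the sentinel value, and the use of plain arrays in place of Lucene's per-field term statistics are all assumptions made for the example.

```java
/** Illustrative sketch only: plain arrays stand in for per-field term statistics. */
public class BlendedMinFreqSketch {

    /** Hypothetical sentinel returned when the term occurs in none of the fields. */
    static final long NO_FIELD_HAS_TERM = -1L;

    /**
     * Returns the smallest total term frequency among the fields where the
     * term actually occurs (doc freq greater than 0).
     */
    static long minTotalTermFreq(int[] docFreqs, long[] totalTermFreqs) {
        if (docFreqs.length != totalTermFreqs.length) {
            throw new IllegalArgumentException("one doc freq and one ttf per field expected");
        }
        long min = Long.MAX_VALUE;
        boolean seen = false;
        for (int i = 0; i < docFreqs.length; i++) {
            if (docFreqs[i] == 0) {
                // The term has no occurrence in this field: ignore it so it
                // cannot drag the blended minimum down to 0.
                continue;
            }
            min = Math.min(min, totalTermFreqs[i]);
            seen = true;
        }
        return seen ? min : NO_FIELD_HAS_TERM;
    }

    public static void main(String[] args) {
        // Three hypothetical fields; the term is absent from the second one.
        int[] docFreqs = {5, 0, 3};
        long[] totalTermFreqs = {9L, 0L, 4L};
        System.out.println(minTotalTermFreq(docFreqs, totalTermFreqs));
    }
}
```

Without the `docFreqs[i] == 0` check, the second field would make the minimum 0, which is the situation described above.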

Closes #41118

@jimczi jimczi added >bug :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.2.0 v7.0.1 labels Apr 11, 2019
@elasticmachine
Pinging @elastic/es-search

@jimczi jimczi changed the title from "BlendedTermQuery should ignore fields that don't exists in the index" to "BlendedTermQuery should ignore fields that don't exist in the index" Apr 11, 2019
@jimczi jimczi requested a review from romseygeek April 11, 2019 17:05
@romseygeek romseygeek left a comment

LGTM. The Lucene version of this query doesn't use the same calculations, so I don't think we need to worry about porting the fix there.

@jimczi jimczi merged commit 8ee9182 into elastic:master Apr 16, 2019
@jimczi jimczi deleted the bug/cross_fields_no_term branch April 16, 2019 11:02
jimczi added a commit that referenced this pull request Apr 16, 2019
…41125)
jimczi added a commit that referenced this pull request Apr 16, 2019
…41125)
jimczi added a commit to jimczi/elasticsearch that referenced this pull request May 8, 2019
If the max doc in the index is greater than the minimum total term frequency
among the requested fields, max doc must be adjusted to equal the min ttf.
This adjustment was removed by mistake when fixing elastic#41125.
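
As a rough sketch of the adjustment described in this commit message (not the actual Elasticsearch code; the class and method names are hypothetical, and the statistics are passed in as plain numbers):

```java
/** Illustrative sketch only: the statistics are passed in as plain numbers. */
public class MaxDocAdjustSketch {

    /**
     * Returns maxDoc, lowered to the minimum total term frequency (min ttf)
     * when maxDoc exceeds it. Assumes minTotalTermFreq is non-negative.
     */
    static int adjustMaxDoc(int maxDoc, long minTotalTermFreq) {
        if (maxDoc > minTotalTermFreq) {
            // Safe narrowing: the value is smaller than maxDoc, which is an int.
            return (int) minTotalTermFreq;
        }
        return maxDoc;
    }

    public static void main(String[] args) {
        System.out.println(adjustMaxDoc(100, 40L)); // max doc exceeds min ttf
        System.out.println(adjustMaxDoc(100, 250L)); // max doc already within min ttf
    }
}
```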

Closes elastic#41934
jimczi added a commit that referenced this pull request May 9, 2019
jimczi added a commit that referenced this pull request May 9, 2019
jimczi added a commit that referenced this pull request May 9, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
…lastic#41125)
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Successfully merging this pull request may close these issues.

NullPointerException when performing multi_match, cross_fields search (es v7.0.0)