Deprecate CommonTermsQuery and cutoff_frequency (elastic#42619)
* Deprecate CommonTermsQuery and cutoff_frequency

Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.
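
For background on what is being deprecated: the Javadoc touched by this change describes `cutoff_frequency` as a value in [0..1] (or an absolute number >= 1) that gives the maximum document frequency for a term to count as low-frequency. The following is a minimal Java sketch of that split. The class `CutoffSplitSketch`, its methods, and the rounding of a relative cutoff are assumptions made for illustration; this is not the Lucene or Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Lucene or Elasticsearch implementation.
final class CutoffSplitSketch {

    // Terms grouped by how their document frequency compares to the threshold.
    static final class Split {
        final List<String> lowFreq = new ArrayList<>();
        final List<String> highFreq = new ArrayList<>();
    }

    // Assumption: cutoff >= 1 is an absolute document count; smaller values
    // are a fraction of maxDoc, rounded up to a whole number of documents.
    static double threshold(float cutoff, int maxDoc) {
        return cutoff >= 1f ? cutoff : Math.ceil(cutoff * (double) maxDoc);
    }

    // Terms whose document frequency is at most the threshold are low-frequency;
    // all other terms are high-frequency.
    static Split split(Map<String, Integer> docFreqs, float cutoff, int maxDoc) {
        double limit = threshold(cutoff, maxDoc);
        Split result = new Split();
        for (Map.Entry<String, Integer> entry : docFreqs.entrySet()) {
            if (entry.getValue() <= limit) {
                result.lowFreq.add(entry.getKey());
            } else {
                result.highFreq.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for a made-up index of 1000 documents.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 900);
        docFreqs.put("quick", 12);
        docFreqs.put("fox", 3);
        Split s = split(docFreqs, 0.01f, 1000);
        System.out.println("low=" + s.lowFreq + " high=" + s.highFreq);
    }
}
```

In the deprecated queries the high-frequency group went into an optional clause; with this change the plain match query is expected to need no such split.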

Relates to elastic#27096


(cherry picked from commit 04b7449)
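
The warnings asserted in the tests of this change all share one shape: the deprecated name in brackets, followed by the replacement text in brackets. Because the replacement text for `common` itself starts with `[match]`, the resulting warning contains `[[match]`, which is expected and not a typo. A small Java sketch of that assembly follows; `DeprecatedFieldSketch` is a hypothetical stand-in, and the message shape is copied from the test expectations rather than from the `ParseField` source.

```java
// Hypothetical stand-in for a deprecated field name; not Elasticsearch's ParseField.
final class DeprecatedFieldSketch {
    private final String name;
    private final String replacement;

    DeprecatedFieldSketch(String name, String replacement) {
        this.name = name;
        this.replacement = replacement;
    }

    // Assumed message shape, taken from the warnings asserted in the YAML tests.
    String warning() {
        return "Deprecated field [" + name + "] used, replaced by [" + replacement + "]";
    }

    public static void main(String[] args) {
        DeprecatedFieldSketch common = new DeprecatedFieldSketch(
            "common",
            "[match] query which can efficiently skip blocks of documents "
                + "if the total number of hits is not tracked");
        System.out.println(common.warning());
    }
}
```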
matriv committed May 30, 2019
1 parent 86b1a07 commit b23a132
Showing 18 changed files with 145 additions and 50 deletions.
7 changes: 7 additions & 0 deletions docs/reference/query-dsl/common-terms-query.asciidoc
@@ -1,6 +1,8 @@
[[query-dsl-common-terms-query]]
=== Common Terms Query

deprecated[7.3.0,"Use <<query-dsl-match-query>> instead, which skips blocks of documents efficiently, without any configuration, provided that the total number of hits is not tracked."]

The `common` terms query is a modern alternative to stopwords which
improves the precision and recall of search results (by taking stopwords
into account), without sacrificing performance.
@@ -83,6 +85,7 @@ GET /_search
}
--------------------------------------------------
// CONSOLE
// TEST[warning:Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]]

The number of terms which should match can be controlled with the
<<query-dsl-minimum-should-match,`minimum_should_match`>>
@@ -108,6 +111,7 @@ GET /_search
}
--------------------------------------------------
// CONSOLE
// TEST[warning:Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]]

which is roughly equivalent to:

@@ -154,6 +158,7 @@ GET /_search
}
--------------------------------------------------
// CONSOLE
// TEST[warning:Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]]

which is roughly equivalent to:

@@ -209,6 +214,7 @@ GET /_search
}
--------------------------------------------------
// CONSOLE
// TEST[warning:Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]]

which is roughly equivalent to:

@@ -270,6 +276,7 @@ GET /_search
}
--------------------------------------------------
// CONSOLE
// TEST[warning:Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]]

which is roughly equivalent to:

3 changes: 3 additions & 0 deletions docs/reference/query-dsl/match-query.asciidoc
@@ -103,6 +103,8 @@ GET /_search
[[query-dsl-match-query-cutoff]]
===== Cutoff frequency

deprecated[7.3.0,"This option can be omitted as the <<query-dsl-match-query>> can skip blocks of documents efficiently, without any configuration, provided that the total number of hits is not tracked."]

The match query supports a `cutoff_frequency` that allows
specifying an absolute or relative document frequency where high
frequency terms are moved into an optional subquery and are only scored
@@ -139,6 +141,7 @@ GET /_search
}
--------------------------------------------------
// CONSOLE
// TEST[warning:Deprecated field [cutoff_frequency] used, replaced by [you can omit this option, the [match] query can skip block of documents efficiently if the total number of hits is not tracked]]

IMPORTANT: The `cutoff_frequency` option operates on a per-shard-level. This means
that when trying it out on test indexes with low document numbers you
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
---
"Test common terms query with stacked tokens":
- skip:
features: "warnings"

- do:
indices.create:
index: test
@@ -47,6 +50,8 @@
refresh: true

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -62,6 +67,8 @@
- match: { hits.hits.2._id: "3" }

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -76,6 +83,8 @@
- match: { hits.hits.1._id: "2" }

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -90,6 +99,8 @@
- match: { hits.hits.2._id: "3" }

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -103,6 +114,8 @@
- match: { hits.hits.0._id: "2" }

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -118,6 +131,8 @@
- match: { hits.hits.1._id: "1" }

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -132,6 +147,8 @@
- match: { hits.hits.0._id: "2" }

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -144,6 +161,8 @@
- match: { hits.hits.0._id: "2" }

- do:
warnings:
- 'Deprecated field [common] used, replaced by [[match] query which can efficiently skip blocks of documents if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -158,6 +177,8 @@
- match: { hits.hits.2._id: "3" }

- do:
warnings:
- 'Deprecated field [cutoff_frequency] used, replaced by [you can omit this option, the [match] query can skip block of documents efficiently if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -172,6 +193,8 @@
- match: { hits.hits.1._id: "2" }

- do:
warnings:
- 'Deprecated field [cutoff_frequency] used, replaced by [you can omit this option, the [match] query can skip block of documents efficiently if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -187,6 +210,8 @@
- match: { hits.hits.2._id: "3" }

- do:
warnings:
- 'Deprecated field [cutoff_frequency] used, replaced by [you can omit this option, the [match] query can skip block of documents efficiently if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
@@ -201,6 +226,8 @@
- match: { hits.hits.1._id: "2" }

- do:
warnings:
- 'Deprecated field [cutoff_frequency] used, replaced by [you can omit this option, the [multi_match] query can skip block of documents efficiently if the total number of hits is not tracked]'
search:
rest_total_hits_as_int: true
body:
Original file line number Diff line number Diff line change
@@ -278,6 +278,11 @@ public int hashCode() {
return Objects.hash(classHash(), Arrays.hashCode(equalsTerms()));
}

/**
* @deprecated Since max_score optimization landed in 7.0, normal MultiMatchQuery
* will achieve the same result without any configuration.
*/
@Deprecated
public static BlendedTermQuery commonTermsBlendedQuery(Term[] terms, final float[] boosts, final float maxTermFrequency) {
return new BlendedTermQuery(terms, boosts) {
@Override
Original file line number Diff line number Diff line change
@@ -26,7 +26,11 @@
* Extended version of {@link CommonTermsQuery} that allows passing in a
* {@code minimumNumberShouldMatch} specification that uses the actual number of high-frequency terms
* to calculate the minimum matching terms.
*
* @deprecated Since max_score optimization landed in 7.0, normal MatchQuery
* will achieve the same result without any configuration.
*/
@Deprecated
public class ExtendedCommonTermsQuery extends CommonTermsQuery {

public ExtendedCommonTermsQuery(Occur highFreqOccur, Occur lowFreqOccur, float maxTermFrequency) {
Original file line number Diff line number Diff line change
@@ -48,9 +48,16 @@
* and high-frequency terms are added to an optional boolean clause. The
* optional clause is only executed if the required "low-frequency" clause
* matches.
*
* @deprecated Since max_score optimization landed in 7.0, normal MatchQuery
* will achieve the same result without any configuration.
*/
@Deprecated
public class CommonTermsQueryBuilder extends AbstractQueryBuilder<CommonTermsQueryBuilder> {

public static final String COMMON_TERMS_QUERY_DEPRECATION_MSG = "[match] query which can efficiently " +
"skip blocks of documents if the total number of hits is not tracked";

public static final String NAME = "common";

public static final float DEFAULT_CUTOFF_FREQ = 0.01f;
@@ -87,7 +94,9 @@ public class CommonTermsQueryBuilder extends AbstractQueryBuilder<CommonTermsQue

/**
* Constructs a new common terms query.
* @deprecated See {@link CommonTermsQueryBuilder} for more details.
*/
@Deprecated
public CommonTermsQueryBuilder(String fieldName, Object text) {
if (Strings.isEmpty(fieldName)) {
throw new IllegalArgumentException("field name is null or empty");
@@ -101,7 +110,9 @@ public CommonTermsQueryBuilder(String fieldName, Object text) {

/**
* Read from a stream.
* @deprecated See {@link CommonTermsQueryBuilder} for more details.
*/
@Deprecated
public CommonTermsQueryBuilder(StreamInput in) throws IOException {
super(in);
fieldName = in.readString();
Original file line number Diff line number Diff line change
@@ -43,8 +43,18 @@
* result of the analysis.
*/
public class MatchQueryBuilder extends AbstractQueryBuilder<MatchQueryBuilder> {

+private static final String CUTOFF_FREQUENCY_DEPRECATION_MSG = "you can omit this option, " +
+"the [match] query can skip block of documents efficiently if the total number of hits is not tracked";
+
 public static final ParseField ZERO_TERMS_QUERY_FIELD = new ParseField("zero_terms_query");
-public static final ParseField CUTOFF_FREQUENCY_FIELD = new ParseField("cutoff_frequency");
+/**
+* @deprecated Since max_score optimization landed in 7.0, normal MatchQuery
+* will achieve the same result without any configuration.
+*/
+@Deprecated
+public static final ParseField CUTOFF_FREQUENCY_FIELD =
+new ParseField("cutoff_frequency").withAllDeprecated(CUTOFF_FREQUENCY_DEPRECATION_MSG);
public static final ParseField LENIENT_FIELD = new ParseField("lenient");
public static final ParseField FUZZY_TRANSPOSITIONS_FIELD = new ParseField("fuzzy_transpositions");
public static final ParseField FUZZY_REWRITE_FIELD = new ParseField("fuzzy_rewrite");
@@ -252,7 +262,10 @@ public int maxExpansions() {
* Set a cutoff value in [0..1] (or absolute number &gt;=1) representing the
* maximum threshold of a term's document frequency to be considered a low
* frequency term.
*
* @deprecated see {@link MatchQueryBuilder#CUTOFF_FREQUENCY_FIELD} for more details
*/
@Deprecated
public MatchQueryBuilder cutoffFrequency(float cutoff) {
this.cutoffFrequency = cutoff;
return this;
Original file line number Diff line number Diff line change
@@ -51,6 +51,10 @@
* Same as {@link MatchQueryBuilder} but supports multiple fields.
*/
public class MultiMatchQueryBuilder extends AbstractQueryBuilder<MultiMatchQueryBuilder> {

private static final String CUTOFF_FREQUENCY_DEPRECATION_MSG = "you can omit this option, " +
"the [multi_match] query can skip block of documents efficiently if the total number of hits is not tracked";

public static final String NAME = "multi_match";

public static final MultiMatchQueryBuilder.Type DEFAULT_TYPE = MultiMatchQueryBuilder.Type.BEST_FIELDS;
@@ -64,7 +68,8 @@ public class MultiMatchQueryBuilder extends AbstractQueryBuilder<MultiMatchQuery
private static final ParseField SLOP_FIELD = new ParseField("slop");
private static final ParseField ZERO_TERMS_QUERY_FIELD = new ParseField("zero_terms_query");
private static final ParseField LENIENT_FIELD = new ParseField("lenient");
-private static final ParseField CUTOFF_FREQUENCY_FIELD = new ParseField("cutoff_frequency");
+private static final ParseField CUTOFF_FREQUENCY_FIELD =
+new ParseField("cutoff_frequency").withAllDeprecated(CUTOFF_FREQUENCY_DEPRECATION_MSG);
private static final ParseField TIE_BREAKER_FIELD = new ParseField("tie_breaker");
private static final ParseField FUZZY_REWRITE_FIELD = new ParseField("fuzzy_rewrite");
private static final ParseField MINIMUM_SHOULD_MATCH_FIELD = new ParseField("minimum_should_match");
@@ -505,7 +510,11 @@ public boolean lenient() {
* Set a cutoff value in [0..1] (or absolute number &gt;=1) representing the
* maximum threshold of a term's document frequency to be considered a low
* frequency term.
*
* @deprecated Since max_score optimization landed in 7.0, normal MultiMatchQuery
* will achieve the same result without any configuration.
*/
@Deprecated
public MultiMatchQueryBuilder cutoffFrequency(float cutoff) {
this.cutoffFrequency = cutoff;
return this;
@@ -515,7 +524,11 @@ public MultiMatchQueryBuilder cutoffFrequency(float cutoff) {
* Set a cutoff value in [0..1] (or absolute number &gt;=1) representing the
* maximum threshold of a term's document frequency to be considered a low
* frequency term.
*
* @deprecated Since max_score optimization landed in 7.0, normal MultiMatchQuery
* will achieve the same result without any configuration.
*/
@Deprecated
public MultiMatchQueryBuilder cutoffFrequency(Float cutoff) {
this.cutoffFrequency = cutoff;
return this;
Original file line number Diff line number Diff line change
@@ -66,7 +66,10 @@ public static MatchQueryBuilder matchQuery(String name, Object text) {
*
* @param fieldName The field name.
* @param text The query text (to be analyzed).
*
* @deprecated See {@link CommonTermsQueryBuilder}
*/
@Deprecated
public static CommonTermsQueryBuilder commonTermsQuery(String fieldName, Object text) {
return new CommonTermsQueryBuilder(fieldName, text);
}
Original file line number Diff line number Diff line change
@@ -194,6 +194,10 @@ public void setOccur(BooleanClause.Occur occur) {
this.occur = occur;
}

/**
* @deprecated See {@link MatchQueryBuilder#cutoffFrequency(float)} for more details
*/
@Deprecated
public void setCommonTermsCutoff(Float cutoff) {
this.commonTermsCutoff = cutoff;
}
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@

import org.apache.lucene.search.BooleanQuery;
import org.elasticsearch.common.NamedRegistry;
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.geo.GeoShapeType;
import org.elasticsearch.common.geo.ShapesAvailability;
import org.elasticsearch.common.io.stream.NamedWriteableRegistry;
@@ -32,7 +33,6 @@
import org.elasticsearch.common.xcontent.ParseFieldRegistry;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.index.query.BoolQueryBuilder;
-import org.elasticsearch.index.query.MatchBoolPrefixQueryBuilder;
import org.elasticsearch.index.query.BoostingQueryBuilder;
import org.elasticsearch.index.query.CommonTermsQueryBuilder;
import org.elasticsearch.index.query.ConstantScoreQueryBuilder;
@@ -49,6 +49,7 @@
import org.elasticsearch.index.query.IntervalQueryBuilder;
import org.elasticsearch.index.query.IntervalsSourceProvider;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
+import org.elasticsearch.index.query.MatchBoolPrefixQueryBuilder;
import org.elasticsearch.index.query.MatchNoneQueryBuilder;
import org.elasticsearch.index.query.MatchPhrasePrefixQueryBuilder;
import org.elasticsearch.index.query.MatchPhraseQueryBuilder;
@@ -279,6 +280,7 @@

import static java.util.Collections.unmodifiableMap;
import static java.util.Objects.requireNonNull;
import static org.elasticsearch.index.query.CommonTermsQueryBuilder.COMMON_TERMS_QUERY_DEPRECATION_MSG;
import static org.elasticsearch.index.query.SpanNearQueryBuilder.SpanGapQueryBuilder;

/**
@@ -807,7 +809,8 @@ private void registerQueryParsers(List<SearchPlugin> plugins) {
registerQuery(new QuerySpec<>(MoreLikeThisQueryBuilder.NAME, MoreLikeThisQueryBuilder::new,
MoreLikeThisQueryBuilder::fromXContent));
registerQuery(new QuerySpec<>(WrapperQueryBuilder.NAME, WrapperQueryBuilder::new, WrapperQueryBuilder::fromXContent));
-registerQuery(new QuerySpec<>(CommonTermsQueryBuilder.NAME, CommonTermsQueryBuilder::new, CommonTermsQueryBuilder::fromXContent));
+registerQuery(new QuerySpec<>(new ParseField(CommonTermsQueryBuilder.NAME).withAllDeprecated(COMMON_TERMS_QUERY_DEPRECATION_MSG),
+CommonTermsQueryBuilder::new, CommonTermsQueryBuilder::fromXContent));
registerQuery(
new QuerySpec<>(SpanMultiTermQueryBuilder.NAME, SpanMultiTermQueryBuilder::new, SpanMultiTermQueryBuilder::fromXContent));
registerQuery(new QuerySpec<>(FunctionScoreQueryBuilder.NAME, FunctionScoreQueryBuilder::new,