
Commit 4b77d04

Christoph Büscher committed
Remove nGram and edgeNGram token filter names (#39070)
In #30209 we deprecated the camel-case `nGram` filter name in favour of `ngram`, and `edgeNGram` in favour of `edge_ngram`. Both deprecated names will be removed in 8.0. This change disallows the deprecated names for new indices created with 7.0.0 or later by throwing an error when these filters are used. Relates to #38911
1 parent 08ad740 commit 4b77d04


8 files changed (+55, -49 lines)

docs/reference/analysis/tokenfilters/edgengram-tokenfilter.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 [[analysis-edgengram-tokenfilter]]
 === Edge NGram Token Filter
 
-A token filter of type `edgeNGram`.
+A token filter of type `edge_ngram`.
 
-The following are settings that can be set for a `edgeNGram` token
+The following are settings that can be set for an `edge_ngram` token
 filter type:
 
 [cols="<,<",options="header",]

docs/reference/analysis/tokenfilters/ngram-tokenfilter.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 [[analysis-ngram-tokenfilter]]
 === NGram Token Filter
 
-A token filter of type `nGram`.
+A token filter of type `ngram`.
 
-The following are settings that can be set for a `nGram` token filter
+The following are settings that can be set for an `ngram` token filter
 type:
 
 [cols="<,<",options="header",]

docs/reference/migration/migrate_7_0/analysis.asciidoc

Lines changed: 10 additions & 1 deletion
@@ -38,4 +38,13 @@ The `standard` token filter has been removed because it doesn't change anything
 The `standard_html_strip` analyzer has been deprecated, and should be replaced
 with a combination of the `standard` tokenizer and `html_strip` char_filter.
 Indexes created using this analyzer will still be readable in elasticsearch 7.0,
-but it will not be possible to create new indexes using it.
+but it will not be possible to create new indexes using it.
+
+[float]
+==== The deprecated `nGram` and `edgeNGram` token filters cannot be used on new indices
+
+The `nGram` and `edgeNGram` token filter names were deprecated in version 6.4.
+Indexes created using these token filters will still be readable in elasticsearch 7.0, but indexing
+documents using those filter names will issue a deprecation warning. Using the deprecated names on
+new indices created with version 7.0.0 or later is prohibited and throws an error when indexing
+or analyzing documents. Both names should be replaced by `ngram` and `edge_ngram`, respectively.
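
Migrating an analyzer that references the pre-configured filters by name is a pure rename. The following is a minimal sketch, assuming the analyzer's filter chain is available as a plain `List<String>` of filter names; `FilterChainMigration` and `migrate` are hypothetical names, not part of Elasticsearch. It replaces each deprecated name with its successor and passes every other name through unchanged:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch: rewrites the deprecated
// pre-configured filter names in an analyzer's filter chain.
public class FilterChainMigration {

    // Deprecated camel-case name -> snake-case replacement (from the migration note).
    private static final Map<String, String> RENAMES = Map.of(
        "nGram", "ngram",
        "edgeNGram", "edge_ngram");

    // Returns a new list in the same order; names not in RENAMES pass through unchanged.
    public static List<String> migrate(List<String> filterChain) {
        List<String> result = new ArrayList<>(filterChain.size());
        for (String name : filterChain) {
            result.add(RENAMES.getOrDefault(name, name));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(migrate(List.of("lowercase", "nGram")));  // [lowercase, ngram]
        System.out.println(migrate(List.of("edgeNGram", "ngram")));  // [edge_ngram, ngram]
    }
}
```

Keeping both deprecated names in a single lookup table means the helper cannot accidentally touch any other built-in or custom filter name.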

modules/analysis-common/src/main/java/org/elasticsearch/analysis/common/CommonAnalysisPlugin.java

Lines changed: 9 additions & 2 deletions
@@ -415,7 +415,11 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
 filters.add(PreConfiguredTokenFilter.singleton("edge_ngram", false, input ->
 new EdgeNGramTokenFilter(input, 1)));
 filters.add(PreConfiguredTokenFilter.singletonWithVersion("edgeNGram", false, (reader, version) -> {
-if (version.onOrAfter(org.elasticsearch.Version.V_6_4_0)) {
+if (version.onOrAfter(org.elasticsearch.Version.V_7_0_0)) {
+throw new IllegalArgumentException(
+"The [edgeNGram] token filter name was deprecated in 6.4 and cannot be used in new indices. "
++ "Please change the filter name to [edge_ngram] instead.");
+} else {
 deprecationLogger.deprecatedAndMaybeLog("edgeNGram_deprecation",
 "The [edgeNGram] token filter name is deprecated and will be removed in a future version. "
 + "Please change the filter name to [edge_ngram] instead.");
@@ -439,7 +443,10 @@ public List<PreConfiguredTokenFilter> getPreConfiguredTokenFilters() {
 LimitTokenCountFilterFactory.DEFAULT_CONSUME_ALL_TOKENS)));
 filters.add(PreConfiguredTokenFilter.singleton("ngram", false, reader -> new NGramTokenFilter(reader, 1, 2, false)));
 filters.add(PreConfiguredTokenFilter.singletonWithVersion("nGram", false, (reader, version) -> {
-if (version.onOrAfter(org.elasticsearch.Version.V_6_4_0)) {
+if (version.onOrAfter(org.elasticsearch.Version.V_7_0_0)) {
+throw new IllegalArgumentException("The [nGram] token filter name was deprecated in 6.4 and cannot be used in new indices. "
++ "Please change the filter name to [ngram] instead.");
+} else {
 deprecationLogger.deprecatedAndMaybeLog("nGram_deprecation",
 "The [nGram] token filter name is deprecated and will be removed in a future version. "
 + "Please change the filter name to [ngram] instead.");
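
The version check lives inside the factory lambda passed to `singletonWithVersion`, so it runs when a filter is created for a particular index rather than when the plugin registers it. This is why the updated tests call `tokenFilterFactory.create(tokenizer)` inside `expectThrows`. The branch can be modelled without Elasticsearch on the classpath. In this sketch `Version` is a hypothetical stand-in for `org.elasticsearch.Version` (comparing only major, minor and patch, and using a Java 16+ record), and `resolve` is a hypothetical helper that returns the replacement name where the real code returns a Lucene token filter:

```java
import java.util.function.Consumer;

// Hypothetical, dependency-free model of the version gate in
// CommonAnalysisPlugin#getPreConfiguredTokenFilters; not Elasticsearch code.
public class DeprecatedFilterGate {

    // Stand-in for org.elasticsearch.Version: compares major, then minor, then patch.
    public record Version(int major, int minor, int patch) implements Comparable<Version> {
        public static final Version V_7_0_0 = new Version(7, 0, 0);

        public boolean onOrAfter(Version other) {
            return compareTo(other) >= 0;
        }

        @Override
        public int compareTo(Version o) {
            if (major != o.major) return Integer.compare(major, o.major);
            if (minor != o.minor) return Integer.compare(minor, o.minor);
            return Integer.compare(patch, o.patch);
        }
    }

    // Mirrors the diff: reject the deprecated name for indices created on or
    // after 7.0.0; otherwise emit a deprecation warning and keep working.
    public static String resolve(String deprecatedName, String replacement,
                                 Version indexCreated, Consumer<String> warn) {
        if (indexCreated.onOrAfter(Version.V_7_0_0)) {
            throw new IllegalArgumentException("The [" + deprecatedName + "] token filter name was deprecated in 6.4 "
                + "and cannot be used in new indices. Please change the filter name to [" + replacement + "] instead.");
        }
        warn.accept("The [" + deprecatedName + "] token filter name is deprecated and will be removed in a future "
            + "version. Please change the filter name to [" + replacement + "] instead.");
        // The real code builds the replacement Lucene filter here; the sketch only returns its name.
        return replacement;
    }

    public static void main(String[] args) {
        // An index created with 6.x still resolves the old name, with a warning.
        System.out.println(resolve("edgeNGram", "edge_ngram", new Version(6, 7, 0), w -> System.out.println("WARN " + w)));
        // An index created with 7.0.0 is rejected outright.
        try {
            resolve("nGram", "ngram", new Version(7, 0, 0), w -> System.out.println("WARN " + w));
        } catch (IllegalArgumentException e) {
            System.out.println("ERROR " + e.getMessage());
        }
    }
}
```

Passing the warning sink as a `Consumer<String>` keeps the sketch testable: a test can collect warnings into a list instead of checking log output.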

modules/analysis-common/src/test/java/org/elasticsearch/analysis/common/CommonAnalysisPluginTests.java

Lines changed: 22 additions & 15 deletions
@@ -41,11 +41,12 @@
 public class CommonAnalysisPluginTests extends ESTestCase {
 
 /**
-* Check that the deprecated name "nGram" issues a deprecation warning for indices created since 6.3.0
+* Check that the deprecated name "nGram" issues a deprecation warning for indices created since 6.0.0
 */
 public void testNGramDeprecationWarning() throws IOException {
 Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
-.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_6_4_0, Version.CURRENT))
+.put(IndexMetaData.SETTING_VERSION_CREATED,
+VersionUtils.randomVersionBetween(random(), Version.V_6_0_0, VersionUtils.getPreviousVersion(Version.V_7_0_0)))
 .build();
 
 IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -62,12 +63,11 @@ public void testNGramDeprecationWarning() throws IOException {
 }
 
 /**
-* Check that the deprecated name "nGram" does NOT issues a deprecation warning for indices created before 6.4.0
+* Check that the deprecated name "nGram" throws an error since 7.0.0
 */
-public void testNGramNoDeprecationWarningPre6_4() throws IOException {
+public void testNGramDeprecationError() throws IOException {
 Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
-.put(IndexMetaData.SETTING_VERSION_CREATED,
-VersionUtils.randomVersionBetween(random(), Version.V_6_0_0, Version.V_6_3_0))
+.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_7_0_0, null))
 .build();
 
 IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -76,16 +76,21 @@ public void testNGramNoDeprecationWarningPre6_4() throws IOException {
 TokenFilterFactory tokenFilterFactory = tokenFilters.get("nGram");
 Tokenizer tokenizer = new MockTokenizer();
 tokenizer.setReader(new StringReader("foo bar"));
-assertNotNull(tokenFilterFactory.create(tokenizer));
+IllegalArgumentException ex = expectThrows(IllegalArgumentException.class, () -> tokenFilterFactory.create(tokenizer));
+assertEquals(
+"The [nGram] token filter name was deprecated in 6.4 and cannot be used in new indices. Please change the filter"
++ " name to [ngram] instead.",
+ex.getMessage());
 }
 }
 
 /**
-* Check that the deprecated name "edgeNGram" issues a deprecation warning for indices created since 6.3.0
+* Check that the deprecated name "edgeNGram" issues a deprecation warning for indices created since 6.0.0
 */
 public void testEdgeNGramDeprecationWarning() throws IOException {
 Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
-.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_6_4_0, Version.CURRENT))
+.put(IndexMetaData.SETTING_VERSION_CREATED,
+VersionUtils.randomVersionBetween(random(), Version.V_6_4_0, VersionUtils.getPreviousVersion(Version.V_7_0_0)))
 .build();
 
 IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -102,12 +107,11 @@ public void testEdgeNGramDeprecationWarning() throws IOException {
 }
 
 /**
-* Check that the deprecated name "edgeNGram" does NOT issues a deprecation warning for indices created before 6.4.0
+* Check that the deprecated name "edgeNGram" throws an error for indices created since 7.0.0
 */
-public void testEdgeNGramNoDeprecationWarningPre6_4() throws IOException {
+public void testEdgeNGramDeprecationError() throws IOException {
 Settings settings = Settings.builder().put(Environment.PATH_HOME_SETTING.getKey(), createTempDir())
-.put(IndexMetaData.SETTING_VERSION_CREATED,
-VersionUtils.randomVersionBetween(random(), Version.V_6_0_0, Version.V_6_3_0))
+.put(IndexMetaData.SETTING_VERSION_CREATED, VersionUtils.randomVersionBetween(random(), Version.V_7_0_0, null))
 .build();
 
 IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
@@ -116,11 +120,14 @@ public void testEdgeNGramNoDeprecationWarningPre6_4() throws IOException {
 TokenFilterFactory tokenFilterFactory = tokenFilters.get("edgeNGram");
 Tokenizer tokenizer = new MockTokenizer();
 tokenizer.setReader(new StringReader("foo bar"));
-assertNotNull(tokenFilterFactory.create(tokenizer));
+IllegalArgumentException ex = expectThrows(IllegalArgumentException.class, () -> tokenFilterFactory.create(tokenizer));
+assertEquals(
+"The [edgeNGram] token filter name was deprecated in 6.4 and cannot be used in new indices. Please change the filter"
++ " name to [edge_ngram] instead.",
+ex.getMessage());
 }
 }
 
-
 /**
 * Check that the deprecated analyzer name "standard_html_strip" throws exception for indices created since 7.0.0
 */

modules/analysis-common/src/test/java/org/elasticsearch/analysis/common/HighlighterWithAnalyzersTests.java

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ public void testNgramHighlightingWithBrokenPositions() throws IOException {
 .put("analysis.tokenizer.autocomplete.max_gram", 20)
 .put("analysis.tokenizer.autocomplete.min_gram", 1)
 .put("analysis.tokenizer.autocomplete.token_chars", "letter,digit")
-.put("analysis.tokenizer.autocomplete.type", "nGram")
+.put("analysis.tokenizer.autocomplete.type", "ngram")
 .put("analysis.filter.wordDelimiter.type", "word_delimiter")
 .putList("analysis.filter.wordDelimiter.type_table",
 "& => ALPHANUM", "| => ALPHANUM", "! => ALPHANUM",

modules/analysis-common/src/test/resources/rest-api-spec/test/analysis-common/30_tokenizers.yml

Lines changed: 8 additions & 25 deletions
@@ -23,24 +23,7 @@
 - match: { detail.tokenizer.tokens.0.token: Foo Bar! }
 
 ---
-"nGram":
-- do:
-indices.analyze:
-body:
-text: good
-explain: true
-tokenizer:
-type: nGram
-min_gram: 2
-max_gram: 2
-- length: { detail.tokenizer.tokens: 3 }
-- match: { detail.tokenizer.name: _anonymous_tokenizer }
-- match: { detail.tokenizer.tokens.0.token: go }
-- match: { detail.tokenizer.tokens.1.token: oo }
-- match: { detail.tokenizer.tokens.2.token: od }
-
----
-"nGram_exception":
+"ngram_exception":
 - skip:
 version: " - 6.99.99"
 reason: only starting from version 7.x this throws an error
@@ -51,7 +34,7 @@
 text: good
 explain: true
 tokenizer:
-type: nGram
+type: ngram
 min_gram: 2
 max_gram: 4
 ---
@@ -133,7 +116,7 @@
 text: "foobar"
 explain: true
 tokenizer:
-type: nGram
+type: ngram
 min_gram: 3
 max_gram: 3
 - length: { detail.tokenizer.tokens: 4 }
@@ -162,9 +145,9 @@
 body:
 text: "foo"
 explain: true
-tokenizer: nGram
+tokenizer: ngram
 - length: { detail.tokenizer.tokens: 5 }
-- match: { detail.tokenizer.name: nGram }
+- match: { detail.tokenizer.name: ngram }
 - match: { detail.tokenizer.tokens.0.token: f }
 - match: { detail.tokenizer.tokens.1.token: fo }
 - match: { detail.tokenizer.tokens.2.token: o }
@@ -194,7 +177,7 @@
 text: "foo"
 explain: true
 tokenizer:
-type: edgeNGram
+type: edge_ngram
 min_gram: 1
 max_gram: 3
 - length: { detail.tokenizer.tokens: 3 }
@@ -219,9 +202,9 @@
 body:
 text: "foo"
 explain: true
-tokenizer: edgeNGram
+tokenizer: edge_ngram
 - length: { detail.tokenizer.tokens: 2 }
-- match: { detail.tokenizer.name: edgeNGram }
+- match: { detail.tokenizer.name: edge_ngram }
 - match: { detail.tokenizer.tokens.0.token: f }
 - match: { detail.tokenizer.tokens.1.token: fo }
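
The expected token counts in these REST tests follow directly from how (edge) n-grams are formed. As a rough illustration only, the sketch below enumerates grams for a single ASCII token. It is not Lucene's `NGramTokenizer` or `EdgeNGramTokenizer` (it ignores `token_chars`, offsets and surrogate pairs), and it assumes the documented defaults of `min_gram` 1 and `max_gram` 2 when a test uses the bare `ngram` or `edge_ngram` tokenizer name. Grams are ordered by start position and then by length, which matches the token order the tests assert (`f`, `fo`, `o`, ...):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical n-gram enumeration for one plain-ASCII token;
// only approximates Lucene's tokenizers for the inputs used in the tests.
public class NGramSketch {

    // All substrings with length in [minGram, maxGram], ordered by start position, then length.
    public static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    // Prefixes only: substrings anchored at position 0 with length in [minGram, maxGram].
    public static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= maxGram && len <= text.length(); len++) {
            grams.add(text.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("good", 2, 2));     // [go, oo, od]
        System.out.println(ngrams("foo", 1, 2));      // [f, fo, o, oo, o]
        System.out.println(ngrams("foobar", 3, 3));   // [foo, oob, oba, bar]
        System.out.println(edgeNgrams("foo", 1, 3));  // [f, fo, foo]
        System.out.println(edgeNgrams("foo", 1, 2));  // [f, fo]
    }
}
```

These lists have 3, 5, 4, 3 and 2 entries, matching the `length` assertions for the removed `nGram` case, the bare `ngram` tokenizer, the `foobar` trigram case, and the two `edge_ngram` cases.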

modules/analysis-common/src/test/resources/rest-api-spec/test/indices.analyze/10_analyze.yml

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@
 analysis:
 tokenizer:
 trigram:
-type: nGram
+type: ngram
 min_gram: 3
 max_gram: 3
 filter:
