Allow TokenFilterFactories to rewrite themselves against their preceding chain #33702
Pinging @elastic/es-search-aggs
This would also allow us to move the Synonym filters back into the common analysis module, as the special-casing is no longer required. Another follow-up could be to add a
I like the idea in general. Maybe we should have eg.
Agreed that MultiTermComponent is a bit hacky and not type-safe, which is a pity; we need to fix it. Hopefully we can do it in Lucene at the same time, so that the way this problem is handled on both sides remains similar.
I left some comments, this is a great change for the ootb user-experience.
@@ -62,6 +62,11 @@ public MultiplexerTokenFilterFactory(IndexSettings indexSettings, Environment en
        this.preserveOriginal = settings.getAsBoolean("preserve_original", true);
    }

    @Override
    public TokenFilterFactory getSynonymFilter() {
        return null;
Should it be null or a dummy factory that doesn't wrap the token stream? I'm afraid that we might be exposing ourselves to null-pointer exceptions.
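To make the alternative raised in this comment concrete, here is a minimal sketch of a "dummy" pass-through factory returned in place of `null`. It does not use the real Lucene or Elasticsearch types: token streams are modeled as plain lists of strings, and `PassThroughDemoFactory`, `IDENTITY`, `DuplicatingFactory`, and `IdentitySketch` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a token filter factory; a "token stream" is just a list of terms.
interface PassThroughDemoFactory {
    List<String> apply(List<String> tokens);

    // Assumed no-op factory: returns the tokens unchanged instead of wrapping them.
    PassThroughDemoFactory IDENTITY = tokens -> tokens;

    // The variant of this filter to use while parsing synonym rules.
    default PassThroughDemoFactory getSynonymFilter() {
        return this;
    }
}

// Stand-in for a filter that emits several tokens per input token,
// which would confuse synonym parsing.
final class DuplicatingFactory implements PassThroughDemoFactory {
    @Override
    public List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t);
            out.add(t);
        }
        return out;
    }

    @Override
    public PassThroughDemoFactory getSynonymFilter() {
        return IDENTITY; // opt out of synonym parsing without returning null
    }
}

final class IdentitySketch {
    static final PassThroughDemoFactory LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Callers can invoke every filter unconditionally; no null check is needed.
    static List<String> analyzeSynonymRule(List<PassThroughDemoFactory> chain, List<String> ruleTokens) {
        List<String> out = ruleTokens;
        for (PassThroughDemoFactory f : chain) {
            out = f.getSynonymFilter().apply(out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<PassThroughDemoFactory> chain = List.of(LOWERCASE, new DuplicatingFactory());
        System.out.println(analyzeSynonymRule(chain, List.of("USA")));
    }
}
```

With a pass-through object, the code that builds the synonym-parsing chain stays a plain loop; with `null`, every caller would have to remember to skip the entry.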
server/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java
public TokenFilterFactory getChainAwareTokenFilterFactory(TokenizerFactory tokenizer, List<CharFilterFactory> charFilters,
                                                          List<TokenFilterFactory> previousTokenFilters,
                                                          Function<String, TokenFilterFactory> allFilters) {
    if (filters == null) {
Should we worry about concurrency? Caching makes me a little uncomfortable, because it implies that we expect this to always be called with the same arguments. Should we change the API or try to detect misuse?
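One way to sidestep the caching concern is for the rewrite to return a fresh, fully resolved object on every call instead of storing resolved filters in a field. The sketch below illustrates that idea only; it is not the code from this PR. `ResolvableFactory`, `ReferringFactory`, `resolve`, and `ResolveSketch` are hypothetical names, and token streams are again modeled as lists of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a token filter factory.
interface ResolvableFactory {
    List<String> apply(List<String> tokens);

    // Resolve references to other filters; by default there is nothing to resolve.
    default ResolvableFactory resolve(Function<String, ResolvableFactory> allFilters) {
        return this;
    }
}

// A factory that refers to other filters by name (in the spirit of a multiplexer).
final class ReferringFactory implements ResolvableFactory {
    private final List<String> filterNames; // configuration only, never mutated

    ReferringFactory(List<String> filterNames) {
        this.filterNames = new ArrayList<>(filterNames);
    }

    @Override
    public List<String> apply(List<String> tokens) {
        throw new IllegalStateException("references not resolved; call resolve() first");
    }

    @Override
    public ResolvableFactory resolve(Function<String, ResolvableFactory> allFilters) {
        List<ResolvableFactory> resolved = new ArrayList<>();
        for (String name : filterNames) {
            ResolvableFactory filter = allFilters.apply(name);
            if (filter == null) {
                throw new IllegalArgumentException("Unknown filter [" + name + "]");
            }
            resolved.add(filter);
        }
        // A new object is returned and no field is written, so concurrent calls
        // with different arguments cannot observe each other's results.
        return tokens -> {
            List<String> out = tokens;
            for (ResolvableFactory f : resolved) {
                out = f.apply(out);
            }
            return out;
        };
    }
}

final class ResolveSketch {
    static final ResolvableFactory LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    public static void main(String[] args) {
        Map<String, ResolvableFactory> registry = Map.of("lowercase", LOWERCASE);
        ResolvableFactory resolved = new ReferringFactory(List.of("lowercase")).resolve(registry::get);
        System.out.println(resolved.apply(List.of("Quick", "FOX")));
    }
}
```

Because the unresolved factory holds only its configuration, calling `resolve` twice with different registries yields two independent results, and there is no cached state whose validity depends on the caller always passing the same arguments.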
test this please
I refactored things a bit to remove the caching.
LGTM
…ing chain (#33702)

We currently special-case `SynonymFilterFactory` and `SynonymGraphFilterFactory`, which need to know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional filters. We also have a number of filters (e.g. the Multiplexer) that will break synonyms when they appear before them in a chain, because they produce multiple tokens at the same position.

This commit adds two methods to the TokenFilterFactory interface.

* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and `CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym list `Analyzer`. By default it returns `true`.

Fixes #33609
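The interplay of the two methods can be sketched in plain Java without Lucene. This is a simplified illustration under stated assumptions, not the Elasticsearch implementation: `SketchFilterFactory`, `LowercaseFactory`, `SynonymFactory`, and `ChainAwareSketch` are hypothetical names, token streams are modeled as lists of strings, the `tokenizer` and `charFilters` parameters are omitted, and `getSynonymFilter()` is assumed to return a factory, following the `TokenFilterFactory` return type shown in the review diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified stand-in for the TokenFilterFactory interface.
interface SketchFilterFactory {
    List<String> apply(List<String> tokens);

    // Rewrite this factory against the filters that precede it; default is no rewrite.
    default SketchFilterFactory getChainAwareTokenFilterFactory(
            List<SketchFilterFactory> previousTokenFilters,
            Function<String, SketchFilterFactory> allFilters) {
        return this;
    }

    // The variant of this filter to apply while analyzing synonym rules.
    default SketchFilterFactory getSynonymFilter() {
        return this;
    }
}

final class LowercaseFactory implements SketchFilterFactory {
    @Override
    public List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

final class SynonymFactory implements SketchFilterFactory {
    private final Map<String, String> rules; // input term -> replacement term

    SynonymFactory(Map<String, String> rules) {
        this.rules = rules;
    }

    @Override
    public List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(rules.getOrDefault(t, t));
        }
        return out;
    }

    // Normalize both sides of every rule with the preceding filters, so the
    // rules match the tokens that will actually reach this filter.
    @Override
    public SketchFilterFactory getChainAwareTokenFilterFactory(
            List<SketchFilterFactory> previousTokenFilters,
            Function<String, SketchFilterFactory> allFilters) {
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            List<String> key = List.of(rule.getKey());
            List<String> value = List.of(rule.getValue());
            for (SketchFilterFactory previous : previousTokenFilters) {
                SketchFilterFactory synonymVariant = previous.getSynonymFilter();
                key = synonymVariant.apply(key);
                value = synonymVariant.apply(value);
            }
            normalized.put(key.get(0), value.get(0));
        }
        return new SynonymFactory(normalized);
    }
}

final class ChainAwareSketch {
    // Build the chain left to right, letting each factory rewrite itself
    // against the already-rewritten filters before it, then run the tokens through.
    static List<String> analyze(List<SketchFilterFactory> configured, List<String> tokens) {
        List<SketchFilterFactory> chain = new ArrayList<>();
        for (SketchFilterFactory f : configured) {
            // allFilters is unused in this sketch, so the lookup always returns null.
            chain.add(f.getChainAwareTokenFilterFactory(new ArrayList<>(chain), name -> null));
        }
        List<String> out = tokens;
        for (SketchFilterFactory f : chain) {
            out = f.apply(out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<SketchFilterFactory> configured =
            List.of(new LowercaseFactory(), new SynonymFactory(Map.of("USA", "United States")));
        System.out.println(analyze(configured, List.of("Visit", "usa")));
    }
}
```

In this sketch the rule `USA -> United States` only matches the lowercased input because the synonym factory rewrote its rules against the preceding lowercase filter; without the rewrite, the token `usa` would pass through unchanged.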