keyword_repeat and multiplexer don't play well with subsequent synonym filters #33609
Pinging @elastic/es-search-aggs
#30968 is supposed to address this sort of issue, although it's still not a great workaround, and I think in this case it will end up with the synonym filter just being a no-op. I wonder if we should make it possible to specify an analyzer to use for the synonym token filter? Mostly, synonyms just need tokenization with some light normalization (e.g. lowercasing) applied. Trying to reuse the analysis chain up to the point where the synonym filter appears ends up causing all sorts of weird issues if anything complex is being done.
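To make the "tokenization plus light normalization" idea concrete, here is a minimal Python sketch of what a dedicated synonym-rule analyzer could do: split on whitespace, lowercase, and nothing else. The function names (`light_analyze`, `parse_rule`) are hypothetical, and the rule parsing is a simplified take on the Solr synonym format (`a, b => c` for explicit mappings, `a, b` for equivalents), not Elasticsearch's actual parser.

```python
from typing import List, Tuple

Phrase = Tuple[str, ...]


def light_analyze(text: str) -> List[str]:
    # Hypothetical light analyzer: whitespace tokenization plus lowercasing,
    # with no stemming or other token-rewriting filters.
    return [tok.lower() for tok in text.split()]


def parse_rule(rule: str) -> Tuple[List[Phrase], List[Phrase]]:
    """Parse a simplified Solr-style rule: 'a, b => c' (mapping) or 'a, b' (equivalents)."""
    if "=>" in rule:
        lhs, rhs = rule.split("=>", 1)
    else:
        lhs = rhs = rule

    def phrases(side: str) -> List[Phrase]:
        # Each comma-separated entry may be a multi-word phrase.
        return [tuple(light_analyze(p)) for p in side.split(",") if p.strip()]

    return phrases(lhs), phrases(rhs)


inputs, outputs = parse_rule("Sneakers, Running Shoes => trainers")
print(inputs)   # [('sneakers',), ('running', 'shoes')]
print(outputs)  # [('trainers',)]
```

The appeal of this design is that the rule analyzer is independent of whatever complex filters the field uses; the cost is the compatibility risk raised later in the thread.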
Thinking about this some more, including a synonym filter within a condition or multiplexer filter currently won't work at all, because SynonymTokenFilterFactory isn't fully set up in its constructor like other filters, and needs an additional method call to load its dictionaries.
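The failure mode described here (a factory that needs a second call before it works, referenced indirectly by another filter) can be modeled in a few lines. This is a hypothetical Python sketch, not the Elasticsearch classes: `SynonymFactory`, `Multiplexer`, `load()` and `finish_chain()` are illustrative names. It assumes the analyzer-building code only performs the extra setup step on filters listed directly in the chain, so a synonym filter reached through a multiplexer is never loaded.

```python
from typing import Dict, List, Optional


class SynonymFactory:
    """Hypothetical two-phase factory: constructing it does not load its rules."""

    def __init__(self, rules: List[str]):
        self.rules = rules
        self.synonyms: Optional[Dict[str, str]] = None  # populated only by load()

    def load(self) -> None:
        # Second initialization step, standing in for loading the synonym dictionary.
        self.synonyms = {}
        for rule in self.rules:
            lhs, rhs = (side.strip() for side in rule.split("=>"))
            self.synonyms[lhs] = rhs

    def create(self, tokens: List[str]) -> List[str]:
        if self.synonyms is None:
            raise RuntimeError("synonym rules were never loaded")
        return [self.synonyms.get(t, t) for t in tokens]


class Multiplexer:
    """Hypothetical referring filter: resolves other filters by name at construction."""

    def __init__(self, registry: Dict[str, SynonymFactory], names: List[str]):
        self.filters = [registry[n] for n in names]

    def create(self, tokens: List[str]) -> List[str]:
        out: List[str] = []
        for f in self.filters:
            out.extend(f.create(tokens))
        return out


def finish_chain(chain: List[object]) -> None:
    # Only filters listed directly in the chain get the extra load() call;
    # filters reached indirectly through a Multiplexer are never visited.
    for f in chain:
        if isinstance(f, SynonymFactory):
            f.load()


registry = {"my_synonyms": SynonymFactory(["tv => television"])}
mux = Multiplexer(registry, ["my_synonyms"])
finish_chain([mux])
try:
    mux.create(["tv"])
except RuntimeError as e:
    print("failed:", e)  # failed: synonym rules were never loaded
```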
+1, this would help work around these issues in the rare cases where today's default behavior doesn't work.
Wouldn't this defeat the purpose of the change we made in 6.x? I agree that only light normalization should be applied to synonyms, but we also want to avoid the case where the analyzer used to build the synonym map is not compatible with the analysis chain used by the field. If you put a
True, although I think the answer here is more "don't put a stemmer before a synonym filter". Limiting what can appear before the synonym filter is tricky, though: what do we do with custom token filters, for example?
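The compatibility concern behind the 6.x change can be shown with a toy chain. In this hypothetical Python sketch, `toy_stem` is a crude suffix stripper standing in for a real stemmer (it is not a real stemming algorithm), and `rule_matches` checks whether a rule's analyzed input equals what the field chain feeds into the synonym filter. If a stemmer runs before the synonym filter but the rule is only lowercased, the rule can never match.

```python
from typing import Callable, List

Filter = Callable[[List[str]], List[str]]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def toy_stem(tokens: List[str]) -> List[str]:
    # Toy suffix stripper standing in for a real stemmer; NOT a real stemming algorithm.
    out = []
    for t in tokens:
        for suffix in ("ing", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


# Filters that run *before* the synonym filter in the field's chain.
FIELD_CHAIN: List[Filter] = [lowercase, toy_stem]


def run_chain(text: str, filters: List[Filter]) -> List[str]:
    tokens = text.split()
    for f in filters:
        tokens = f(tokens)
    return tokens


def rule_matches(field_text: str, rule_input: str, rule_filters: List[Filter]) -> bool:
    """Does the rule's analyzed input equal what the field chain feeds the synonym filter?"""
    return run_chain(field_text, FIELD_CHAIN) == run_chain(rule_input, rule_filters)


print(rule_matches("Jumping", "jumping", [lowercase]))  # False: field sees "jump", rule key is "jumping"
print(rule_matches("Jumping", "jumping", FIELD_CHAIN))  # True: both sides are "jump"
```

Running rules through the same preceding filters fixes the mismatch, which is why the chain is reused; the thread's problem is that some of those filters (keyword_repeat, multiplexer) produce output a synonym rule cannot represent.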
As far as I understand some use cases, though, it makes sense to put a stemmer in front of a
…ing chain (#33702)

We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This special-casing doesn't work with referring filter factories, such as the Multiplexer or Conditional filters. We also have a number of filters (e.g. the Multiplexer) that will break synonyms when they appear before a synonym filter in a chain, because they produce multiple tokens at the same position.

This commit adds two methods to the TokenFilterFactory interface.

* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and `CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym list `Analyzer`. By default it returns `true`.

Fixes #33609
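The two hooks described in the commit message can be modeled roughly as follows. This is a hypothetical Python sketch, not the Java API: `chain_aware` stands in for `getChainAwareTokenFilterFactory()`, and the boolean `use_for_synonyms` stands in for `getSynonymFilter()` as the commit message describes it (default `true`). The synonym factory rewrites itself against its predecessors and keeps only those that opt in, so a token-duplicating filter like keyword_repeat is skipped when the rules are analyzed. Tokens are plain strings here, so stacked tokens show up as repeated words.

```python
from typing import Callable, List, Optional


class TokenFilterFactory:
    """Hypothetical, simplified model of a filter factory; not the Elasticsearch Java API."""

    def __init__(self, name: str, fn: Callable[[List[str]], List[str]], use_for_synonyms: bool = True):
        self.name = name
        self.fn = fn
        # Stand-in for getSynonymFilter(): apply this filter when analyzing synonym rules?
        self.use_for_synonyms = use_for_synonyms

    def chain_aware(self, previous: List["TokenFilterFactory"]) -> "TokenFilterFactory":
        # Default: nothing to rewrite (the commit's default returns `this`).
        return self


class SynonymFilterFactory(TokenFilterFactory):
    def __init__(self, rules: List[str], rule_filters: Optional[List[TokenFilterFactory]] = None):
        super().__init__("synonym", lambda toks: toks)
        self.rules = rules
        self.rule_filters = rule_filters or []

    def chain_aware(self, previous: List[TokenFilterFactory]) -> TokenFilterFactory:
        # Rewrite against the preceding chain, keeping only filters that opt in.
        return SynonymFilterFactory(self.rules, [f for f in previous if f.use_for_synonyms])

    def analyzed_rules(self) -> List[List[str]]:
        analyzed = []
        for rule in self.rules:
            tokens = rule.split()
            for f in self.rule_filters:
                tokens = f.fn(tokens)
            analyzed.append(tokens)
        return analyzed


def build_chain(filters: List[TokenFilterFactory]) -> List[TokenFilterFactory]:
    """Let each filter rewrite itself against the (already rewritten) filters before it."""
    built: List[TokenFilterFactory] = []
    for f in filters:
        built.append(f.chain_aware(list(built)))
    return built


lowercase = TokenFilterFactory("lowercase", lambda toks: [t.lower() for t in toks])
# Toy keyword_repeat: emits every token twice, and opts out of synonym-rule analysis.
keyword_repeat = TokenFilterFactory(
    "keyword_repeat", lambda toks: [t for t in toks for _ in range(2)], use_for_synonyms=False
)

chain = build_chain([lowercase, keyword_repeat, SynonymFilterFactory(["Jumping Jacks"])])
synonyms = chain[-1]
print([f.name for f in synonyms.rule_filters])  # ['lowercase']
print(synonyms.analyzed_rules())                # [['jumping', 'jacks']]
```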
I am facing the same issue. Did you find a solution?
@sanjusagare #33702 should have solved this. If you have general questions or problems, please use the support forums at https://discuss.elastic.co. We prefer to use GitHub issues only for bug reports and feature requests; we think it's more likely this is a question than a bug report, and it's better to continue the discussion there than on a closed issue.
I recently saw an issue where an analyzer chain was set up to perform some stemming on the input and then apply a synonym filter afterwards.
In order to also keep the unstemmed tokens in the output (and apply synonyms to them as well where possible), a keyword_repeat filter was used, but this already leads to errors on index creation, because the synonyms in the filter are validated by running them through the analysis chain:
Gives:
I also tried using a multiplexer like so, but that runs into similar issues:
I'm wondering if I'm using this the wrong way, or if there are other ways to achieve a similar effect.
Also, I'm trying to understand what the position checks in SynonymMap#analyze that cause this rejection are supposed to prevent, and whether those checks could be omitted for tokens generated by keyword_repeat or multiplexer.
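As for what the position check is doing: as far as I recall, Lucene's synonym rule parser requires every token produced while analyzing a rule to have a position increment of exactly 1, because each side of a rule is stored as a flat sequence of words, and a token stacked at the same position (increment 0) has no slot in that sequence. The Python sketch below is a toy model of that check, not Lucene's code; `Token`, `keyword_repeat`, `toy_stem` and `analyze_rule` are hypothetical stand-ins, and the error text only loosely mirrors Lucene's message. A multiplexer, which emits each branch's output at the same position, stacks tokens in the same way.

```python
from typing import Callable, List, NamedTuple


class Token(NamedTuple):
    term: str
    pos_inc: int  # 1 = new position, 0 = stacked on the previous token
    keyword: bool = False


def keyword_repeat(tokens: List[Token]) -> List[Token]:
    # Toy keyword_repeat: a keyword-marked copy, then a stemmable copy at the same position.
    out = []
    for t in tokens:
        out.append(t._replace(keyword=True))
        out.append(Token(t.term, 0, False))
    return out


def toy_stem(tokens: List[Token]) -> List[Token]:
    # Toy stemmer: strips a trailing "ing" from non-keyword tokens only.
    return [
        t._replace(term=t.term[:-3]) if (not t.keyword and t.term.endswith("ing")) else t
        for t in tokens
    ]


def analyze_rule(text: str, filters: List[Callable[[List[Token]], List[Token]]]) -> List[str]:
    """Toy model of the rule-parsing check: every token must start a new position."""
    tokens = [Token(w, 1) for w in text.split()]
    for f in filters:
        tokens = f(tokens)
    for t in tokens:
        if t.pos_inc != 1:
            raise ValueError(
                f"term: {text} analyzed to a token ({t.term}) "
                f"with position increment != 1 (got: {t.pos_inc})"
            )
    return [t.term for t in tokens]


print(analyze_rule("walking", [toy_stem]))  # ['walk']
try:
    analyze_rule("walking", [keyword_repeat, toy_stem])
except ValueError as e:
    print(e)  # term: walking analyzed to a token (walking) with position increment != 1 (got: 0)
```

Note that the toy reports the first offending token: after keyword_repeat the stacked copy still reads "walking" when `toy_stem` skips it only if it is keyword-marked, so here the stemmed copy "walk" is the one at increment 0. Simply skipping the check would, in this model, either drop the stacked alternative or turn it into an extra word in the rule, which is why #33702 instead lets such filters be left out when rules are analyzed.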