-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for deprecations when analyzers are built #50908
Check for deprecations when analyzers are built #50908
Conversation
Pinging @elastic/es-search (:Search/Analysis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romseygeek I took a first look and was wondering how version-dependent logic like deprecation from V7.6 on and throwing errors starting with a later version of the analysis components would work. Can you add a small example for this maybe in some tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a short question, rest looks good to me.
analyze = | ||
TransportAnalyzeAction.analyze(req, registry, mockIndexService(), maxTokenCount); | ||
assertEquals(1, analyze.getTokens().size()); | ||
assertWarnings("Using deprecated token filter [deprecated]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small observation: I thought this checks that the normalize
method in DeprecatedTokenFilterFactory, so I tried changing it but the test didn't fail, do you know why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the default implementation of normalize
delegates to create
, so we get the warning from there instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that sounds like the "normalize()" method isn't actualy tested here? I saw we cover that in some other test though, so fine with whatever you decide doing here, I was just curious if this could be checked here through the "_analyze" API as well, but no problem it its too tricky I.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's tested, but in order to make it fail you can't just remove the method (because that then delegates to create()
), you have to have an implementation that doesn't emit a warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I've just realised that we're talking about the AnalyzeAction
- this always uses create()
rather than normalize()
, and we should probably change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably change that
This can most likely be done in a follow up PR though, I just wanted to understand whats going on here and which of the two methods in the DeprecatedTokenFilterFactory in this test is supposed to fire the warning, currently they emit the same text so its hard to tell which code path is checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left two minor comments that would be nice to address, LGTM otherwise.
|
||
// Deprecation warnings are emitted when actual TokenStreams are built; this is usually | ||
// too late to be useful, so we build an empty tokenstream at construction time and | ||
// use it, to ensure that warnings are emitted immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also throw exceptions in some cases so it's not only about deprecations ?
@@ -264,4 +271,90 @@ public void testEnsureCloseInvocationProperlyDelegated() throws IOException { | |||
registry.close(); | |||
verify(mock).close(); | |||
} | |||
|
|||
public void testDeprecations() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test that throws an exception ?
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation This is worse in the case where components throw exceptions on upgrade. In this case, users This commit adds a new check that pushes an empty string through all user-defined analyzers Fixes #42349 |
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation warnings when they are first used. However, this means that no warnings are emitted when indexes are created with deprecated components, and users have to actually index a document to see warnings. This makes it much harder to see these warnings and act on them at appropriate times. This is worse in the case where components throw exceptions on upgrade. In this case, users will not be aware of a problem until a document is indexed, instead of at index creation time. This commit adds a new check that pushes an empty string through all user-defined analyzers and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings and exceptions are now emitted when indexes are created or opened. Fixes #42349
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation warnings when they are first used. However, this means that no warnings are emitted when indexes are created with deprecated components, and users have to actually index a document to see warnings. This makes it much harder to see these warnings and act on them at appropriate times. This is worse in the case where components throw exceptions on upgrade. In this case, users will not be aware of a problem until a document is indexed, instead of at index creation time. This commit adds a new check that pushes an empty string through all user-defined analyzers and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings and exceptions are now emitted when indexes are created or opened. Fixes elastic#42349
The lucene Ukrainian analyzer has a bug where a large in-memory dictionary is loaded and stored on a thread local for every tokenstream generated in a new thread (for more details see https://issues.apache.org/jira/browse/LUCENE-9930). Due to checks added in #50908, we create a tokenstream for every registered analyzer in every shard, which means that any node with the ukrainian plugin installed will leak one copy of this dictionary per shard, whether or not the ukrainian analyzer is actually being used. This commit makes the plugin use a fixed version of the UkrainianMorfologikAnalyzer, until we merge a version of lucene that contains the upstream fix.
The lucene Ukrainian analyzer has a bug where a large in-memory dictionary is loaded and stored on a thread local for every tokenstream generated in a new thread (for more details see https://issues.apache.org/jira/browse/LUCENE-9930). Due to checks added in #50908, we create a tokenstream for every registered analyzer in every shard, which means that any node with the ukrainian plugin installed will leak one copy of this dictionary per shard, whether or not the ukrainian analyzer is actually being used. This commit makes the plugin use a fixed version of the UkrainianMorfologikAnalyzer, until we merge a version of lucene that contains the upstream fix.
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.
This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an
IndexAnalyzers
object is built for each index; deprecation warningsare now emitted when indexes are created or opened.
Fixes #42349