-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change grok watch dog to be Matcher based instead of thread based. #48346
Conversation
There is a watchdog in order to avoid long running (and expensive) grok expressions. Currently the watchdog is thread based, threads that run grok expressions are registered and after completion unregister. If these threads stay registered for too long then the watch dog interrupts these threads. Joni (the library that powers grok expressions) has a mechanism that checks whether the current thread is interrupted and if so abort the pattern matching. Newer versions have an additional method to abort long running pattern matching inside joni. Instead of checking the thread's interrupted flag, joni now also checks a volatile field that can be set via a `Matcher` instance. This is more efficient method for aborting long running matches. (joni checks each 30k iterations whether interrupted flag is set vs. just checking a volatile field) Recently we upgraded to a recent joni version (elastic#47374), and this PR is a followup of that PR. This change should also fix elastic#43673, since it appears when unit tests are ran the a test runner thread's interrupted flag may already have been set, due to some thread reuse.
Pinging @elastic/es-core-features (:Core/Features/Ingest) |
@@ -38,7 +40,7 @@ | |||
public class TimeoutChecker implements Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@droberts195 Does this change look good to you?
This change to ML is required in order to make the watchdog Matcher
based instead of Thread
based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good find! changes look good to me. I cannot comment on the ML changes. so I would wait to hear from the ML team on those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for making the necessary updates to ML!
My two comments in the ML test are just nits - it will still work as it is now but in the future somebody might wonder why it's messing with the interrupted flag at all.
The thing I noted about the return value of Grok.match
has been like that ever since the timeout functionality was introduced to Grok - it's not a problem introduced by this PR. So up to you if you want to leave it for a followup.
} finally { | ||
threadWatchdog.unregister(); | ||
matcherWatchdog.unregister(matcher); | ||
} | ||
return (result != -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inconsistent with the Javadoc. The Javadoc says the return value is "true if grok expression matches text, false otherwise". But given the current code it should say "true if grok expression matches text or there is a timeout, false otherwise".
Probably a better fix would be to change this line to return (result >= 0);
. It looks like ML is the only component outside of test code that calls this method. If the ML file structure finder is told there's a match when actually there's a timeout then it will move onto the next step but then time out almost immediately afterwards when the overall elapsed time is checked during that next step, so the net effect is still that the endpoint times out. So from an ML perspective I don't mind whether you change this line or not. But it might be best to make the return value more intuitive before someone else uses this method in production code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it might be best to make the return value more intuitive before someone else uses this method in production code.
Agreed. I will change the jdocs in this PR and in a followup will do the change that you're suggesting here.
} finally { | ||
TimeoutChecker.watchdog.unregister(); | ||
TimeoutChecker.watchdog.unregister(matcher); | ||
assertThat(watchdog.registry.get(Thread.currentThread()).matchers.size(), equalTo(0)); | ||
} | ||
} finally { | ||
// ensure the interrupted flag is cleared to stop it making subsequent tests fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this finally
block is unnecessary now that the other code in this test is not altering whether the current thread is interrupted.
@@ -59,18 +64,24 @@ public void testCheckTimeoutExceeded() throws Exception { | |||
} | |||
} | |||
|
|||
public void testWatchdog() { | |||
public void testWatchdog() throws Exception { | |||
|
|||
assertFalse(Thread.interrupted()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the finally
block, I think this is redundant now that the test doesn't check or change whether the current thread is interrupted.
…48346) There is a watchdog in order to avoid long running (and expensive) grok expressions. Currently the watchdog is thread based, threads that run grok expressions are registered and after completion unregister. If these threads stay registered for too long then the watch dog interrupts these threads. Joni (the library that powers grok expressions) has a mechanism that checks whether the current thread is interrupted and if so abort the pattern matching. Newer versions have an additional method to abort long running pattern matching inside joni. Instead of checking the thread's interrupted flag, joni now also checks a volatile field that can be set via a `Matcher` instance. This is more efficient method for aborting long running matches. (joni checks each 30k iterations whether interrupted flag is set vs. just checking a volatile field) Recently we upgraded to a recent joni version (#47374), and this PR is a followup of that PR. This change should also fix #43673, since it appears when unit tests are ran the a test runner thread's interrupted flag may already have been set, due to some thread reuse.
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.
Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a
Matcher
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)
Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.
This change should also fix #43673, since it appears when unit tests
are ran, the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.