Change grok watch dog to be Matcher based instead of thread based. #48346

martijnvg · 2019-10-22T12:01:10Z

A watchdog exists to abort long-running (and expensive) grok
expressions. Currently the watchdog is thread based: threads that run
grok expressions register themselves and unregister after completion.
If a thread stays registered for too long, the watchdog interrupts it.
Joni (the library that powers grok expressions) has a mechanism that
checks whether the current thread is interrupted and, if so, aborts
the pattern matching.

Newer versions of joni have an additional way to abort long-running
pattern matching. In addition to checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a Matcher
instance. This is a more efficient way to abort long-running matches
(joni checks every 30k iterations whether the interrupted flag is set,
vs. just checking a volatile field).

We recently upgraded to a newer joni version (#47374), and this PR
is a followup to that PR.

This change should also fix #43673: it appears that when unit tests
are run, a test runner thread's interrupted flag may already have
been set because of thread reuse.
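For illustration, the Matcher-based approach described above can be sketched roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: `InterruptibleMatcher` is a hypothetical stand-in for a joni matcher, the `-2` "aborted" result is an assumed value, and all class and method names are invented for the example. Time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MatcherWatchdogSketch {

    /** Hypothetical stand-in for a matcher whose matching loop polls a volatile flag. */
    static final class InterruptibleMatcher {
        private volatile boolean interrupted;

        public void interrupt() {
            interrupted = true;
        }

        /** Simulates a match loop; returns -2 (assumed value) if aborted, 0 otherwise. */
        public int search(long maxSteps) {
            for (long i = 0; i < maxSteps; i++) {
                if (interrupted) {
                    return -2;
                }
            }
            return 0;
        }
    }

    // Registered matchers mapped to the time (in ms) at which they were registered.
    private final Map<InterruptibleMatcher, Long> registry = new ConcurrentHashMap<>();
    private final long maxExecutionTimeMillis;

    public MatcherWatchdogSketch(long maxExecutionTimeMillis) {
        this.maxExecutionTimeMillis = maxExecutionTimeMillis;
    }

    public void register(InterruptibleMatcher matcher, long nowMillis) {
        registry.put(matcher, nowMillis);
    }

    public void unregister(InterruptibleMatcher matcher) {
        registry.remove(matcher);
    }

    /** Intended to be called periodically; interrupts matchers registered for too long. */
    public void interruptLongRunning(long nowMillis) {
        for (Map.Entry<InterruptibleMatcher, Long> entry : registry.entrySet()) {
            if (nowMillis - entry.getValue() > maxExecutionTimeMillis) {
                entry.getKey().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        MatcherWatchdogSketch watchdog = new MatcherWatchdogSketch(1000);
        InterruptibleMatcher matcher = new InterruptibleMatcher();
        watchdog.register(matcher, 0);
        watchdog.interruptLongRunning(1500); // 1500 ms elapsed, limit is 1000 ms
        System.out.println(matcher.search(10)); // prints -2
        watchdog.unregister(matcher);
    }
}
```

Because the watchdog targets the matcher rather than the thread, the thread's own interrupted flag is never touched, which is the property the #43673 reasoning above relies on.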

elasticmachine · 2019-10-22T12:01:12Z

Pinging @elastic/es-core-features (:Core/Features/Ingest)

martijnvg · 2019-10-22T12:28:42Z

...k/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/filestructurefinder/TimeoutChecker.java

 * periodically call {@link Thread#interrupted}, so it is not much more of an
 * inconvenience to have to periodically call this class's {@link #check} method.
 */
 public class TimeoutChecker implements Closeable {


@droberts195 Does this change look good to you?

This change to ML is required in order to make the watchdog Matcher based instead of Thread based.

talevy

Good find! Changes look good to me. I cannot comment on the ML changes, so I would wait to hear from the ML team on those.

droberts195

LGTM

Thanks for making the necessary updates to ML!

My two comments in the ML test are just nits - it will still work as it is now but in the future somebody might wonder why it's messing with the interrupted flag at all.

The thing I noted about the return value of Grok.match has been like that ever since the timeout functionality was introduced to Grok - it's not a problem introduced by this PR. So up to you if you want to leave it for a followup.

droberts195 · 2019-10-23T08:54:06Z

libs/grok/src/main/java/org/elasticsearch/grok/Grok.java

-            threadWatchdog.unregister();
+            matcherWatchdog.unregister(matcher);
        }
        return (result != -1);


This is inconsistent with the Javadoc. The Javadoc says the return value is "true if grok expression matches text, false otherwise". But given the current code it should say "true if grok expression matches text or there is a timeout, false otherwise".

Probably a better fix would be to change this line to `return (result >= 0);`. It looks like ML is the only component outside of test code that calls this method. If the ML file structure finder is told there's a match when actually there's a timeout then it will move on to the next step but then time out almost immediately afterwards when the overall elapsed time is checked during that next step, so the net effect is still that the endpoint times out. So from an ML perspective I don't mind whether you change this line or not. But it might be best to make the return value more intuitive before someone else uses this method in production code.

martijnvg · 2019-10-24

> But it might be best to make the return value more intuitive before someone else uses this method in production code.

Agreed. I will change the Javadoc in this PR and make the change that you're suggesting here in a followup.
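To make the difference between the two return expressions concrete, here is a small sketch. It assumes the underlying search reports "no match" as `-1` (as the current check implies) and a timeout as some other negative value; `-2` is used here purely as an assumption for illustration, and the class and method names are hypothetical.

```java
final class GrokReturnValueSketch {

    // Assumed sentinel values, for illustration only.
    static final int NO_MATCH = -1;
    static final int TIMED_OUT = -2;

    /** Current behaviour: anything other than -1 counts as a match, including a timeout. */
    public static boolean matchesCurrent(int result) {
        return result != -1;
    }

    /** Suggested behaviour: only a non-negative match position counts as a match. */
    public static boolean matchesSuggested(int result) {
        return result >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchesCurrent(TIMED_OUT));   // prints true
        System.out.println(matchesSuggested(TIMED_OUT)); // prints false
    }
}
```

The two checks agree for real matches and for `-1`; they differ only for the assumed timeout value.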

droberts195 · 2019-10-23T08:55:37Z

...gin/ml/src/test/java/org/elasticsearch/xpack/ml/filestructurefinder/TimeoutCheckerTests.java

+                assertThat(watchdog.registry.get(Thread.currentThread()).matchers.size(), equalTo(0));
            }
        } finally {
            // ensure the interrupted flag is cleared to stop it making subsequent tests fail


I think this finally block is unnecessary now that the other code in this test is not altering whether the current thread is interrupted.

droberts195 · 2019-10-23T08:56:14Z

...gin/ml/src/test/java/org/elasticsearch/xpack/ml/filestructurefinder/TimeoutCheckerTests.java

-    public void testWatchdog() {
+    public void testWatchdog() throws Exception {

        assertFalse(Thread.interrupted());


Similar to the finally block, I think this is redundant now that the test doesn't check or change whether the current thread is interrupted.

martijnvg added >enhancement, :Data Management/Ingest Node, v8.0.0, v7.6.0 labels Oct 22, 2019

martijnvg requested a review from talevy October 22, 2019 12:01

fixed checkstyle violation

2ec4e95

martijnvg commented Oct 22, 2019

Merge remote-tracking branch 'es/master' into matcher_watch_dog

a672499

talevy approved these changes Oct 22, 2019

droberts195 approved these changes Oct 23, 2019

martijnvg added 2 commits October 24, 2019 14:27

Merge remote-tracking branch 'es/master' into matcher_watch_dog

5a05235

iter

bd1766d

martijnvg merged commit 12d32af into elastic:master Oct 24, 2019

DaveCTurner mentioned this pull request Oct 25, 2019

MatcherWatchdogTests fail occasionally #48519

Closed

droberts195 mentioned this pull request Nov 5, 2019

[CI] TimeoutCheckerTests.testWatchdog failing regularly #48861

Closed

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
