-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WatcherYamlRestIT fails on test get watch api with missing watch on existing index #53177
Comments
Pinging @elastic/es-core-features (:Core/Features/Watcher) |
I tried, but it didn't reproduce locally for me. |
This also failed on 7.x today https://gradle-enterprise.elastic.co/s/izleuckcsczy6/tests/j3inyc2qyhaby-3e5dsrwmfrm6w. @martijnvg I see you have a commit that mentions this issue. I wonder if there might be a race condition in stopping the TriggerService where the TriggerService is stopped but a watch is added concurrently/after the stop of the triggerservice somehow, maybe by the WatcherIndexingListener? |
Thanks @jaymode for triaging this failure. I also did some investigating and if watcher is stopped then the I'm not sure whether this problem occurs here, the watcher rest qa module doesn't execute tests parallel and I've not yet found evidence that a watch was added after watcher was stopped. The commit that I pushed, delays the checking of the watcher stats' |
There's another instance of this failure on 7.x: https://gradle-enterprise.elastic.co/s/xreqekg5chqoy One failure on
…and another on
|
One more on |
PR elastic#53362 muted the entire suite of REST tests, but only 2 tests were identified as failing. This PR un-mutes the full suite and only mutes the 2 test identified on elastic#53177
It appears that this may be a setup/tear down issue and build-stats shows more test failures than the ones referenced here. As of #53362 all of the REST tests for Watcher are muted and will look into this issue as soon as possible. |
…inning of a test. Relates to elastic#53177
Also log a message instead of failing if there are active watches at a beginning of a test. Relates to #53177
Also log a message instead of failing if there are active watches at a beginning of a test. Relates to #53177
Another failure of SmokeTestWatcherWithSecurityClientYamlTestSuiteIT on 7.x today, which doesn't reproduce for me. ./gradlew ':x-pack:plugin:watcher:qa:with-security:integTestRunner' --tests "org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityClientYamlTestSuiteIT.test {yaml=watcher/usage/10_basic/Test watcher usage stats output}" Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+multijob+fast+part2/4330/console Stack trace: |
@martijnvg @jakelandis new build failure: https://gradle-enterprise.elastic.co/s/y54f4tptbqszc Here are the logs. |
muting watcher yaml tests. See issue #53177
Basde on the last failure. The logs show:
It seems that on a fresh start the the watch_count is 1. So either the stat count is not getting updated fast enough (e.g. the new watch didn't seem to be counted), or the prior watch was not deleted on the first call to usage (per the view of the usage api), but was deleted (per the view of the usage api) by the time it go to the second call. I suspect the second scenerio based on the logs above. |
Here is the server log with some context:
It looks like we still have start/stop issues specifically:
It triggered a job, paused execution, then creates the .watches indices (maybe due to the watch itself updating the .watches index ?) which causes watcher to restart in the background ? |
I think I see what is happening here ... When Watcher executes a Watch it will update the .watches index, on 'postIndex' to the .watches it will add that Watch to the trigger service. The trigger service does not have any notion of started/stopped state, so if we shutdown Watcher, it can be in a stopping or stopped state (which clears out the trigger service), but the trigger service is running and the update .watches added that watch back to the trigger service. So the watches can actually run via the trigger service even when the state is stopped. Subsequent watches don't actually run due the pause check (watcher execution service paused, not processing [1] events) , but adding watch back to the trigger service throws off the watch_count which is sourced from the trigger service. I am still exploring the fixes... my notes
|
Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between each test for a clean slate. When Watcher executes it updates the .watches index and upon this update it will go through the postIndex method and end up added that watch to the trigger service (and stats). Functionally this is not a problem, if Watcher is stopping or stopped since Watcher is also paused and will not execute the watch. However, with specific timing and expectations of a clean slate can cause issues the test assertions against the stats. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up it will re-read index .watches. This commit also un-mutes the tests related to #53177 and #56534
Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between each test for a clean slate. When Watcher executes it updates the .watches index and upon this update it will go through the postIndex method and end up added that watch to the trigger service (and stats). Functionally this is not a problem, if Watcher is stopping or stopped since Watcher is also paused and will not execute the watch. However, with specific timing and expectations of a clean slate can cause issues the test assertions against the stats. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up it will re-read index .watches. This commit also un-mutes the tests related to elastic#53177 and elastic#56534
Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between each test for a clean slate. When Watcher executes it updates the .watches index and upon this update it will go through the postIndex method and end up added that watch to the trigger service (and stats). Functionally this is not a problem, if Watcher is stopping or stopped since Watcher is also paused and will not execute the watch. However, with specific timing and expectations of a clean slate can cause issues the test assertions against the stats. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up it will re-read index .watches. This commit also un-mutes the tests related to elastic#53177 and elastic#56534
Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between each test for a clean slate. When Watcher executes it updates the .watches index and upon this update it will go through the postIndex method and end up added that watch to the trigger service (and stats). Functionally this is not a problem, if Watcher is stopping or stopped since Watcher is also paused and will not execute the watch. However, with specific timing and expectations of a clean slate can cause issues the test assertions against the stats. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up it will re-read index .watches. This commit also un-mutes the tests related to elastic#53177 and elastic#56534
I am hopeful that #56556 stabilizes these tests. |
Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between each test for a clean slate. When Watcher executes it updates the .watches index and upon this update it will go through the postIndex method and end up added that watch to the trigger service (and stats). Functionally this is not a problem, if Watcher is stopping or stopped since Watcher is also paused and will not execute the watch. However, with specific timing and expectations of a clean slate can cause issues the test assertions against the stats. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up it will re-read index .watches. This commit also un-mutes the tests related to #53177 and #56534
Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between each test for a clean slate. When Watcher executes it updates the .watches index and upon this update it will go through the postIndex method and end up added that watch to the trigger service (and stats). Functionally this is not a problem, if Watcher is stopping or stopped since Watcher is also paused and will not execute the watch. However, with specific timing and expectations of a clean slate can cause issues the test assertions against the stats. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up it will re-read index .watches. This commit also un-mutes the tests related to #53177 and #56534
Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between each test for a clean slate. When Watcher executes it updates the .watches index and upon this update it will go through the postIndex method and end up added that watch to the trigger service (and stats). Functionally this is not a problem, if Watcher is stopping or stopped since Watcher is also paused and will not execute the watch. However, with specific timing and expectations of a clean slate can cause issues the test assertions against the stats. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up it will re-read index .watches. This commit also un-mutes the tests related to #53177 and #56534
No failures since is the introduction of #56556 (its hard to tell from the screenshot but that failure is on May 11 just prior to the tests getting muted) Closing this issue. |
The following two builds failed for similar-looking reasons:
The failure looked like this:
I haven't attempted to reproduce this.
The text was updated successfully, but these errors were encountered: