Adds plugin version sweep background job #434
Codecov Report
@@ Coverage Diff @@
## main #434 +/- ##
=========================================
Coverage 75.94% 75.95%
- Complexity 2480 2492 +12
=========================================
Files 315 316 +1
Lines 14500 14547 +47
Branches 2243 2248 +5
=========================================
+ Hits 11012 11049 +37
- Misses 2239 2246 +7
- Partials 1249 1252 +3
A general question:
Can we disable the trigger logic in SkipExecution now that we have this background loop?
The trigger logic I am referring to, in SkipExecution:
override fun clusterChanged(event: ClusterChangedEvent) {
    if (event.nodesChanged() || event.isNewCluster) {
        sweepISMPluginVersion()
    }
}
val SWEEP_SKIP_PERIOD: Setting<TimeValue> = Setting.timeSetting(
    "opendistro.index_state_management.coordinator.sweep_skip_period",
    TimeValue.timeValueMinutes(10),
    TimeValue.timeValueMinutes(5),
    Setting.Property.NodeScope,
    Setting.Property.Dynamic,
    Setting.Property.Deprecated
)
No need to have this if we are adding a new setting
Makes sense. Thanks!
if (!skipExecution.flag) {
    logger.info("Canceling sweep ism plugin version job")
    scheduledSkipExecution?.cancel()
} else {
    skipExecution.sweepISMPluginVersion()
}
Do we need to cancel this job or let it run forever?
…he case of version discrepancy Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
…r scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
027e78e to 151fec9
private fun isIndexStateManagementEnabled(): Boolean = indexStateManagementEnabled == true

companion object {
    private const val RETRY_PERIOD_IN_MINUTES = 5L
Is this the same as sweepSkipPeriod? If so, should we use sweepSkipPeriod instead?
Good question, and you are right; I was thinking the same. The SkipExecution class should only do sweepISMPluginVersion, while the caller class is responsible for triggering the request. So my proposal is: the caller class, PluginVersionSweepCoordinator, listens for cluster changed events and is responsible for calling the sweepISM method. This class already has a scheduled job that can optionally be canceled (i.e. if the skip flag is being set to true).
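A rough sketch of the split proposed above. This is not the snippet from the original comment; apart from the names SkipExecution, PluginVersionSweepCoordinator, and sweepISMPluginVersion, everything here (the Cancellable interface, the schedule parameter, onClusterChanged, and the version lookup) is a hypothetical stand-in for the plugin's real types:

```kotlin
// Hypothetical stand-in for a cancelable scheduled task handle.
fun interface Cancellable { fun cancel() }

// Assumed shape: SkipExecution only compares plugin versions and sets the flag.
class SkipExecution(private val pluginVersions: () -> Set<String>) {
    @Volatile var flag: Boolean = false
        private set

    fun sweepISMPluginVersion() {
        // More than one distinct plugin version means executions should be skipped.
        flag = pluginVersions().size > 1
    }
}

// Assumed shape: the coordinator owns both triggers (cluster events and the scheduled job).
class PluginVersionSweepCoordinator(
    private val skipExecution: SkipExecution,
    private val schedule: (task: () -> Unit) -> Cancellable // stand-in for a scheduler
) {
    private var scheduledSweep: Cancellable? = null

    fun start() {
        scheduledSweep = schedule { skipExecution.sweepISMPluginVersion() }
    }

    // Mirrors the clusterChanged trigger, moved out of SkipExecution into the caller.
    fun onClusterChanged(nodesChanged: Boolean, isNewCluster: Boolean) {
        if (nodesChanged || isNewCluster) skipExecution.sweepISMPluginVersion()
    }

    fun stop() {
        scheduledSweep?.cancel()
        scheduledSweep = null
    }
}

fun main() {
    var versions = setOf("2.2.0", "2.3.0")
    val skip = SkipExecution { versions }
    var cancelled = false
    val coordinator = PluginVersionSweepCoordinator(skip) { _ -> Cancellable { cancelled = true } }

    coordinator.start()
    coordinator.onClusterChanged(nodesChanged = true, isNewCluster = false)
    println("mixed versions, skip flag: ${skip.flag}")

    versions = setOf("2.3.0")
    coordinator.onClusterChanged(nodesChanged = true, isNewCluster = false)
    println("single version, skip flag: ${skip.flag}")

    coordinator.stop()
    println("scheduled sweep cancelled: $cancelled")
}
```

With this shape, SkipExecution no longer needs to be a cluster state listener itself, and whether the scheduled job is canceled or left running becomes a decision local to the coordinator.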
…lag up to 5 mins Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
85cca3c to 47b7a24
Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
* [207]: Added 5 min scheduled job for sweeping ISM plugin version in the case of version discrepancy Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Created pluginVersionSweepCoordinator component responsible for scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Increased retry period for background job that sets the skip flag up to 5 mins Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * Empty-Commit Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Co-authored-by: Stevan Buzejic <buzejic.stevan@gmail.com> (cherry picked from commit 4d844fa)
* [207]: Added 5 min scheduled job for sweeping ISM plugin version in the case of version discrepancy Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Created pluginVersionSweepCoordinator component responsible for scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Increased retry period for background job that sets the skip flag up to 5 mins Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * Empty-Commit Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Co-authored-by: Stevan Buzejic <buzejic.stevan@gmail.com> (cherry picked from commit 4d844fa) Co-authored-by: Clay Downs <downsrob@amazon.com>
* initial framework Signed-off-by: Joanne Wang <jowg@amazon.com> * Removed recursion from Explain Action to avoid stackoverflow in some situations (#419) Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> Signed-off-by: Joanne Wang <jowg@amazon.com> * enabled by default integrated Signed-off-by: Joanne Wang <jowg@amazon.com> * cleaned up comments and logs, created unit test and updated previous integration tests Signed-off-by: Joanne Wang <jowg@amazon.com> * added delete validation logic Signed-off-by: Joanne Wang <jowg@amazon.com> * fixed rollover validation unit tests Signed-off-by: Joanne Wang <jowg@amazon.com> * added validation info field to ManagedIndexMetaData Signed-off-by: Joanne Wang <jowg@amazon.com> * removed step context as input Signed-off-by: Joanne Wang <jowg@amazon.com> * added validationmetadata class Signed-off-by: Joanne Wang <jowg@amazon.com> * restored old integration tests and changed validation service output Signed-off-by: Joanne Wang <jowg@amazon.com> * before integrated validation meta data into managed index meta data Signed-off-by: Joanne Wang <jowg@amazon.com> * integrated validation meta data Signed-off-by: Joanne Wang <jowg@amazon.com> * working version Signed-off-by: Joanne Wang <jowg@amazon.com> * added validation mapping Signed-off-by: Joanne Wang <jowg@amazon.com> * fixed integ tests Signed-off-by: Joanne Wang <jowg@amazon.com> * renamed some values Signed-off-by: Joanne Wang <jowg@amazon.com> * before removing from managed index meta data Signed-off-by: Joanne Wang <jowg@amazon.com> * created validation result object in explain Signed-off-by: Joanne Wang <jowg@amazon.com> * testing Signed-off-by: Joanne Wang <jowg@amazon.com> * run fails Signed-off-by: Joanne Wang <jowg@amazon.com> * integration test for delete + added framework for force merge Signed-off-by: Joanne Wang <jowg@amazon.com> * removed step validation metadata and still testing explain results Signed-off-by: Joanne Wang <jowg@amazon.com> * before removing from 
managed index runner Signed-off-by: Joanne Wang <jowg@amazon.com> * removed from managed index runner Signed-off-by: Joanne Wang <jowg@amazon.com> * clean up and tests Signed-off-by: Joanne Wang <jowg@amazon.com> * all validation tests pass Signed-off-by: Joanne Wang <jowg@amazon.com> * removed validation result from all managed index meta data Signed-off-by: Joanne Wang <jowg@amazon.com> * restored old IT tests Signed-off-by: Joanne Wang <jowg@amazon.com> * fixed it tests, set explain validation to false Signed-off-by: Joanne Wang <jowg@amazon.com> * clean up Signed-off-by: Joanne Wang <jowg@amazon.com> * Change test page size to avoid index/search TimeInMillis < 1 issue. (#460) * Change test page size to avoid indexTimeInMillis < 1 issue. Signed-off-by: Angie Zhang <langelzh@amazon.com> * Change test page size to avoid indexTimeInMillis < 1 issue. Signed-off-by: Angie Zhang <langelzh@amazon.com> Signed-off-by: Angie Zhang <langelzh@amazon.com> * Transform maxclauses fix (#477) * transform maxClauses fix Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * added bucket log to track processed buckets Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * various renames/changes Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * fixed detekt issues Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * added comments to test Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * removed debug logging Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * empty commit to trigger checks Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * reduced pageSize to 1 in few ITs to avoid flaky tests; fixed bug where pagesProcessed was calculated incorrectly Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * reverted pagesProcessed change; fixed few ITs Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * 483: Updated detekt plugin and snakeyaml dependency. 
Updated a code t… (#485) * 483: Updated detekt plugin and snakeyaml dependency. Updated a code to reduce the number of issues after static analysis Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * 483: Updated snakeyaml version to use the latest Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * Remove HOST_DENY_LIST usage as Notification plugin will own it (#471) (#107) Signed-off-by: Xuesong Luo <lxuesong@amazon.com> Signed-off-by: Xuesong Luo <lxuesong@amazon.com> * Disable detekt because of the CVE (#497) Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * Deprecate Master nonmenclature (#501) Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> * [AUTO] Increment version to 2.3.0-SNAPSHOT (#484) (#503) * fix#921-README-forum-link-index_mgmnt (#499) Signed-off-by: cwillum <cwmmoore@amazon.com> Signed-off-by: cwillum <cwmmoore@amazon.com> * 64: Added rounding when using aggreagate script for avg metric. Added… (#490) * 64: Added rounding when using aggreagate script for avg metric. 
Added unit tests for checking average aggregations against the target rollup index Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * 64: Rollup job renamed Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * 64: Removed unrelevant metrics for the avg calculation test Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * Revert Disable detekt and force choose snakeyml 1.32 (#528) * Revert Disable detekt: 50ac1e9 Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Remove force choosing snakeyml 1.31 Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Force snakeyaml 1.32 Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Empty commit Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> * Added 2.3 release note (#507) (#515) (#517) * Update 2.3 release note Signed-off-by: Angie Zhang <langelzh@amazon.com> * Update 2.3 release note Signed-off-by: Angie Zhang <langelzh@amazon.com> * Update 2.3 release note Signed-off-by: Angie Zhang <langelzh@amazon.com> * Update 2.3 release note Signed-off-by: Angie Zhang <langelzh@amazon.com> * Update 2.3 release note Signed-off-by: Angie Zhang <langelzh@amazon.com> Signed-off-by: Angie Zhang <langelzh@amazon.com> (cherry picked from commit d9793ac) Signed-off-by: Angie Zhang <langelzh@amazon.com> Signed-off-by: Angie Zhang <langelzh@amazon.com> (cherry picked from commit 7217b5b) Co-authored-by: Angie Zhang <langelzh@amazon.com> * Add 2.2 release note (#450) (#452) (#516) * Add 2.2 release note Signed-off-by: Angie Zhang <langelzh@amazon.com> * Add 2.2 release note Signed-off-by: Angie Zhang <langelzh@amazon.com> Co-authored-by: Angie Zhang <langelzh@amazon.com> (cherry picked from commit 8eb5da6) Signed-off-by: Angie Zhang <langelzh@amazon.com> Signed-off-by: Angie Zhang <langelzh@amazon.com> Co-authored-by: Ashish Agrawal <ashisagr@amazon.com> * Adds plugin version sweep background job 
(#434) * [207]: Added 5 min scheduled job for sweeping ISM plugin version in the case of version discrepancy Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Created pluginVersionSweepCoordinator component responsible for scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Increased retry period for background job that sets the skip flag up to 5 mins Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * Empty-Commit Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Co-authored-by: Stevan Buzejic <buzejic.stevan@gmail.com> * flaky transform test fix attempt (#542) * flaky transform test fix attempt Signed-off-by: Petar Dzepina <petar.dzepina@vroom.com> * accidental paste fix Signed-off-by: Petar Dzepina <petar.dzepina@vroom.com> Signed-off-by: Petar Dzepina <petar.dzepina@vroom.com> Co-authored-by: Petar Dzepina <petar.dzepina@vroom.com> Signed-off-by: Joanne Wang <jowg@amazon.com> Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> Signed-off-by: Angie Zhang <langelzh@amazon.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Xuesong Luo <lxuesong@amazon.com> Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com> Signed-off-by: cwillum <cwmmoore@amazon.com> Signed-off-by: Siddhant Deshmukh <deshsid@amazon.com> Signed-off-by: Petar Dzepina <petar.dzepina@vroom.com> Co-authored-by: Petar <petar.dzepina@gmail.com> Co-authored-by: Angie Zhang <98716549+Angie-Zhang@users.noreply.github.com> Co-authored-by: Stevan Buzejic <30922513+stevanbz@users.noreply.github.com> Co-authored-by: xluo-aws <109580118+xluo-aws@users.noreply.github.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com> Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com> Co-authored-by: Chris Moore 
<107723039+cwillum@users.noreply.github.com> Co-authored-by: Siddhant Deshmukh <deshsid@amazon.com> Co-authored-by: Angie Zhang <langelzh@amazon.com> Co-authored-by: Ashish Agrawal <ashisagr@amazon.com> Co-authored-by: Clay Downs <downsrob@amazon.com> Co-authored-by: Stevan Buzejic <buzejic.stevan@gmail.com> Co-authored-by: Petar Dzepina <petar.dzepina@vroom.com>
…ensearch-project#539) * [207]: Added 5 min scheduled job for sweeping ISM plugin version in the case of version discrepancy Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Created pluginVersionSweepCoordinator component responsible for scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Increased retry period for background job that sets the skip flag up to 5 mins Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * Empty-Commit Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Co-authored-by: Stevan Buzejic <buzejic.stevan@gmail.com> (cherry picked from commit 4d844fa) Co-authored-by: Clay Downs <downsrob@amazon.com>
…ensearch-project#539) * [207]: Added 5 min scheduled job for sweeping ISM plugin version in the case of version discrepancy Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Created pluginVersionSweepCoordinator component responsible for scheduling the skip execution task. Annotated tests in order to prevent thread leak error during integrational tests Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * [207]: Increased retry period for background job that sets the skip flag up to 5 mins Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> * Empty-Commit Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Co-authored-by: Stevan Buzejic <buzejic.stevan@gmail.com> (cherry picked from commit 4d844fa) Co-authored-by: Clay Downs <downsrob@amazon.com> Signed-off-by: Ronnak Saxena <ronsax@amazon.com>
Issue #, if available:
#207
Description of changes:
Index Management currently skips all job executions when two differing versions of Index Management are present on the cluster. The plugin does this by performing a NodesInfoRequest to get and compare plugin versions whenever a node is added or a new cluster forms, and by setting a flag, SkipExecution, to true when there are multiple plugin versions. We have seen cases where the SkipExecution flag is still set to true even though the upgrade process (early ES 7.x to later ES 7.x) has finished and the cluster is on the latest version, with all nodes running the same version of the IM plugin.
From analyzing the code, we can see race conditions that would allow multiple requests to overwrite each other in the wrong order. Although the cluster changed events arrive in order, the responses to the NodesInfoRequests may complete out of order, so an older response can overwrite the flag after a newer one has already set it.
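The ordering problem can be reproduced in a small, deterministic simulation. This is only an illustration of the race described above; SkipFlag and PendingRequest are hypothetical names and do not correspond to classes in the plugin:

```kotlin
// Hypothetical stand-in for the holder of the SkipExecution flag.
class SkipFlag {
    @Volatile var flag: Boolean = false
}

// Models an in-flight NodesInfoRequest: the answer is fixed when the request is
// sent, but it is only applied to the flag when the response callback runs.
class PendingRequest(private val hasMixedVersions: Boolean, private val target: SkipFlag) {
    fun complete() {
        target.flag = hasMixedVersions
    }
}

fun main() {
    val skip = SkipFlag()

    // Event 1: a node on the old plugin version is still present, so versions differ.
    val first = PendingRequest(hasMixedVersions = true, target = skip)
    // Event 2: the upgrade has finished, so all plugin versions match.
    val second = PendingRequest(hasMixedVersions = false, target = skip)

    // The responses arrive in the opposite order from the events.
    second.complete()
    first.complete()

    // The stale answer from event 1 wins, even though the cluster is now uniform.
    println("flag after out-of-order responses: ${skip.flag}")
}
```

Since no further cluster changed event is guaranteed to follow, nothing re-evaluates the flag once it is left in this stale state.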
To resolve this race condition, this PR adds a background job which will run every five minutes to poll the plugin versions if the flag is currently set to true.
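A minimal sketch of such a polling job, assuming a plain JDK ScheduledExecutorService in place of the cluster's own thread pool. VersionSweepJob, pluginVersionsOnCluster, and tick are hypothetical names chosen for this illustration and are not the PR's implementation:

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledFuture
import java.util.concurrent.TimeUnit

// Hypothetical sketch; the version lookup stands in for a NodesInfoRequest.
class VersionSweepJob(
    private val pluginVersionsOnCluster: () -> Set<String>,
    private val periodMinutes: Long = 5L
) {
    @Volatile var skipFlag: Boolean = false

    private val scheduler = Executors.newSingleThreadScheduledExecutor()
    private var handle: ScheduledFuture<*>? = null

    // Re-evaluates the flag from the current set of plugin versions.
    fun sweep() {
        skipFlag = pluginVersionsOnCluster().size > 1
    }

    // One iteration of the background job: poll only while the flag is set.
    fun tick() {
        if (skipFlag) sweep()
    }

    // Runs tick() every periodMinutes until stop() is called.
    fun start() {
        handle = scheduler.scheduleWithFixedDelay({ tick() }, periodMinutes, periodMinutes, TimeUnit.MINUTES)
    }

    fun stop() {
        handle?.cancel(false)
        scheduler.shutdownNow()
    }
}

fun main() {
    var versions = setOf("1.0.0", "1.1.0")
    val job = VersionSweepJob({ versions })

    job.sweep()                 // mixed versions: the flag is set
    versions = setOf("1.1.0")   // upgrade finishes, but no new event triggers a sweep
    job.tick()                  // the next background iteration re-checks the versions

    println("skip flag after background tick: ${job.skipFlag}")
}
```

Because each iteration re-reads the current plugin versions, a flag left stale by an out-of-order response is corrected within one period rather than persisting until the next cluster change.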
This is an alternative strategy to #423 and was also written entirely by Stevan Buzejic (@stevanbz); I am just raising the PR for an early review.
CheckList:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.