log error messages and clean up monitor when indexing doc level queries or metadata creation fails #900
Conversation
client.execute(
    AlertingActions.DELETE_MONITOR_ACTION_TYPE,
    DeleteMonitorRequest(indexResponse.id, RefreshPolicy.IMMEDIATE)
)
Let's catch this as well and log that we couldn't do the monitor cleanup.
Additionally, we should separate out the logic of deleting monitors from the TransportDeleteMonitor class and call those helper functions here.
The same should be done for the other CRUD actions.
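One way to picture the refactor suggested above is a small set of shared deletion helpers that both the delete action and the create-failure cleanup call. This is only a sketch: `FakeStore`, `DeleteMonitorHelper`, `handleDeleteRequest`, and `cleanupAfterPartialCreate` are hypothetical names invented for illustration and are not the plugin's actual classes or API.

```kotlin
// Hypothetical in-memory stand-in for the indices the real plugin talks to.
class FakeStore {
    val monitors = mutableMapOf<String, String>() // monitorId -> monitor name
    val metadata = mutableMapOf<String, String>() // metadataId -> monitorId
}

// Shared deletion helpers (illustrative names, not the plugin's API).
object DeleteMonitorHelper {
    fun deleteMonitorDoc(store: FakeStore, monitorId: String): Boolean =
        store.monitors.remove(monitorId) != null

    fun deleteMetadata(store: FakeStore, monitorId: String) {
        // Drop every metadata entry that points at this monitor.
        store.metadata.values.removeAll { it == monitorId }
    }

    // Single entry point that both callers reuse.
    fun deleteAll(store: FakeStore, monitorId: String): Boolean {
        val deleted = deleteMonitorDoc(store, monitorId)
        deleteMetadata(store, monitorId)
        return deleted
    }
}

// Caller 1: the delete action.
fun handleDeleteRequest(store: FakeStore, monitorId: String): Boolean =
    DeleteMonitorHelper.deleteAll(store, monitorId)

// Caller 2: cleanup after a partially failed create.
fun cleanupAfterPartialCreate(store: FakeStore, monitorId: String) {
    DeleteMonitorHelper.deleteAll(store, monitorId)
}

fun main() {
    val store = FakeStore()
    store.monitors["m1"] = "cpu"
    store.metadata["m1-metadata"] = "m1"
    cleanupAfterPartialCreate(store, "m1")
    println("monitors left: ${store.monitors.size}, metadata left: ${store.metadata.size}")
}
```

With the logic in one place, the create path does not need to round-trip through the transport layer to undo its own work.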
client.execute(
    AlertingActions.DELETE_MONITOR_ACTION_TYPE,
    DeleteMonitorRequest(indexResponse.id, RefreshPolicy.IMMEDIATE)
)
Let's catch this as well and log that we couldn't do the monitor cleanup. We would also need to delete the metadata that we created.
The delete monitor transport action deletes the metadata as well.
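A minimal sketch of the "catch and log the cleanup failure" request from this thread, assuming the cleanup is best effort and the caller should still see the original error. The function signatures, the lambda parameters, and the `logError` helper are assumptions made so the example is self-contained; only the name `cleanupMonitorAfterPartialFailure` comes from the diff, and its real signature differs.

```kotlin
// Hypothetical logger stand-in: collects messages so the example runs on its own.
val logLines = mutableListOf<String>()

fun logError(message: String, cause: Throwable) {
    logLines += "$message (${cause.message})"
}

// Best-effort cleanup: a failed delete is logged, never thrown.
fun cleanupMonitorAfterPartialFailure(monitorId: String, deleteMonitor: (String) -> Unit) {
    try {
        deleteMonitor(monitorId)
    } catch (e: Exception) {
        logError("Failed to clean up monitor $monitorId after partial create failure", e)
    }
}

fun createMonitor(monitorId: String, createMetadata: () -> Unit, deleteMonitor: (String) -> Unit) {
    try {
        createMetadata()
    } catch (e: Exception) {
        logError("Failed to create metadata for monitor $monitorId; deleting monitor", e)
        cleanupMonitorAfterPartialFailure(monitorId, deleteMonitor)
        throw e // the caller still sees the original failure, not the cleanup failure
    }
}

fun main() {
    try {
        createMonitor(
            "m1",
            createMetadata = { throw IllegalStateException("metadata index unavailable") },
            deleteMonitor = { throw RuntimeException("delete timed out") }
        )
    } catch (e: Exception) {
        println("request failed with: ${e.message}")
    }
    logLines.forEach(::println)
}
```

Keeping the cleanup exception out of the rethrow path means the user-facing error describes why creation failed, while the log records that the monitor may need manual removal.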
6c11123 to 3178347
Codecov Report
@@             Coverage Diff              @@
##               main     #900      +/-   ##
============================================
+ Coverage     75.95%   76.14%   +0.19%
  Complexity      110      110
============================================
  Files           125      125
  Lines          6969     6992      +23
  Branches       1043     1043
============================================
+ Hits           5293     5324      +31
+ Misses         1143     1130      -13
- Partials        533      538       +5
Let's add security tests for this scenario, with and without the filter by backend role setting.
request.monitor = request.monitor.copy(id = indexResponse.id)
var (metadata, created) = MonitorMetadataService.getOrCreateMetadata(request.monitor)
if (created == false) {
    log.warn("Metadata doc id:${metadata.id} exists, but it shouldn't!")
var metadata: MonitorMetadata?
try { // delete monitor if metadata creation fails, log the right error and re-throw the error to fail listener
    request.monitor = request.monitor.copy(id = indexResponse.id)
    var (monitorMetadata: MonitorMetadata, created: Boolean) = MonitorMetadataService.getOrCreateMetadata(request.monitor)
    if (created == false) {
        log.warn("Metadata doc id:${monitorMetadata.id} exists, but it shouldn't!")
    }
    metadata = monitorMetadata
} catch (t: Exception) {
    log.error("failed to create metadata for monitor ${indexResponse.id}. deleting monitor")
    cleanupMonitorAfterPartialFailure(indexResponse)
    throw t
}
if (request.monitor.monitorType == Monitor.MonitorType.DOC_LEVEL_MONITOR) {
    indexDocLevelMonitorQueries(request.monitor, indexResponse.id, metadata, request.refreshPolicy)
try {
    if (request.monitor.monitorType == Monitor.MonitorType.DOC_LEVEL_MONITOR) {
        indexDocLevelMonitorQueries(request.monitor, indexResponse.id, metadata, request.refreshPolicy)
    }
    // When inserting queries in queryIndex we could update sourceToQueryIndexMapping
    MonitorMetadataService.upsertMetadata(metadata, updating = true)
} catch (t: Exception) {
    log.error("failed to index doc level queries monitor ${indexResponse.id}. deleting monitor", t)
    cleanupMonitorAfterPartialFailure(indexResponse)
    throw t
Can we have one big try catch for getting/creating the metadata and updating it?
There is an outer try-catch block wrapping all of this, which returns a listener failure.
Creating and updating metadata are not consecutive calls; the update call is made for sourceToQueryIndexMapping.
We want the right error message for failures related to doc-level queries and for metadata creation failures, hence the separate try-catch blocks.
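The structure described in this reply can be sketched as two inner try-catch blocks, each logging its own message and cleaning up, inside one outer block that converts any failure into a listener failure. `RecordingListener`, `indexMonitor`, the lambda parameters, and the `errors` list are hypothetical stand-ins and not the plugin's real types; the log wording only loosely follows the diff.

```kotlin
// Hypothetical listener stand-in for an ActionListener-style callback.
class RecordingListener {
    var failure: Exception? = null
    var response: String? = null
    fun onFailure(e: Exception) { failure = e }
    fun onResponse(r: String) { response = r }
}

// Collected error messages, standing in for log.error calls.
val errors = mutableListOf<String>()

fun indexMonitor(
    monitorId: String,
    createMetadata: () -> String,
    indexQueries: (String) -> Unit,
    cleanup: (String) -> Unit,
    listener: RecordingListener
) {
    try { // outer block: any failure becomes a listener failure
        val metadata = try {
            createMetadata()
        } catch (e: Exception) {
            errors += "failed to create metadata for monitor $monitorId. deleting monitor"
            cleanup(monitorId)
            throw e
        }
        try {
            indexQueries(metadata)
        } catch (e: Exception) {
            errors += "failed to index doc level queries for monitor $monitorId. deleting monitor"
            cleanup(monitorId)
            throw e
        }
        listener.onResponse(monitorId)
    } catch (e: Exception) {
        listener.onFailure(e)
    }
}

fun main() {
    val listener = RecordingListener()
    indexMonitor(
        monitorId = "m1",
        createMetadata = { "m1-metadata" },
        indexQueries = { throw IllegalArgumentException("unknown analyzer") },
        cleanup = { id -> println("cleaning up $id") },
        listener = listener
    )
    println(errors)
    println("listener failure: ${listener.failure?.message}")
}
```

A single large try-catch would also clean up correctly, but it could not tell the operator which of the two steps failed without inspecting the exception type.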
fun `test execute monitor without create when no monitors exists`() {
val docQuery = DocLevelQuery(query = "test_field:\"us-west-2\"", name = "3")
fun `test cleanup monitor on partial create monitor failure`() {
Why are we removing an existing test?
…tadata creation fails Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
looking good
} catch (e: Exception) {
    // we only log the error and don't fail the request because if monitor document has been deleted,
    // we cannot retry based on this failure
    log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e)
monitor metadata instead of workflow metadata
+1
} catch (e: Exception) {
    // we only log the error and don't fail the request because if monitor document has been deleted successfully,
    // we cannot retry based on this failure
    log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e)
monitor metadata instead of workflow metadata
    createMonitor(monitor)
    fail("monitor creation should fail due to incorrect analyzer name in test setup")
} catch (e: Exception) {
    Assert.assertEquals(client().search(SearchRequest(SCHEDULED_JOBS_INDEX)).get().hits.hits.size, 0)
Assert queryIndex has 0 docs and no new mappings applied
We can't assert that, because we don't fail the delete monitor request when cleanup of the query index docs fails.
So even though we call delete monitor, there is no guarantee that the query index is empty. It's only a best-effort cleanup.
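To illustrate why the test can only assert on the monitor document, here is a small sketch of a delete that treats query cleanup as best effort. `Indices`, `deleteMonitor`, the `queryCleanupFails` switch, and the `monitorId:queryId` key format are assumptions made up for this example and do not mirror the plugin's implementation.

```kotlin
// Hypothetical in-memory stand-ins for the two indices involved.
class Indices {
    val scheduledJobs = mutableSetOf<String>() // monitor documents, keyed by monitor id
    val queryIndex = mutableSetOf<String>()    // doc-level queries, keyed as "monitorId:queryId"
}

// Returns true once the monitor document is gone; query cleanup is best effort.
fun deleteMonitor(indices: Indices, monitorId: String, queryCleanupFails: Boolean): Boolean {
    val deleted = indices.scheduledJobs.remove(monitorId)
    try {
        if (queryCleanupFails) throw RuntimeException("query index unavailable")
        indices.queryIndex.removeAll { it.startsWith("$monitorId:") }
    } catch (e: Exception) {
        // Only log: the monitor document is already deleted, so a retry cannot be driven by this failure.
        println("Failed to delete doc level queries for monitor $monitorId: ${e.message}")
    }
    return deleted
}

fun main() {
    val indices = Indices()
    indices.scheduledJobs += "m1"
    indices.queryIndex += "m1:q1"

    val deleted = deleteMonitor(indices, "m1", queryCleanupFails = true)
    println("monitor deleted: $deleted")
    println("query docs remaining: ${indices.queryIndex.size}")
}
```

In this model a test can rely on the monitor document being gone, but an assertion that the query index has zero docs would be flaky whenever the secondary cleanup fails.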
} catch (e: Exception) {
    // we only log the error and don't fail the request because if monitor document has been deleted,
    // we cannot retry based on this failure
    log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e)
Let's put the workflow metadata id in here, so it gives us the ability to delete it manually if needed.
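A small sketch of what such a log message could look like, assuming the metadata object exposes its own document id. The `MonitorMetadata` data class here is a simplified stand-in with only two fields, `metadataDeleteFailureMessage` is a hypothetical helper, and the `m1-metadata` id is an invented example value.

```kotlin
// Simplified stand-in for the metadata document; the real class has more fields.
data class MonitorMetadata(val id: String, val monitorId: String)

// Builds a message that names both ids, so an operator can find and delete the orphaned document.
fun metadataDeleteFailureMessage(metadata: MonitorMetadata): String =
    "Failed to delete monitor metadata ${metadata.id} for monitor ${metadata.monitorId}."

fun main() {
    val metadata = MonitorMetadata(id = "m1-metadata", monitorId = "m1")
    println(metadataDeleteFailureMessage(metadata))
}
```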
} catch (e: Exception) {
    // we only log the error and don't fail the request because if monitor document has been deleted,
    // we cannot retry based on this failure
    log.error("Failed to delete workflow metadata for monitor ${monitor.id}.", e)
+1
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-900-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.7 2.7
# Navigate to the new working tree
cd .worktrees/backport-2.7
# Create a new branch
git switch --create backport/backport-900-to-2.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.7
Then, create a pull request where the
…es or metadata creation fails (opensearch-project#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
…es or metadata creation fails (opensearch-project#900) (opensearch-project#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.3 2.3
# Navigate to the new working tree
cd .worktrees/backport-2.3
# Create a new branch
git switch --create backport/backport-900-to-2.3
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.3
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.3
Then, create a pull request where the
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.5 2.5
# Navigate to the new working tree
cd .worktrees/backport-2.5
# Create a new branch
git switch --create backport/backport-900-to-2.5
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 333349cebe6044646d04fad2e328f573d2a7e9c5
# Push it to GitHub
git push --set-upstream origin backport/backport-900-to-2.5
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.5
Then, create a pull request where the
…es or metadata creation fails (opensearch-project#900) (opensearch-project#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: AWSHurneyt <hurneyt@amazon.com>
* [Backport 2.x] QueryIndex rollover when field mapping limit is reached (#729) Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> Signed-off-by: AWSHurneyt <hurneyt@amazon.com> * log error messages and clean up monitor when indexing doc level queries or metadata creation fails (#900) (#912) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: AWSHurneyt <hurneyt@amazon.com> * [Backport 2.x] Notification security fix (#861) * Notification security fix (#852) * added injecting whole user object in threadContext before calling notification APIs so that backend roles are available to notification plugin Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * compile fix Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * refactored user_info injection to use InjectSecurity Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * ktlint fix Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> --------- Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> (cherry picked from commit e0b7a5a) * remove unneeded import Signed-off-by: Ashish Agrawal <ashisagr@amazon.com> --------- Signed-off-by: Ashish Agrawal <ashisagr@amazon.com> Co-authored-by: Petar Dzepina <petar.dzepina@gmail.com> Co-authored-by: Ashish Agrawal <ashisagr@amazon.com> Signed-off-by: AWSHurneyt <hurneyt@amazon.com> * Added missing imports. Signed-off-by: AWSHurneyt <hurneyt@amazon.com> * Multiple indices support in DocLevelMonitorInput (#784) (#808) Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> * Removed redundant calls to initDocLevelQueryIndex and indexDocLevelQueries. Signed-off-by: AWSHurneyt <hurneyt@amazon.com> * Fixed a bug that prevented alerts from being generated for doc level monitors that use wildcard characters in index names. 
(#894) (#902) Signed-off-by: AWSHurneyt <hurneyt@amazon.com> (cherry picked from commit 8c033b9) Co-authored-by: AWSHurneyt <hurneyt@amazon.com> Signed-off-by: AWSHurneyt <hurneyt@amazon.com> * Resolved backport issue for PR 729. Signed-off-by: AWSHurneyt <hurneyt@amazon.com> * Resolved backport issue for PR 758. Signed-off-by: AWSHurneyt <hurneyt@amazon.com> --------- Signed-off-by: Petar Dzepina <petar.dzepina@gmail.com> Signed-off-by: AWSHurneyt <hurneyt@amazon.com> Signed-off-by: Ashish Agrawal <ashisagr@amazon.com> Co-authored-by: Petar Dzepina <petar.dzepina@gmail.com> Co-authored-by: Surya Sashank Nistala <snistala@amazon.com> Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com> Co-authored-by: Ashish Agrawal <ashisagr@amazon.com>
…es or metadata creation fails (opensearch-project#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com>
* log error messages and clean up monitor when indexing doc level queries or metadata creation fails (#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize doc-level monitor workflow for index patterns (#1097) Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize doc-level monitor execution workflow for datastreams (#1302) * optimize doc-level monitor execution for datastreams Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> * add more tests to address comments Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> * add integTest for multiple datastreams inside a single index pattern * add integTest for multiple datastreams inside a single index pattern Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> --------- Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Bulk index findings and sequentially invoke auto-correlations (#1355) * Bulk index findings and sequentially invoke auto-correlations Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Bulk index findings in batches of 10000 and make it configurable Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Addressing review comments Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Add integ tests to test bulk index findings Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Fix ktlint formatting Signed-off-by: Megha Goyal <goyamegh@amazon.com> --------- Signed-off-by: Megha Goyal <goyamegh@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Add jvm aware setting and max num docs settings for batching docs for percolate queries (#1435) * add jvm aware and max docs settings for batching docs for percolate queries Signed-off-by: Surya Sashank Nistala 
<snistala@amazon.com> * fix stats logging Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * add queryfieldnames field in findings mapping Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> --------- Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize to fetch only fields relevant to doc level queries in doc level monitor instead of entire _source for each doc (#1441) * optimize to fetch only fields relevant to doc level queries in doc level monitor Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix test for settings check Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix ktlint Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> --------- Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize sequence number calculation and reduce search requests in doc level monitor execution (#1445) * optimize sequence number calculation and reduce search requests by n where n is number of shards being queried in the executino Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix tests Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * optimize check indices and execute to query only write index of aliases and datastreams during monitor creation Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix test Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * add javadoc Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * add tests to verify seq_no calculation Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> --------- Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix tests Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix BWC tests Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * clean up doc level queries on dry run (#1430) Signed-off-by: 
Joanne Wang <jowg@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix import Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix tests Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix BWC version Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix another test Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Revert order of operations change Signed-off-by: Chase Engelbrecht <engechas@amazon.com> --------- Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> Signed-off-by: Megha Goyal <goyamegh@amazon.com> Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Joanne Wang <jowg@amazon.com> Co-authored-by: Surya Sashank Nistala <snistala@amazon.com> Co-authored-by: Subhobrata Dey <sbcd90@gmail.com> Co-authored-by: Megha Goyal <56077967+goyamegh@users.noreply.github.com> Co-authored-by: Joanne Wang <jowg@amazon.com>
In the monitor creation flow, this change adds logic to log the correct error message and clean up the monitor when indexing doc-level queries or creating monitor metadata fails.
Secure test run
Issue #, if available:
#897
Description of changes:
CheckList:
[x] Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.