-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AO][SERVERLESS] Fix Custom Threshold rule tests for Serverless #167644
[AO][SERVERLESS] Fix Custom Threshold rule tests for Serverless #167644
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
Pinging @elastic/actionable-observability (Team: Actionable Observability) |
@@ -51,10 +48,6 @@ export default function ({ getService }: FtrProviderContext) { | |||
index: CUSTOM_THRESHOLD_RULE_ALERT_INDEX, | |||
query: { term: { 'kibana.alert.rule.uuid': ruleId } }, | |||
}); | |||
await esClient.deleteByQuery({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to remove cleaning up the event log index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maryam-saeidi, good question. Actually, in this use case, a.k.a "no_data," we don't ingest any data; hence, there is nothing to delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maryam-saeidi thank you for pointing that out. There is a race condition on this index as we use the wildcard option *
, which creates inconsistent results and sometimes throws an error.
"cause":{"type":"version_conflict_engine_exception","reason":"[S_D574oB8qizAIQCLOib]: version conflict, required seqNo [1], primary term [1]. but no document was found","index_uuid":"cZ1A-w-KS1qY34KrbPGHPA","shard":"0","index":".ds-.kibana-event-log-ds-2023.10.02-000001"},"status":409},{"index":".ds-.kibana-event-log-ds-2023.10.02-000001","id":"TPD574oB8qizAIQCMOiB","cause":{"type":"version_conflict_engine_exception","reason":"[TPD574oB8qizAIQCMOiB]: version conflict, required seqNo [2], primary term [1]. but no document was found","index_uuid":"cZ1A-w-KS1qY34KrbPGHPA","shard":"0","index":".ds-.kibana-event-log-ds-2023.10.02-000001"},"status":409}
So, I changed the delete term to use the rule.id
instead of the alert.consumer
to be more accurate and avoid this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, how does changing the field solve the issue?
The error says there is no data based on the query, so it should be the same case for any field (including rule.id
), isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maryam-saeidi No, the error message is before I change it to rule.id
The error says there is no data based on the query,
This is why I removed the deleteByQuery
in the beginning, but then I realized there is a racing condition.
If you look at the error type version_conflict_engine_exception
. It means there are two operations at the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why I removed the deleteByQuery in the beginning, but then I realized there is a racing condition.
If you look at the error type version_conflict_engine_exception. It means there are two operations at the same.
I still don't understand, either the document exists in the index, which should contain both fields (rule.id
and kibana.alert.rule.consumer
), or the document does not exist. How does checking one field instead of the other help?
In which case, do we have a document with rule.id
that does not contain kibana.alert.rule.consumer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maryam-saeidi, the test files run in parallel. So we could have two test files trying to perform CRUD operations on the same index. As the error message I shared mentioned, version conflict, required seqNo [1], primary term [1].
. e.g. A rule is running and still generating events while another after
from another file tries to delete everything using the wildcard *
. (race condition)
So when I scoped the deletion i.e. every rule/every test file deleted ONLY its generated event using rule.id
we
no longer have any issue. We should have relied on the rule.id
instead of consumer from the beginning.
Is that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, super clear, thanks.
Then can we bring back cleaning the event log index with the rule.id
condition? Otherwise, after this test, we still have data in the event log index.
await esClient.deleteByQuery({
index: '.kibana-event-log-*',
query: { term: { 'rule.id': ruleId } },
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I already got it back!
@elasticmachine merge upstream |
💚 Build Succeeded
Metrics [docs]
History
To update your PR or re-run it, just comment with: cc @fkanout |
Summary
Revert the revert for #166942 💪🏻
Fixes #165569
Fixes #166617
Fixes #166618
Fixes #166619
Fixes #166620