Bugfix compound query_key and mixed mappings #1330
I forgot to update CHANGELOG.md |
Is this related at all to #982? Also, some unit test coverage is needed for these changes, even if there hadn't been coverage previously. The new Regarding @nsano-rururu's comment, I think he meant "You forgot to update the changelog". |
A status update: I've been grinding on this for a while. While trying to write unit tests for the current PR, (it feels like) I keep finding things like elastalert2/elastalert/ruletypes.py Lines 948 to 951 in 0716fd3
elastalert2/elastalert/elastalert.py Line 1326 in 0716fd3
query_key will be a string representing a single fieldname, but elastalert2/elastalert/loaders.py Line 416 in 0716fd3
So, I'm scratching my head as to what would be the right fix. I'm leaning towards adding guards to check for But, I'm also considering "should query_key just always be normalized to a list?" and then remove the "hidden" |
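A minimal sketch of the "always normalize to a list" option mentioned above. `normalize_query_key` is a hypothetical helper, not an existing ElastAlert 2 function, and it assumes `query_key` arrives either as a single field name or as a list of field names.

```python
def normalize_query_key(query_key):
    """Return query_key as a list of field names (hypothetical helper)."""
    if query_key is None:
        return []
    if isinstance(query_key, str):
        # A single field name becomes a one-element list.
        return [query_key]
    # Copy any other iterable of field names into a new list.
    return list(query_key)


print(normalize_query_key("src_computer.keyword"))
print(normalize_query_key(["serialnumber.keyword", "src_computer.keyword"]))
```

With a helper like this, downstream code could iterate over the field names without first checking whether it was handed a string or a list.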
They could very well be related. One of the symptoms of this PR's bug is "man, why does |
From what I recall Now that I'm looking more closely at these code changes it appears your new That said, I'm now wondering what scenario it was where you found whitespace characters in the |
I've been wondering the same. After more debugger time, I see the elastalert2/elastalert/elastalert.py Line 337 in 0716fd3
It looks it |
elastalert2/elastalert/elastalert.py Lines 442 to 443 in 76ab593
and because the leaving us with the old commaspace-split replaced with a comma-split. i.e. 76ab593 introduced buggy behavior. |
Isn't this line elastalert2/elastalert/elastalert.py Line 337 in 0716fd3
query_key key, rather than the key itself.
|
Confusing, right? Sometimes elastalert2/elastalert/elastalert.py Line 337 in 0716fd3
hit['_source']['serialnumber.keyword,src_computer.keyword'] = 'NXHFPAA001234567890, LAPTOP-M16RFTYS' and when it comes time to alert, the alert calls elastalert2/elastalert/elastalert.py Lines 466 to 469 in 0716fd3
And this is in the context of a cardinality rule configured with query_key:
- serialnumber.keyword
- src_computer.keyword
raw_count_keys: false |
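To illustrate the compound value described above, here is a small standalone sketch (plain Python, not ElastAlert 2 code). It assumes the compound key is the field names joined with `,` and the compound value is the field values joined with `, `, as in the example in this thread. It shows why splitting the value on a bare comma leaves a leading space on every term after the first.

```python
query_keys = ["serialnumber.keyword", "src_computer.keyword"]
source = {
    "serialnumber.keyword": "NXHFPAA001234567890",
    "src_computer.keyword": "LAPTOP-M16RFTYS",
}

# Build the compound entry the way the thread describes it.
compound_key = ",".join(query_keys)
compound_value = ", ".join(source[k] for k in query_keys)
source[compound_key] = compound_value

# Splitting on "," keeps the space that followed each comma.
comma_split = compound_value.split(",")
# Splitting on ", " recovers the original values.
comma_space_split = compound_value.split(", ")

print(comma_split)        # ['NXHFPAA001234567890', ' LAPTOP-M16RFTYS']
print(comma_space_split)  # ['NXHFPAA001234567890', 'LAPTOP-M16RFTYS']
```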
Yes, that |
Thinking more on this... The drawback to using that explicit delimiter is that any other code or existing users relying on that field are now going to see that ugly delimiter in their alert message. If we leave it to the comma-delimited approach we could at least do a sanity check after the split and see if the length of the split value array matches the But this isn't even the problem you're trying to solve so apologies for muddying up the issue. Back to the original code review, there may still be some benefit leaving the split as-is and using a |
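The sanity check suggested above could look roughly like the sketch below. It assumes the split result is meant to be compared against the number of `query_key` fields; `split_compound_value` and its `delimiter` parameter are hypothetical names, not part of ElastAlert 2.

```python
import logging

logger = logging.getLogger(__name__)


def split_compound_value(value, query_keys, delimiter=", "):
    """Split a compound value and pair each part with its field name.

    Returns None (and logs a warning) when the number of parts does not
    match the number of query keys, e.g. because a value itself contains
    the delimiter.
    """
    parts = value.split(delimiter)
    if len(parts) != len(query_keys):
        logger.warning(
            "Expected %d values for query_key %s but got %d from %r",
            len(query_keys), query_keys, len(parts), value,
        )
        return None
    return dict(zip(query_keys, parts))


keys = ["serialnumber.keyword", "src_computer.keyword"]
print(split_compound_value("NXHFPAA001234567890, LAPTOP-M16RFTYS", keys))
print(split_compound_value("Smith, John, LAPTOP-M16RFTYS", keys))
```

The second call shows the failure mode: a value that contains the delimiter produces three parts for two keys, so the sketch refuses to guess which part belongs to which field.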
9d1f291 to 51bcb39
I left the commits "organic". But if the project prefers consolidated commits, I can do that. |
I'm not seeing any new commits. |
51bcb39 to bf18649
otherwise, a space gets prefixed to the value
The `raw_count_keys` handler now tests the intended value while iterating over `compound_query_key` terms. Previously, it was checking the method param `key` and erroneously skipping the add_keyword_postfix() conditional
allow finding of qk values from matches when using raw_count_keys=false hard-coding '.keyword' is probably wrong. Not sure where to get the string_multi_field_name from in this context
To use str.removesuffix() and f-strings (>=3.6) and type hints for builtin collections, and because it looks like the project is standardizing on 3.12 anyway
So, always remove it
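As a rough illustration of the suffix handling these commit messages describe, the sketch below looks up a query key's value in a match under either the suffixed or the unsuffixed field name. The helper name and the hard-coded `.keyword` constant are assumptions for illustration only (the commit message itself notes that hard-coding `.keyword` is probably wrong), and `str.removesuffix()` requires Python 3.9 or later.

```python
KEYWORD_SUFFIX = ".keyword"  # assumption: hard-coded for illustration only


def lookup_match_value(match, field):
    """Return the value for `field`, trying it with and without the suffix.

    Hypothetical helper: a match may store a value under 'host' even
    though the query key is configured as 'host.keyword'.
    """
    for candidate in (field, field.removesuffix(KEYWORD_SUFFIX)):
        if candidate in match:
            return match[candidate]
    return None


match = {"serialnumber": "NXHFPAA001234567890", "src_computer.keyword": "LAPTOP-M16RFTYS"}
print(lookup_match_value(match, "serialnumber.keyword"))
print(lookup_match_value(match, "src_computer.keyword"))
```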
bf18649 to e78a4b7
🤦♂️ git did what I asked rather than what I wanted. Better now? |
Thank you for this improvement, and for adding the log warning.
Description
Fixes to allow top_count_keys to still function when using a list of query_key values.
Checklist
`make test-docker` with my changes.
Questions or Comments
tl;dr fc583f7 and e0795fe are bug fixes. 9d1f291 is a guess.
I didn't attempt unit tests as a. `get_hits_terms` looks strongly coupled with elasticsearch results and b. there wasn't an existing `get_hits_terms` unit test (that I could find).
Confirmed through manual testing that a single query_key still behaves as before and the list of query_key values now populates top_events_* as expected.
I have low confidence 9d1f291 is the correct way to handle `raw_count_keys=false` but it's working (finally) as expected, when dealing with a mix of `text` terms and `keyword`, `ip`, `int`, etc. terms. As it stands now, with this PR, a rule with this config works as I expect.
Manual testing done on a python 3.11 image (similar to the Dockerfile) with vscode debugger checkpoints around the changes
make test-docker