Increasing of ignore_above for keyword #105
The current 1024 is coming from Beats. I was about to comment that so far it has rarely caused any problems except in some edge cases. And just a few moments ago one of these edge cases popped up: elastic/beats#8076 Another option would be to not have a default and only set it on some specific fields for ECS? |
Hmmm good point. Funny thing is I always assumed that I just double-checked with the following, and confirmed that I was wrong:
The aggregation returns only one bucket (but still reports {
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "ignore",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "txt": "quite a bit too long"
        }
      },
      {
        "_index": "ignore",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "txt": "short"
        }
      }
    ]
  },
  "aggregations": {
    "texts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "short",
          "doc_count": 1
        }
      ]
    }
  }
} |
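To make the response above easier to reason about, here is a toy model in plain Python of what it shows: both documents come back as hits, but only the short value gets a bucket. This is not Elasticsearch itself, just a sketch of the semantics. The limit of 10 and the helper names are assumptions, since the mapping used for the test is not shown in the thread.

```python
from collections import Counter

IGNORE_ABOVE = 10  # assumed limit; the actual mapping from the test is not shown

# The two documents from the response above, keyed by _id.
docs = {
    "1": {"txt": "short"},
    "2": {"txt": "quite a bit too long"},
}


def indexed_terms(docs, field, ignore_above):
    """Return {doc_id: term} for values short enough to be indexed.

    Values longer than ignore_above are skipped for indexing,
    but the document itself (its _source) is left untouched.
    """
    return {
        doc_id: source[field]
        for doc_id, source in docs.items()
        if len(source[field]) <= ignore_above
    }


def terms_aggregation(terms):
    """Count documents per indexed term, roughly like a terms aggregation."""
    counts = Counter(terms.values())
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


terms = indexed_terms(docs, "txt", IGNORE_ABOVE)
buckets = terms_aggregation(terms)
total_hits = len(docs)  # a match-all query still sees every document

print(total_hits, buckets)
```

The long value is still stored in the document, so it shows up in the hits, but it never reaches the index that the aggregation reads from.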
So in short, I agree we need to increase Perhaps as Nic is saying, we should even remove it... This one requires more thought. I would definitely keep it in any place that could be a vector of attack (e.g. an attacker can craft an extremely long URL). It's potentially OK to remove it in any field that would be filled by internal systems, as opposed to user-generated values. |
What can be used as an attack seems to me outside the scope of ECS and more specific to each use case / implementation. That is why I think removing it would still be a good option. |
Changing the value for |
Let's do that. The main challenge here is it requires a change on the Beats code side :-( |
Be careful, simply removing it can cause problems with multi-fields. Here keyword should be set to the max safe value. |
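The comment above does not say what the max safe value is. One common reading, taken from the Elasticsearch documentation for `ignore_above` rather than from this thread, is that Lucene rejects terms larger than 32766 bytes while `ignore_above` counts characters, so the worst case of 4 bytes per UTF-8 character gives 8191. The sketch below works through that arithmetic; the function names are hypothetical.

```python
LUCENE_MAX_TERM_BYTES = 32766  # Lucene's limit on a single term, in bytes
UTF8_MAX_BYTES_PER_CHAR = 4    # worst case for one UTF-8 encoded character


def max_safe_ignore_above(max_term_bytes=LUCENE_MAX_TERM_BYTES,
                          bytes_per_char=UTF8_MAX_BYTES_PER_CHAR):
    """Largest character count that cannot exceed the byte limit."""
    return max_term_bytes // bytes_per_char


def fits_in_lucene_term(value):
    """True if the UTF-8 encoding of value stays within the byte limit."""
    return len(value.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES


safe = max_safe_ignore_above()
print(safe)

# 8192 ASCII characters are fine, but 8192 four-byte characters are not.
print(fits_in_lucene_term("a" * 8192))
print(fits_in_lucene_term("\U0001F600" * 8192))
print(fits_in_lucene_term("\U0001F600" * safe))
```

Under this reading, a limit of 8192 is safe for mostly-ASCII values such as typical URLs, while 8191 is the value that holds even in the worst case.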
My thinking here is that this will be a responsibility of the operator to set |
I agree in principle. I'd like to see how this is done in practice, however. |
Yes, this is definitely something we have on our mind to fix for ECS 1.1. We'll be going over many things and cleaning up the details :-) Also, thanks for that screenshot. Are the top 3 values legit URIs, or an attempt at a buffer overflow? 😱 😆 Note that the meaning of ignore_above is to stop indexing a field beyond a given length. However the full original field would still be present in the event. It just wouldn't turn up on a search for that keyword field:
|
Great stuff!
Most are legit URIs, ironically. None of them were inbound web server attacks either; those were really long though.
Yep, I am tracking ignore_above. Also, just add to your list (which I'm sure you know): they don't show in aggregations.
…On Thu, Mar 21, 2019 at 09:09 Mathieu Martin ***@***.***> wrote:
Yes, this is definitely something we have on our mind to fix for ECS 1.1.
We'll be going over many things and cleaning up the details :-)
Also, thanks for that screenshot. Are the top 3 values legit URIs, or an
attempt at a buffer overflow? 😱 😆
Note that the meaning of ignore_above is to stop indexing a field beyond a
given length. However the full original field would still be present in the
event. It just wouldn't turn up on a search for that keyword field:
- A filter on that exact value will fail to return that document
- A prefix search (e.g. for autocomplete) wouldn't consider this value
|
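The two search effects listed in the quoted note (an exact filter misses the document, and a prefix search does not consider the value) can be sketched with the same kind of toy model. This is plain Python standing in for a keyword index, not Elasticsearch; the limit, the sample URLs and the helper names are all hypothetical.

```python
IGNORE_ABOVE = 10  # hypothetical limit, chosen small to keep the example readable

events = [
    {"url": "/home"},
    {"url": "/search?q=" + "a" * 40},  # longer than the limit
]

# Only values within the limit make it into the keyword index.
# The events themselves are not modified.
index = {
    i: event["url"]
    for i, event in enumerate(events)
    if len(event["url"]) <= IGNORE_ABOVE
}


def term_filter(value):
    """Ids of events whose indexed term equals value exactly."""
    return [i for i, term in index.items() if term == value]


def prefix_search(prefix):
    """Ids of events whose indexed term starts with prefix."""
    return [i for i, term in index.items() if term.startswith(prefix)]


long_url = events[1]["url"]

print(term_filter("/home"))        # the short value is found
print(term_filter(long_url))       # the exact long value is not
print(prefix_search("/search"))    # nor is it considered for prefixes
print(events[1]["url"] == long_url)  # yet the original value is still in the event
```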
Yes, good point. Although who has room to display a 500k long value in an aggregation? |
For some keyword fields like url.query, an ignore_above of 1024 is not long enough. So either the default ignore_above should be increased (8192 seems to be safe for Lucene), or the limit should be raised for specific fields.
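As a rough illustration of the second option, the sketch below builds a mapping body that keeps 1024 as the default for string fields through a dynamic template and raises the limit only for `url.query`. It is written as a Python dict so it can be printed as JSON. The template name `strings_as_keyword`, the typeless mapping layout (as used in Elasticsearch 7.x and later) and the choice of 8192 are assumptions for the example, not a decided ECS or Beats setting.

```python
import json

DEFAULT_IGNORE_ABOVE = 1024    # current default discussed in this issue
URL_QUERY_IGNORE_ABOVE = 8192  # value floated in this issue, used here as an example

mappings = {
    # Default for any string field that has no explicit mapping.
    "dynamic_templates": [
        {
            "strings_as_keyword": {
                "match_mapping_type": "string",
                "mapping": {
                    "type": "keyword",
                    "ignore_above": DEFAULT_IGNORE_ABOVE,
                },
            }
        }
    ],
    # Explicit override for the one field known to carry long values.
    "properties": {
        "url": {
            "properties": {
                "query": {
                    "type": "keyword",
                    "ignore_above": URL_QUERY_IGNORE_ABOVE,
                }
            }
        }
    },
}

body = json.dumps({"mappings": mappings}, indent=2)
print(body)
```

The printed body is what would be sent when creating an index or index template; the first option from the issue would instead change only `DEFAULT_IGNORE_ABOVE`.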