Filebeat 6.2.1 in Kubernetes Segfaults #6372
My config:
and
|
Hi @towolf, thank you for reporting this. There is definitely something wrong. Could you share the content of |
Ah, now I think I can see where this is going. Earlier today, about 6 hours ago, we wanted to test how docker log rotation is handled, so we were spamming megabytes of base64-encoded data into the log, using something like:
Here's one of the resulting files, it's the one mentioned before the segfault: ab013b90ca95cb1a0d848ce537844044d7a05de434c1dd44199edcf36e99cb51-json.log
I think this is perhaps the
I'm very fond of this processor, BTW; it allows our devs to log multi-line messages by just printing JSON to stdout. |
It looks like the problem is happening in the |
I've cleared all old logs using this: Definitely during the day, my log host received quite a few messages without Kubernetes metadata attached. I'll have to study this more. |
Just to be sure, the |
That's correct |
Since yesterday, I sometimes get blocks of these:
|
And then I get messages like these, which indicate it is choking on null bytes.
I found a docker logs file that starts with roughly 620000 null bytes after json-file log rotation:
Is filebeat docker prospector compatible with json-file log rotation? However, I only found one file like that so far. |
Both errors are not fatal:
The JSON decoding error should be transient too; the next messages should be decoded correctly. I'm researching the zeroes in the file after rotation. There are some measures we can implement to avoid this issue. Have you seen the panic again? |
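One possible measure for the null-byte case, sketched here purely as an illustration and not as what Filebeat actually does: strip leading NUL bytes from a line before handing it to the JSON decoder. The `dockerLine` struct and the `decodeLine` helper are hypothetical names made up for this example; only the `log`/`stream`/`time` fields of the docker json-file format are taken as given.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// dockerLine mirrors the fields of one docker json-file log entry.
// (Hypothetical type for this sketch, not Filebeat's own.)
type dockerLine struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

// decodeLine drops any leading NUL bytes (as seen at the start of a
// rotated file) and then decodes the remainder as JSON.
func decodeLine(raw []byte) (dockerLine, error) {
	var entry dockerLine
	trimmed := bytes.TrimLeft(raw, "\x00")
	if len(trimmed) == 0 {
		return entry, errors.New("line contains only null bytes")
	}
	err := json.Unmarshal(trimmed, &entry)
	return entry, err
}

func main() {
	// Simulate a line preceded by 16 null bytes.
	raw := append(make([]byte, 16),
		[]byte(`{"log":"hello\n","stream":"stdout","time":"2018-02-14T10:00:00Z"}`)...)

	entry, err := decodeLine(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("stream=%s log=%q\n", entry.Stream, entry.Log)
}
```

Trimming only on the left keeps the sketch conservative: NUL bytes in the middle of a line would still surface as a decoding error rather than being silently dropped.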
Yes actually, I got 2 more segfaults on exactly the node that had the "malformed" logs. But also 1 on another one. This is on our staging cluster:
The log of the crash can be reached using this:
We do not have networking issues as far as I'm aware ... |
And this is from the other node that had 1 restart:
|
Looked through the logs, and this definitely started after upgrading filebeat from 6.1.1 to 6.2.1 on Feb 8th on two clusters. 50 Segfaults in total since that day. |
I'm also experiencing the same issue. I'm using
|
@exekias I wonder if we should check here https://github.com/elastic/beats/blob/master/libbeat/processors/add_kubernetes_metadata/indexers.go#L161 for nil values and log an Error? Or should GetIndexes also support returning an error? |
s234 is the node with the most container churn, as this is the node where Gitlab-Runner schedules its build pods. This seems to be correlated with actual log contents? Or does the
|
Yes, I think I'll add some defensive code and logging there to avoid this issue. Thank you everyone for reporting this! So far I couldn't reproduce the issue, so the stack traces help a lot here. |
While the root cause is unclear, this change adds defensive code against nil Pod processing errors. It also adds more logging to debug this further. Fixes elastic#6372
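As a rough illustration of what "defensive code against nil Pod" can look like, here is a minimal sketch. It is not the code from the actual change: `Pod` and `podIndexKeys` are hypothetical, simplified stand-ins for the real types and indexer functions in `add_kubernetes_metadata`.

```go
package main

import (
	"fmt"
	"log"
)

// Pod is a hypothetical, simplified stand-in for the Kubernetes pod type;
// the real struct carries many more fields.
type Pod struct {
	Name      string
	Namespace string
}

// podIndexKeys returns the index keys for a pod. Instead of dereferencing
// a nil pod (which would panic with a nil pointer dereference), it logs
// the problem and returns no keys.
func podIndexKeys(pod *Pod) []string {
	if pod == nil {
		log.Println("add_kubernetes_metadata: got nil pod, skipping indexing")
		return nil
	}
	return []string{fmt.Sprintf("%s/%s", pod.Namespace, pod.Name)}
}

func main() {
	fmt.Println(podIndexKeys(&Pod{Name: "filebeat-abc", Namespace: "kube-system"}))
	fmt.Println(len(podIndexKeys(nil)))
}
```

Returning an empty result keeps the caller's signature unchanged; the alternative raised in the thread, having the indexer return an error, would make the failure explicit at the cost of touching every call site.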
@exekias Merged. Good to know the other branches are not affected. Ignore my comment in the PR then. |
Seeing the same (it causes the container to crash). I'm on the latest 6.2.2 build and on k8s 1.8.5:
|
Closed by #6487, to be released with |
Found another bad problem in #6536. Not sure if that may be related to this issue? |
I was tailing the log of Filebeat because I saw that my Filebeat daemonset pods kept restarting.
This is the traceback:
For confirmed bugs, please report:
- Version: docker.elastic.co/beats/filebeat:6.2.1
- Operating System: Kubernetes 1.9.1 on amd64
- Steps to Reproduce: Not sure