Fluentd stops processing logs, but keeps running #1630
Comments
We continue to see this issue regularly. Is there any additional detail that would help with diagnosis? |
Does this mean file rotation happened but pos file is not updated? |
I don't know for sure. I definitely saw the index positions being incremented. I am not sure if new logs were added to the pos file. |
I see something similar as well on 0.14.20 for logs in /var/log/containers on a kubernetes cluster. Eventually fluentd will not pick up any new logs in the folder. I have not gotten into looking at the pos file yet. I can say that restarting fluentd will pick up the logs and catch up with past history, so I expect the pos file is not updating. |
We are also seeing this intermittently in our Kubernetes platform. |
I am seeing the same issue running
This happens on our relatively high-volume PostgreSQL servers, with 12 CSV log files (one for each hour) rotated every 12 hours, each about 1.2GB with 4120603 lines. Details here: https://gist.github.com/goakley/d0968cb9411841e33eda7ccb896cde36 |
Sorry, I missed several comments above. @goakley Could you get the gstack (a tool bundled with gdb) result of the hanging child process? |
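For reference, a minimal sketch of how such a backtrace can be captured (assuming gdb/gstack are installed on the host and <pid> is the fluentd child worker's PID; the pgrep pattern is only illustrative):

```
# Find the fluentd child (worker) process; the pattern depends on how
# fluentd/td-agent was started.
pgrep -af fluentd

# gstack ships with the gdb package and prints the native stack of a PID.
gstack <pid>

# Equivalent with gdb directly, dumping backtraces for all threads.
gdb -p <pid> -batch -ex 'thread apply all bt'
```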
@repeatedly I couldn't get
Will a https://gist.github.com/goakley/b410ad35ab2a10694088432149de06e1 |
Is there any hope that someone can pick this up? We'd like to know if we should investigate different logging solutions to see if this is a consistent behaviour or not. |
Sorry, I forgot to reply...
I'm not sure what the cause is yet, but how about disabling inotify? If you set enable_stat_watcher false, in_tail stops using inotify and relies on its timer-based watcher instead. |
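For later readers, a minimal sketch of that setting inside an in_tail source (the path, pos_file, tag, and format below are placeholders, not taken from any configuration in this thread):

```
<source>
  @type tail
  path /var/log/containers/*.log            # placeholder path
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  format json                               # placeholder parser
  # Disable the inotify-based stat watcher so in_tail falls back to
  # its timer-based watching only.
  enable_stat_watcher false
</source>
```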
@repeatedly I have applied that change and will follow up in a few days with the application's performance. |
@repeatedly I am hitting a similar issue, and strace, pstack, and a gdb thread backtrace all show the same thing; it looks like we are hitting the exact same issue, blocked in inotify. Can you please confirm?
I will try putting enable_stat_watcher false in the config and update if it resolves the issue. I also see my worker process die and start again just a few seconds after deployment; let me know if you have insights on that. I will share the configs if you want. |
@repeatedly since applying enable_stat_watcher false we have not seen the issue again. |
Sorry for the late reply here; that fixed the issue for me too. |
I just updated the in_tail documentation article for this setting.
I assume the problem is inotify scalability. |
Hi @repeatedly, in my case I also have fluentd-1.1.2 periodically hanging and stopping processing logs. Inside the log directory that is supposed to be tailed by fluentd, I see a file with timestamp Apr 18 19:08. Configuration of the tail source:
|
* fluentd-cloudwatch: Fix Fluentd hanging fluent/fluentd#1630 * Update Chart.yaml
Have the same problem here: the Pod crashed and was restarted, but Fluentd is failing to collect logs. Maybe a symlink issue? Use realpath instead of the first symlink? |
@mblaschke fluentd supports symlink. Is your problem reproducible with symlink? |
We've seen a similar issue where the systemd input plugin just stopped moving forward. |
The suggestion by @keir-rex does not work: the container is not able to come up because the liveness check keeps failing, which restarts the container continuously. Would really like to get a fix for this. |
I'm seeing a similar issue using a custom input plugin that polls from SQS. |
This is a stale issue, so I'm closing it. If you still have a problem, updating fluentd might help. |
We have seen this issue in our containerized Fluentd instance with the tail input. As the number of files and volume of logs increases, the issue seems to occur more often. We are currently running version
Does anyone know if this is resolved in a newer version? If so, which version? |
I'm also seeing this for high-volume logs, even on the newest version of td-agent. |
To provide more datapoints here:
|
We tried both settings, including enable_stat_watcher false. We are still missing log lines and pos file updates. As we are logging very fast, files get rotated and some of the logs go missing. |
Same issue here. I've increased the inotify watcher limits, but fluentd detects nothing. |
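(For anyone else trying the watcher-limit route, this is roughly what raising the inotify limits means on the host; the values below are only illustrative, not a recommendation:)

```
# Check the current inotify limits on the host.
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances

# Raise them (illustrative values); add entries under /etc/sysctl.d/ to persist.
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=1024
```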
@repeatedly I'm facing the same issue too. I've tried enable_stat_watcher false |
We are facing this issue with /var/log/containers/*.log from kubernetes; only a few logs get picked up... |
Has this problem been solved? Fluentd 1.12.0 still has this problem |
We are seeing an issue where fluentd will stop processing logs after some time, but the parent and child processes seem to be running normally.
We are running fluentd in a docker container on a kubernetes cluster, mounting the docker log volume /var/log/containers on the host.
In a recent incident, we saw logs stop being forwarded to the sumologic output, but activity continued in the fluentd log for another 12 minutes, and at some point after that it stopped picking up new logs (e.g. "following tail of...") altogether.
containers.log.pos continued being updated for 1 hour 13 minutes after the first sign of problems, until it stopped being updated. Killing the fluentd child process gets everything going again.
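For reference, a rough sketch of how the pos file can be compared against the files it tracks (assumes GNU stat; in_tail pos files are tab-separated lines of path, offset in hex, and inode in hex):

```
# Print the recorded offset (hex, 2nd field) next to each file's current size.
# Entries for unwatched/rotated-away files may show offset ffffffffffffffff
# (printed as -1 here).
while read -r path offset inode; do
  size=$(stat -c%s "$path" 2>/dev/null || echo "missing")
  echo "$path pos=$((16#$offset)) size=$size"
done < containers.log.pos   # path to your pos_file
```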
Config, strace, lsof and sigdump included below.
Details:
* fluentd or td-agent version: fluentd 0.12.37
* Environment information, e.g. OS: host 4.9.9-coreos-r1; container Debian Jessie
* Your configuration: see attachments
Attachments:
* fluentd config
* lsof of child process
* sigdump of child process
* strace of child process
* fluentd log