
Change logrotate compress to delaycompress to prevent fluentd log tailing from getting stuck #2835

Merged
1 commit merged on Jul 1, 2017


r4j4h
Contributor

r4j4h commented Jun 30, 2017

Fluentd log tailing fails sporadically, seemingly because of the logrotate settings.

Source of the setting: fluent/fluentd#780 (comment)

If this is the wrong approach, or if there are other ideas about the cause of this issue, they are more than welcome! We can change this PR accordingly.
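For readers unfamiliar with the two options, the following is a minimal Python sketch (not logrotate's code) that models only the file names left after each rename-style rotation. The function `rotate`, the file name `app.log`, and the `keep` parameter are hypothetical. With plain `compress`, the file that was just rotated out is gzipped immediately; with `delaycompress`, it keeps its uncompressed name until the next cycle, which is the window a tailer still draining the old file is assumed to benefit from.

```python
from typing import List


def rotate(files: List[str], base: str, keep: int, delaycompress: bool) -> List[str]:
    """Return the file names left after one simulated rename-style rotation.

    Hypothetical model of the names produced by logrotate's `rotate <keep>`,
    `compress` and (optionally) `delaycompress` options. Nothing on disk is
    touched, and the order in which logrotate really performs the steps is
    not modeled.
    """
    shifted = []
    for name in files:
        if name == base:
            continue  # the live file is handled separately below
        # Rotated names look like "<base>.<n>" or "<base>.<n>.gz".
        generation = int(name[len(base) + 1:].split(".")[0])
        if generation + 1 > keep:
            continue  # past the retention count, so it is deleted
        # Every generation older than 1 ends up compressed either way.
        shifted.append(f"{base}.{generation + 1}.gz")
    # The live log becomes generation 1: gzipped right away with plain
    # `compress`, left as plain text for one more cycle with `delaycompress`.
    newest = f"{base}.1" if delaycompress else f"{base}.1.gz"
    # A fresh, empty live log is then created (as with `create`).
    # Lexicographic sorting is only tidy for fewer than 10 generations.
    return sorted([base, newest] + shifted)


state = ["app.log"]
for _ in range(3):
    state = rotate(state, "app.log", keep=2, delaycompress=True)
print(state)  # ['app.log', 'app.log.1', 'app.log.2.gz']
```

With `keep=2` and `delaycompress=False`, a single rotation would instead leave `['app.log', 'app.log.1.gz']`, i.e. no uncompressed copy of the previous generation remains on disk.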



k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Jun 30, 2017
k8s-ci-robot
Contributor

Hi @r4j4h. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Jun 30, 2017
chrislovecnm
Contributor

@k8s-bot ok to test

k8s-ci-robot removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Jun 30, 2017
justinsb
Member

justinsb commented Jul 1, 2017

/lgtm

Thanks! This makes intuitive sense to me.

Was there an error message @r4j4h or did it just stop silently?

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jul 1, 2017
justinsb merged commit e27b75d into kubernetes:master Jul 1, 2017
r4j4h
Contributor Author

r4j4h commented Jul 3, 2017

No error messages at all; in fact, I only became aware of it through a team's integration test that verifies a given transaction ID appears in every app's logs. After a weekend or so, a single pod would strangely go silent. I tracked it down by echoing lines directly into the log, and knew something was wrong when even invalid lines didn't produce a formatting error in the log.

Initially we found that restarting fluentd did not help but restarting the pod in question did; a patch by okkez seemed to fix that for a while. Then it happened again, but this time restarting fluentd was enough.

I am happy to report that we haven't seen the issue since I applied the change to each node. I will report back here if it happens again.

Thanks for the quick turnaround!! :)

@r4j4h
Contributor Author

r4j4h commented Jul 5, 2017

To follow up: this has not completely solved the problem for us. It has largely helped, since the issue seems less frequent, but we still occasionally find tail processes getting stuck. :(

nsidhaye

If you are using copytruncate, then compress should not cause this problem. Am I right?

Ref: https://manpages.debian.org/jessie/logrotate/logrotate.8.en.html
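The reasoning behind this question can be illustrated with a minimal Python sketch of what `copytruncate` does, assuming a POSIX filesystem; the helper `copytruncate`, the `demo` function, and the path `app.log` are hypothetical, and this is not logrotate's implementation. Because the live file is truncated in place, its inode does not change, and only the copy would later be compressed, so a tailer following the live file would not need to read the file that `compress` acts on. The trade-off, which the logrotate man page notes, is a small window between copying and truncating in which lines can be lost.

```python
import os
import shutil
import tempfile
from typing import Tuple


def copytruncate(path: str) -> str:
    """Hypothetical sketch of `copytruncate`: copy, then truncate in place."""
    rotated = path + ".1"
    shutil.copyfile(path, rotated)  # the copy is a new file with a new inode
    # Any line written between the copy above and the truncate below is lost.
    with open(path, "r+b") as live:
        live.truncate(0)
    return rotated


def demo() -> Tuple[bool, int, str]:
    """Rotate a throwaway log and report what a tailer would observe."""
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        with open(log, "w") as f:
            f.write("line 1\nline 2\n")
        inode_before = os.stat(log).st_ino
        rotated = copytruncate(log)
        # The live path still refers to the same inode, now empty.
        same_inode = os.stat(log).st_ino == inode_before
        live_size = os.path.getsize(log)
        with open(rotated) as f:
            copied = f.read()
        return same_inode, live_size, copied


print(demo())  # (True, 0, 'line 1\nline 2\n')
```

A tailer in this setup still has to notice that the file shrank below its saved read offset and start again from the beginning; that detection is separate from how compression is configured.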

Labels
cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), lgtm ("Looks good to me", indicates that a PR is ready to be merged)

5 participants