Promtail: Restart the tailer if we fail to read and update current position #2532
Conversation
@@ -162,8 +162,7 @@ func (t *FileTarget) run() {
	defer func() {
		helpers.LogError("closing watcher", t.watcher.Close)
		for _, v := range t.tails {
			helpers.LogError("updating tailer last position", v.markPositionAndSize)
This call was redundant; the first thing the stop function does is call markPositionAndSize.
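For context, a minimal Go sketch (not the actual Promtail source; the type and method names are simplified stand-ins) of why the explicit call is redundant: stop already records the position before shutting the tailer down.

```go
package promtailsketch

// tailer is a simplified stand-in for Promtail's tailer; field and method
// names here are illustrative only.
type tailer struct {
	path string
}

// markPositionAndSize persists the current read offset for this file.
func (t *tailer) markPositionAndSize() error {
	// ... write the current offset to the positions file ...
	return nil
}

// stop records the final position first and then shuts the tailer down, so a
// separate markPositionAndSize call right before stop adds nothing.
func (t *tailer) stop() error {
	if err := t.markPositionAndSize(); err != nil {
		return err
	}
	// ... stop the underlying tail goroutine and close the file ...
	return nil
}
```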
helpers.LogError("stopping tailer", tailer.stop) | ||
tailer.cleanup() | ||
tailer.stop() | ||
t.positions.Remove(tailer.path) |
This used to be inside a function in the tailer, but IMO the correct owner of removing entries from the positions file is the FileTarget struct.
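A rough sketch of the ownership change being described, reusing the simplified tailer type from the sketch above; Positions, fileTarget, and stopTailing are illustrative names, not Promtail's real types.

```go
package promtailsketch

// Positions is a minimal interface for the positions store; the real type
// lives in Promtail's positions package.
type Positions interface {
	Remove(path string)
}

// fileTarget owns the tailers and, with this change, the positions-file
// bookkeeping as well.
type fileTarget struct {
	positions Positions
	tails     map[string]*tailer
}

// stopTailing stops a tailer and removes its entry from the positions file,
// keeping removal in the FileTarget rather than inside the tailer.
func (ft *fileTarget) stopTailing(path string) {
	if tl, ok := ft.tails[path]; ok {
		_ = tl.stop()
		ft.positions.Remove(path)
		delete(ft.tails, path)
	}
}
```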
		}
		err = t.tail.Stop()
Moved this and other cleanup calls into the defer function of the run goroutine, so that we clean up properly whenever tailing stops or fails.
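A small sketch of that pattern, with hypothetical stand-ins (runLoop, markPosition, stop) rather than the real tailer code: every cleanup step lives in the run goroutine's defer, so it executes whether the loop exits normally or because tailing failed.

```go
package promtailsketch

import "log"

// runLoop reads lines until the channel closes; the deferred cleanup runs on
// every exit path, mirroring the "clean up in the run goroutine's defer" idea.
func runLoop(lines <-chan string, markPosition func() error, stop func() error) {
	defer func() {
		if err := markPosition(); err != nil {
			log.Println("updating tailer last position:", err)
		}
		if err := stop(); err != nil {
			log.Println("stopping tailer:", err)
		}
	}()

	for line := range lines {
		// ... hand each line to the Promtail client ...
		_ = line
	}
}
```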
Not very familiar with the code here, but seems fine to move the cleanup logic higher up as you've done. Would be nice to see a test or two.
(cherry picked from commit b6d9fd5)
I can report that this issue still happens with Promtail/Loki 1.6.1. I checked the code a bit and see that the goroutine exits on this error, but I didn't see where it would be started again. Maybe I missed something; I didn't spend a lot of time here.
I think the original issue should be re-opened.
I am still encountering this issue. Promtail 1.6.1, Kubernetes (AWS EKS) 1.17.
Exactly. And after a while you'll get zero logs from this node unless you restart the pod manually.
There was another race condition fixed a week or so ago (#2717); hopefully that fixes the problems you are still seeing.
Reworked the tailers a bit so that they restart if we ever fail to read the position file.
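As a rough illustration of that behaviour (readPosition and tailFrom are hypothetical stand-ins, not Promtail's real API), a tailing loop that restarts instead of silently stopping when the saved position cannot be read might look like this.

```go
package promtailsketch

import (
	"log"
	"time"
)

// tailWithRestart keeps a file tailed: if the saved position cannot be read
// it falls back to offset 0, and if tailing stops with an error it backs off
// briefly and starts again rather than leaving the file untailed.
func tailWithRestart(path string, readPosition func(string) (int64, error), tailFrom func(string, int64) error) {
	for {
		pos, err := readPosition(path)
		if err != nil {
			log.Printf("failed to read position for %s, restarting from offset 0: %v", path, err)
			pos = 0
		}
		if err := tailFrom(path, pos); err != nil {
			log.Printf("tailing %s stopped with error, restarting: %v", path, err)
			time.Sleep(time.Second) // back off briefly before restarting
			continue
		}
		return // tailing ended cleanly
	}
}
```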