Pod logs stop being pulled when container log files are rotated #446

Open
oliverf1 opened this issue Oct 15, 2021 · 7 comments

oliverf1 commented Oct 15, 2021

As a follow-up to https://github.com/ansible/awx/issues/10366.

Receptor pulls logs from the pods using the Kubernetes API log call:

    logreq := kw.clientset.CoreV1().Pods(kw.pod.ObjectMeta.Namespace).GetLogs(kw.pod.Name, &corev1.PodLogOptions{
        Container: "worker",
        Follow:    true,
    })

However, as explained in kubernetes/kubernetes#59902, the log stream stops if the container log files on the node are rotated. As a consequence, the logs are not fully sent to the AWX database. In that case AWX flags the job as failed even though it ran to completion, because the complete logs are not available.
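
For illustration, one general way to survive a cut stream is to request timestamped lines and, when the stream ends before the job is known to be finished, reopen it from the last timestamp seen. The sketch below is not Receptor's code. `openFunc`, `followWithReconnect`, and `collectFromChunks` are hypothetical names invented for this example, and the opener stands in for a `GetLogs` call with the `Timestamps` and `SinceTime` fields of `PodLogOptions` set. It uses only the standard library so it can be run without a cluster.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"time"
)

// openFunc is a hypothetical stand-in for whatever opens a timestamped log
// stream starting at `since` (for example a GetLogs call with Timestamps and
// SinceTime set). It is not part of Receptor or client-go.
type openFunc func(since time.Time) (io.ReadCloser, error)

// followWithReconnect emits log lines and, when the stream ends before done()
// reports completion, reopens it from the last timestamp seen.
// Caveat: lines sharing the exact timestamp of the last emitted line are
// treated as duplicates and dropped.
func followWithReconnect(open openFunc, done func() bool, maxOpens int, emit func(string)) error {
	var last time.Time
	for i := 0; i < maxOpens; i++ {
		rc, err := open(last)
		if err != nil {
			return err
		}
		sc := bufio.NewScanner(rc)
		for sc.Scan() {
			stamp, msg, ok := strings.Cut(sc.Text(), " ")
			if !ok {
				continue
			}
			t, err := time.Parse(time.RFC3339Nano, stamp)
			if err != nil || !t.After(last) {
				continue // unparsable, or already emitted before the reconnect
			}
			last = t
			emit(msg)
		}
		rc.Close()
		if err := sc.Err(); err != nil {
			return err
		}
		if done() {
			return nil
		}
	}
	return fmt.Errorf("stream ended %d times before the job finished", maxOpens)
}

// collectFromChunks simulates a stream that is cut after each chunk, the way
// a followed log stream ends when the node rotates the file.
func collectFromChunks(chunks []string) ([]string, error) {
	var got []string
	next := 0
	open := func(time.Time) (io.ReadCloser, error) {
		r := io.NopCloser(strings.NewReader(chunks[next]))
		next++
		return r, nil
	}
	done := func() bool { return next >= len(chunks) }
	err := followWithReconnect(open, done, len(chunks), func(s string) { got = append(got, s) })
	return got, err
}

func main() {
	lines, err := collectFromChunks([]string{
		"2021-10-15T10:00:00.1Z first\n2021-10-15T10:00:00.2Z second\n",
		"2021-10-15T10:00:00.2Z second\n2021-10-15T10:00:00.3Z third\n",
	})
	fmt.Println(lines, err)
}
```

In the simulated run the second chunk replays the line at the reconnect boundary, and the timestamp comparison filters it out so each line is emitted once. A real implementation would also need a reliable "job finished" signal for `done` and a backoff between reconnects.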

oliverf1 changed the title from "@nicovs That is a very good catch! I can confirm that indeed kubectl -f logs stops when the log file at the node level are rotated." to "Pod logs stop being pulled when container log files are rotated" on Oct 15, 2021

shanemcd (Member) commented

Related: ansible/awx#11338

shanemcd (Member) commented

Trying to fix this here: #683

domq commented Jul 5, 2023

> Trying to fix this here: #683

I still seem to get only the first chunk of logs. I am running AWX receptor version 1.4.1 against OpenShift 3.11's (I know, I know) Kubernetes 1.11.0+d4cacc0. (Edit: this does not appear to be a timing issue, as I initially thought.)

It is worth noting that the /tmp/receptor/awx-0/*/stdout files do not end with (or contain) an { "eof": true } marker in this case, indicating that Receptor at least understands that it has work remaining. If that latter fact means that I should open another issue, kindly let me know.
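
A check like the one described above can be scripted. The following is a minimal sketch, assuming the stdout file is line-oriented and that the marker, when present, is a JSON object on a line of its own; lines that are not JSON objects are skipped. The names `hasEOFMarker` and `textHasEOFMarker` are made up for this example and are not part of Receptor.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// hasEOFMarker reports whether any line of r is a JSON object with "eof": true.
// Lines that are not JSON objects (for example raw payload data) are skipped.
func hasEOFMarker(r io.Reader) (bool, error) {
	sc := bufio.NewScanner(r)
	// Raise the per-line limit; the 64 KiB default may be too small for long lines.
	sc.Buffer(make([]byte, 0, 64*1024), 64*1024*1024)
	for sc.Scan() {
		var rec struct {
			EOF bool `json:"eof"`
		}
		if json.Unmarshal(sc.Bytes(), &rec) == nil && rec.EOF {
			return true, nil
		}
	}
	return false, sc.Err()
}

// textHasEOFMarker is a convenience wrapper for in-memory content.
func textHasEOFMarker(text string) (bool, error) {
	return hasEOFMarker(strings.NewReader(text))
}

func main() {
	// Example invocation: go run eofcheck.go /tmp/receptor/awx-0/*/stdout
	for _, path := range os.Args[1:] {
		f, err := os.Open(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		found, err := hasEOFMarker(f)
		f.Close()
		fmt.Printf("%s: eof marker found=%v err=%v\n", path, found, err)
	}
}
```

A file reported with `found=false` for a job that has already finished would match the symptom described in this comment.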

Klaas- commented Jul 19, 2023

Cross-linking ansible/awx#14158; I am guessing this may have the same root cause?

Klaas- commented Jul 19, 2023

Upstream also has a PR for something that sounds related: kubernetes/kubernetes#118500

Klaas- commented Oct 18, 2023

A potential fix was merged into upstream Kubernetes in kubernetes/kubernetes#115702.

Klaas- commented Jan 9, 2024

I updated Kubernetes to 1.29; this seems to fix the issue for me.

domq pushed a commit to epfl-si/wp-ops that referenced this issue Sep 27, 2024
As seen in ansible/awx#11338 and ansible/receptor#446

- Force `RECEPTOR_KUBE_SUPPORT_RECONNECT` as per ansible/receptor#683
- Pump up timeouts thereof