Job output incomplete and job shown as ERROR in UI: "Exceeded retries for reading stdout" #11803
Fixed by adjusting the kubelet `--container-log-max-size` setting.
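A minimal sketch of how the same setting could be expressed in a kubelet configuration file, where it appears as `containerLogMaxSize`. This assumes the failure happens when the kubelet rotates the job pod's log file while the output is still being streamed. Under that assumption, raising the limit above your largest job's output would avoid the rotation. The `50Mi` value is an illustrative assumption and was not confirmed in this thread. How the kubelet config is supplied also depends on your node setup (for example, EKS node bootstrap), which this sketch does not cover.

```yaml
# Illustrative KubeletConfiguration fragment; the values are assumptions, not settings from this thread.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Size a container log file may reach before the kubelet rotates it
# (the kubelet default is 10Mi).
containerLogMaxSize: 50Mi
# Number of rotated log files kept per container (default 5).
containerLogMaxFiles: 5
```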
We observe the same symptoms once in a while, on both 19.4 and 20.0. However, it is not reproducible, and the same job works fine on a re-run. Also, the mentioned log entries show up regularly in the AWX EE container regardless of whether there were problems with jobs. We also experimented with forks (not slices), but had no luck, even though it once helped fix a recurring case. Finally, we tried to reproduce the "log rotation" problem by running a job in debug mode on many hosts, but it did not reproduce. What exactly do you mean by that?
Is it about the amount of output (that seems to make no difference for us), the number of hosts (again, that doesn't seem to affect us), or the execution time (some jobs failed within the first minute, others in their 20th minute)?
@stanislav-zaprudskiy Yes, for me it depends on the log size: with many hosts and many tasks, the log gets bigger.
We were running into the same issue with AWX Operator running on Amazon EKS. We finally managed to resolve it by adding this to our config as described here:

```yaml
apiVersion: awx.ansible.com/v1beta1
kind: AWX
spec:
  ee_extra_env: |
    - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
      value: disabled
```

Our Kubernetes server version:

```
serverVersion:
  gitCommit: abb98ec0631dfe573ec5eae40dc48fd8f2017424
  gitVersion: v1.24.8-eks-ffeb93d
```

We had to disable Receptor's new reconnect behavior because it does not seem to work with EKS, even though it is supposed to be compatible with Kubernetes 1.24.8 and later.
How do I set RECEPTOR_KUBE_SUPPORT_RECONNECT to disabled for a custom pod spec?
Summary
Hello,
With a large job execution, the output in the UI is incomplete and the job finishes with an error.
On the EE worker, the job completes with no problem.
In the awx-ee container, I see max retries errors for reading the output.
In the task container, an error appears but with no details.
AWX version
19.3
Installation method
Kubernetes
Modifications
no
Ansible version
2.9
Operating system
Red Hat 8.4
Web browser
Chrome
Steps to reproduce
Launch a large job.
Expected results
Full output with no error.
Actual results
Truncated output, and the job ends in an error.
Additional information
Over many tries, the job output always stops at the same line.