Long job marked as error because logs stop to be retrieved while job continues and complete successfully #11451
Comments
Do you have the error that the job shows? There is an API field |
my problem from #11473 now is this one here as well (after setting up a new server and upgrading from 19.3.0 to 19.5.0. Dec 23 09:40:45 vpadm002 k3s[915]: I1223 09:40:45.329466 915 topology_manager.go:187] "Topology Admit Handler" |
set up with https://github.com/kurokobo/awx-on-k3s there are jobs on my instance that run okay that are even longer (> 300 seconds) "status": "error", another try |
Sometimes the rotation of the logs by kubelet causes a similar issue. |
A very similar problem, possibly with the same cause, is now happening with a workflow. |
This is a duplicate of either #11594 (comment) or #11338 |
Summary
Dear AWX-team,
I am running AWX within a Kubernetes cluster. Jobs run on a custom awx-ee image built with ansible-builder (ansible-runner:stable-2.10-devel plus a few tools needed by the playbooks).
All jobs work fine, except that one long job is repeatedly marked as Error.
The job log shows no failure, but it is incomplete and stops in the middle of a task.
Monitoring the job pod directly with kubectl, I see that it completes the playbook successfully. This matches the artifacts I found, which were generated by tasks that AWX shows as never executed, and it contradicts what AWX displays.
AWX version
19.5.0
Installation method
kubernetes
Modifications
yes
Ansible version
No response
Operating system
No response
Web browser
No response
Steps to reproduce
Trigger a long job and wait for it to be marked as error.
Check the container through kubectl or any other tool and see that it keeps running.
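The check described in these steps could be scripted roughly as follows. This is only a sketch under assumptions: the base URL, OAuth2 token, namespace, and pod name are placeholders to be filled in for your own cluster, and the helper names `awx_job_status`, `pod_phase`, and `is_mismatch` are made up for illustration. It reads the `status` field of `/api/v2/jobs/<id>/` and compares it with the pod phase reported by `kubectl`.

```python
import json
import subprocess
import urllib.request


def awx_job_status(base_url, job_id, token):
    """Return the `status` field AWX reports for a job (/api/v2/jobs/<id>/)."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/jobs/{job_id}/",
        # Assumes an AWX OAuth2 token; adjust if you authenticate differently.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["status"]


def pod_phase(pod_name, namespace):
    """Return the pod phase (Pending, Running, Succeeded, Failed, Unknown)."""
    out = subprocess.run(
        ["kubectl", "get", "pod", pod_name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def is_mismatch(awx_status, phase):
    """True when AWX says 'error' while the pod is still running or succeeded."""
    return awx_status == "error" and phase in ("Running", "Succeeded")
```

Calling `is_mismatch(awx_job_status(...), pod_phase(...))` while the job is in progress should return `True` when the behaviour reported here occurs. The pod name has to be looked up first, for example with `kubectl get pods -n <namespace>`.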
Expected results
The log displayed in AWX is complete and the job status reflects the real outcome.
Actual results
The AWX job displays an incomplete log and an Error status.
Additional information
Custom runner images that include some tools and collections required by the playbooks.