I understand that AWX is open source software provided for free and that I might not receive a timely response.
I am NOT reporting a (potential) security vulnerability. (These should be emailed to security@ansible.com instead.)
Bug Summary
One of our jobs consistently fails with this error:
Task was marked as running but was not present in the job queue, so it has been marked as failed.
We haven't been able to identify any resource crunch on the k8s cluster, nor are the AWX pods running out of resources.
AWX version
21.3.0
Select the relevant components
UI
UI (tech preview)
API
Docs
Collection
CLI
Other
Installation method
kubernetes
Modifications
no
Ansible version
No response
Operating system
No response
Web browser
No response
Steps to reproduce
Our setup:
AKS 1.23.8
AWX Operator: 0.24.0
AWX: 21.3.0
This job connects to ~30 Linux VMs (inventory hosts) and, from each VM, contacts ~100 network devices to collect the output of 3 commands.
The output is stored in a dictionary per inventory host.
The job runs fine with fewer network devices (up to ~90) but always fails with 100.
As the error message suggests, the problem does not appear to be network or device access.
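For context, the collection pattern looks roughly like the sketch below (the module choice, the `run_device_commands.sh` helper, and the variable names are illustrative, not our exact playbook):

```yaml
# Illustrative sketch only -- not the exact production playbook.
# Runs on each of the ~30 Linux VMs; network_devices holds that VM's
# ~100 device hostnames.
- name: Query each network device via a helper script (hypothetical)
  ansible.builtin.command: "run_device_commands.sh {{ item }}"
  loop: "{{ network_devices }}"
  register: device_results

- name: Build the per-host dictionary of outputs
  ansible.builtin.set_fact:
    device_output: >-
      {{ dict(device_results.results | map(attribute='item')
              | zip(device_results.results | map(attribute='stdout'))) }}
```

Note that every loop iteration's result is echoed into the job's stdout, so ~100 devices x 3 commands per VM produces a large amount of job output, which may be relevant to the failure.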
Expected results
Play runs smoothly and the job finishes as expected
Actual results
Job fails with error message:
Task was marked as running but was not present in the job queue, so it has been marked as failed.
Additional information
No response
@deep7861 you may be running into the k8s max container log issue. How you change this max log size varies depending on your k8s cluster type, but here is a thread that explains it a bit: #11338 (comment)
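For reference, the knobs behind that thread are `containerLogMaxSize` and `containerLogMaxFiles` in the kubelet configuration (the size below is illustrative; on AKS these are set through the node pool's custom kubelet configuration rather than by editing a file on the node):

```yaml
# Illustrative KubeletConfiguration fragment; the kubelet defaults are
# containerLogMaxSize: 10Mi and containerLogMaxFiles: 5.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 50Mi   # rotate a container's log once it reaches this size
containerLogMaxFiles: 5     # number of rotated log files kept per container
```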
@fosterseth Thank you for looking into this issue.
While I try to find the log-size relation, I happened to notice some strange behavior.
In some of the posts you mentioned, I saw a suggestion to check the 'result_traceback' value from /api/v2/jobs/job_id for the failed job.
Now, when I try doing it, the page doesn't load. Here is what I get:
When I try to look for that job from usual AWX UI, it fails as well:
When this error appears, I see the following log from the web container:
2023/08/08 15:28:49 [error] 33#33: *189 upstream prematurely closed connection while reading response header from upstream, client: 10.244.7.25, server: _, req
10.244.7.25 - - [08/Aug/2023:15:28:49 +0000] "GET /api/v2/unified_jobs/?name__icontains=ine_lm&not__launch_type=sync&order_by=-finished&page=1&page_size=20 HTT
DAMN ! worker 5 (pid: 38) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 5 (new pid: 70)
mounting awx.wsgi:application on /
WSGI app 0 (mountpoint='/') ready in 1 seconds on interpreter 0x7636d0 pid: 70 (default app)
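For anyone else following along: pulling `result_traceback` out of the job detail would look roughly like the sketch below. The payload here is a trimmed, made-up example; a live check would fetch `/api/v2/jobs/<job_id>/` from your own AWX host with your own credentials (e.g. via `requests`).

```python
import json

# Trimmed, hypothetical payload shaped like the /api/v2/jobs/<job_id>/ response.
# A live fetch would look something like:
#   job = requests.get(f"{awx_url}/api/v2/jobs/{job_id}/",
#                      auth=(user, password)).json()
job = json.loads("""{
  "id": 1234,
  "status": "failed",
  "result_traceback": "Task was marked as running but was not present in the job queue, so it has been marked as failed."
}""")

print(job["result_traceback"])
```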