Automation jobs are canceled unintentionally #12677

Closed
6 of 9 tasks
klauserber opened this issue Aug 17, 2022 · 3 comments

Comments

@klauserber

klauserber commented Aug 17, 2022

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.

Bug Summary

We are seeing sporadic errors in automation jobs:

  • The log output gets stuck. This is not the problem described in "AWX stops gathering job output if kubernetes starts a new log" (#11338): container-log-max-size is already increased to 200 MB, and the error also occurs in jobs with fewer than about 200 log lines.
  • The job is marked as 'error' in the UI.
  • The job runs to the end without errors, but the container log ends with an event like this: {"status": "canceled", "runner_ident": "361629"}
  • When we re-run the job, everything works and the log shows an event like: {"status": "successful", "runner_ident": "361634"}
  • It happens sporadically in many different jobs.
  • Nobody has canceled these jobs.
  • The jobs are started from a workflow
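
To compare failed and successful runs, it can help to pull the final status event out of a saved container log. The following is a minimal sketch, assuming the log has been saved as plain text with one JSON event per line; the function name `last_status_event` and the placeholder log line are made up for illustration and are not part of AWX or ansible-runner.

```python
import json


def last_status_event(log_lines):
    """Return the last JSON object in the log that has a "status" key, or None."""
    last = None
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip ordinary playbook output
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if isinstance(event, dict) and "status" in event:
            last = event
    return last


# The first line is a placeholder; the second is the event quoted above.
sample = [
    "(playbook output placeholder)",
    '{"status": "canceled", "runner_ident": "361629"}',
]
event = last_status_event(sample)
print(event["status"], event["runner_ident"])
```

Running this over the logs of several affected jobs would show whether every job marked 'error' ends with a "canceled" event.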

AWX version

21.4.0

Select the relevant components

  • UI
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

yes

Ansible version

2.10.11

Operating system

Kubernetes 1.23.6 on Ubuntu 20.04

Web browser

Chrome

Steps to reproduce

Unfortunately we have no idea how to reproduce the error.

Expected results

All jobs that run without errors have complete logs and are marked with the status 'Success'.

Actual results

see bug summary

Additional information

We use an extended execution environment image with some additional binary dependencies (terraform, kubectl, helm ...), built the same way as the original awx-ee.

@shanemcd
Member

@klauserber Are you still seeing this? If you look at /api/v2/jobs/<id> does job_explanation have anything in it?
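
One way to run that check from a script is sketched below, assuming token authentication is enabled on the instance. The host name, job id, and token in the usage comment are placeholders, and the helper names (`job_url`, `summarize`, `fetch_job`) are invented for this example.

```python
import json
import urllib.request


def job_url(base_url, job_id):
    """Build the job detail URL, e.g. https://host/api/v2/jobs/42/."""
    return f"{base_url.rstrip('/')}/api/v2/jobs/{job_id}/"


def summarize(job):
    """Keep only the fields that are useful for diagnosing a failed job."""
    return {key: job.get(key) for key in ("status", "job_explanation", "result_traceback")}


def fetch_job(base_url, job_id, token):
    """GET the job detail as a dict; 'token' is assumed to be an AWX OAuth2 token."""
    request = urllib.request.Request(
        job_url(base_url, job_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (placeholder host, job id, and token):
# print(summarize(fetch_job("https://awx.example.com", 42, "<token>")))
```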

@akus062381
Member

Hi @klauserber!

Thank you very much for this issue. It means a lot to us that you have taken time to contribute by opening this report.

Comments were added to this issue, but some time has passed without a response, so we are closing it for now. If you find time to address the comments, we can reopen the issue; please contact us using any of the communication methods listed on the page below:

https://github.com/ansible/awx/#get-involved

Thank you once again for this and your interest in AWX!

@klauserber
Author

Please reopen this issue, we still have this error.

Currently we are using AWX 21.7.0

"job_explanation": "Job terminated due to error"
