job failed with zipfile.BadZipFile: File is not a zip file error from time to time #12343

mick1627 · 2022-06-08T11:03:54Z

Please confirm the following

I agree to follow this project's code of conduct.
I have checked the current issues for duplicates.
I understand that AWX is open source software provided for free and that I might not receive a timely response.

Summary

From time to time some jobs failed with the following error :

The job has the ID 818890, here the logs os awx_task pod

2022-06-08 08:59:57,895 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 acknowledged
2022-06-08 08:59:57,998 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.main.scheduler Spawned project_update 818891 (pending) as dependency of job 818890 (pending)
2022-06-08 08:59:58,099 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.main.scheduler Spawned inventory_update 818892 (pending) as dependency of job 818890 (pending)
2022-06-08 08:59:58,252 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 blocked by projectupdate-818891
2022-06-08 09:00:02,114 DEBUG    [51e06205750f454dbfe891f2d1d42b36] awx.analytics.job_lifecycle job-818890 blocked by projectupdate-818891
2022-06-08 09:00:06,457 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 blocked by inventoryupdate-818892
2022-06-08 09:00:17,519 DEBUG    [caed80d3bec34e74a35bbf8ad76835e6] awx.analytics.job_lifecycle job-818890 blocked by inventoryupdate-818892
2022-06-08 09:00:20,532 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 controller node chosen
2022-06-08 09:00:20,542 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 waiting
2022-06-08 09:00:20,543 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.main.scheduler job 818890 (waiting) consumed 6 capacity units from default with prior total of 6
2022-06-08 09:00:20,658 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.main.dispatch task 83173b34-9278-46e2-83cf-f6bac77638c5 starting awx.main.tasks.RunJob(*[818890])
2022-06-08 09:00:20,944 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 pre run
2022-06-08 09:00:20,967 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.main.tasks Skipping project sync for job 818890 (running) because commit is locally available
2022-06-08 09:00:26,243 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 preparing playbook
2022-06-08 09:00:26,465 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 running playbook
2022-06-08 09:01:01,616 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 work unit id received
2022-06-08 09:01:01,729 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 work unit id assigned
2022-06-08 09:01:04,360 INFO     [bd8cd35bcf2346ab948cf830785023cc] awx.main.commands.run_callback_receiver Event processing is finished for Job 818890, sending notifications
2022-06-08 09:01:04,360 INFO     [bd8cd35bcf2346ab948cf830785023cc] awx.main.commands.run_callback_receiver Event processing is finished for Job 818890, sending notifications
2022-06-08 09:01:07,887 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.main.tasks job 818890 (running) finished running, producing 0 events.
2022-06-08 09:01:07,895 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 post run
2022-06-08 09:01:08,212 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.analytics.job_lifecycle job-818890 finalize run
2022-06-08 09:01:08,443 WARNING  [bd8cd35bcf2346ab948cf830785023cc] awx.main.dispatch job 818890 (error) encountered an error (rc=None), please see task stdout for details.
2022-06-08 09:01:08,446 DEBUG    [bd8cd35bcf2346ab948cf830785023cc] awx.main.tasks Executing error task id 83173b34-9278-46e2-83cf-f6bac77638c5, subtasks: [{'type': 'job', 'id': 818890}]

In the UI the last modified is 11:00:20 AM (CET) or 9:00:20 UTC

but we can see in the logs of awx_task that the job finished at 09:01:07.

AWX version

19.5.0

Select the relevant components

UI
API
Docs

Installation method

kubernetes

Modifications

yes

Ansible version

2.12

Operating system

No response

Web browser

No response

Steps to reproduce

Launch job template in AWX, from time to time it failed with this zipfile.BadZipFile: File is not a zip file error

Expected results

Job launch correctly

Actual results

job failed with zipfile.BadZipFile error

Additional information

WE use custom awx EE.

The text was updated successfully, but these errors were encountered:

fosterseth · 2022-06-08T15:11:17Z

We think this bug was fixed in receptor 1.2.3, please make sure this receptor version is present in your awx ee image

mick1627 · 2022-06-08T15:25:10Z

We think this bug was fixed in receptor 1.2.3, please make sure this receptor version is present in your awx ee image

I have quay.io/ansible/receptor:devel in my ee repo, and the receptor version is 1.2.0+gce6c620. I will update it to use quay.io/ansible/receptor:latest to have version 1.2.3.

I'm closing this issue and reopen it if I still have this issue with receptor 1.2.3.

mick1627 · 2022-06-24T13:20:48Z

I'm reopening this issue, I still have it after the update of AWX and our custom EE :

I'm using AWX 21.1.0
Custom EE with receptor 1.2.3.

receptor --version
1.2.3

shanemcd · 2022-06-24T13:36:10Z

Did you run the above command inside of the control plane EE or inside of a pod for a running job?

mick1627 · 2022-06-24T13:42:16Z

I am using the same docker image for the control plane EE and the pod where the job is running

shanemcd · 2022-06-24T13:48:57Z

Can you share any information about the workload? How many hosts, how many tasks, how much stdout, are you gathering a lot of facts? Anything you can share might be helpful.

mick1627 · 2022-06-30T07:14:18Z

It happens randomly in different kind of job template. For the last issue, it happens on a job part of a workflow. The job was launch on 1 host, about 15 tasks and no gather facts.

stanislav-zaprudskiy · 2022-10-05T09:00:12Z

This could be ansible/ansible-runner#998

shanemcd · 2022-11-15T16:52:19Z

This might have been fixed with ansible/ansible-runner#1161

coudenysj · 2023-11-30T14:12:30Z

We still see this error quite a lot on our setup.

2023-11-30 12:37:28,292 ERROR    [fa7348566d2e4873932a160255a1a0cd] awx.main.tasks.jobs job 161809 (running) Exception occurred while running task
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/jobs.py", line 605, in run
    res = receptor_job.run()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/receptor.py", line 318, in run
    res = self._run_internal(receptor_ctl)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/receptor.py", line 393, in _run_internal
    res = processor_future.result()
  File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/utils/common.py", line 1199, in wrapper_cleanup_new_process
    return func(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/tasks/receptor.py", line 461, in processor
    return ansible_runner.interface.run(
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/ansible_runner/interface.py", line 213, in run
    r.run()
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/ansible_runner/streaming.py", line 361, in run
    self.artifacts_callback(data)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/ansible_runner/streaming.py", line 335, in artifacts_callback
    unstream_dir(self._input, length, self.artifact_dir)
  File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/ansible_runner/utils/streaming.py", line 78, in unstream_dir
    with zipfile.ZipFile(tmp.name, "r") as archive:
  File "/usr/lib64/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/usr/lib64/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

ansible-runner --version
2.3.3

The code from https://github.com/ansible/ansible-runner/pull/1162/files is there, but we still get this error.

swiedernix · 2024-01-23T13:32:55Z

same problem came up in our setup:
AWX 23.5.1
ansible_runner 2.3.4
k3s: v1.28.4+k3s2

coudenysj · 2024-02-07T07:46:10Z

Is there any suggestion to fix this issue, because we get this quite often:

/jobs/174889 2024-02-07T05:10:17.720127Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174888 2024-02-07T05:10:17.631557Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174887 2024-02-07T05:00:27.010690Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174879 2024-02-07T04:40:26.852807Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174872 2024-02-07T04:20:26.397516Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174871 2024-02-07T04:20:26.111226Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174867 2024-02-07T04:00:26.542494Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174864 2024-02-07T03:55:41.692424Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174857 2024-02-07T03:20:26.090661Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174856 2024-02-07T03:20:25.799524Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174854 2024-02-07T03:00:26.420650Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174851 2024-02-07T02:59:55.172001Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174847 2024-02-07T02:40:25.808439Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174845 2024-02-07T02:20:26.445563Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174841 2024-02-07T02:00:27.005234Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174840 2024-02-07T02:00:26.750985Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174839 2024-02-07T02:00:26.401924Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174802 2024-02-06T14:08:10.237412Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174778 2024-02-06T12:57:13.975563Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174745 2024-02-06T11:32:42.871325Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174665 2024-02-06T08:26:04.317739Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174653 2024-02-06T08:04:31.230036Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174619 2024-02-06T08:04:16.185793Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174585 2024-02-06T05:00:26.304256Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174563 2024-02-06T03:00:27.215729Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174559 2024-02-06T02:40:26.593133Z error (`zipfile.BadZipFile: File is not a zip file`)
/jobs/174552 2024-02-06T02:00:34.138571Z error (`zipfile.BadZipFile: File is not a zip file`)

blundey · 2024-04-04T11:47:38Z

Same issue on latest version -

BadZipFile("File is not a zip file") - on random job runs. We run the same job on 300+ hosts and it may occur once in 300 times...

Any devs got time to check this?

fosterseth · 2024-04-05T15:52:18Z

@blundey

usually this relates to issues around log rotation

you can bump up your max log container size for your cluster

you should also enable reconnect

ansible/receptor#683

github-actions bot added component:ui needs_triage type:bug labels Jun 8, 2022

mick1627 closed this as completed Jun 8, 2022

mick1627 reopened this Jun 24, 2022

github-actions bot removed the type:bug label Jun 24, 2022

fosterseth added component:api and removed component:ui labels Jun 29, 2022

shanemcd removed the needs_triage label Jul 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

job failed with zipfile.BadZipFile: File is not a zip file error from time to time #12343

job failed with zipfile.BadZipFile: File is not a zip file error from time to time #12343

mick1627 commented Jun 8, 2022

fosterseth commented Jun 8, 2022

mick1627 commented Jun 8, 2022

mick1627 commented Jun 24, 2022

shanemcd commented Jun 24, 2022

mick1627 commented Jun 24, 2022

shanemcd commented Jun 24, 2022

mick1627 commented Jun 30, 2022

stanislav-zaprudskiy commented Oct 5, 2022

shanemcd commented Nov 15, 2022

coudenysj commented Nov 30, 2023

swiedernix commented Jan 23, 2024

coudenysj commented Feb 7, 2024

blundey commented Apr 4, 2024

fosterseth commented Apr 5, 2024 •

edited

Loading

job failed with zipfile.BadZipFile: File is not a zip file error from time to time #12343

job failed with zipfile.BadZipFile: File is not a zip file error from time to time #12343

Comments

mick1627 commented Jun 8, 2022

Please confirm the following

Summary

AWX version

Select the relevant components

Installation method

Modifications

Ansible version

Operating system

Web browser

Steps to reproduce

Expected results

Actual results

Additional information

fosterseth commented Jun 8, 2022

mick1627 commented Jun 8, 2022

mick1627 commented Jun 24, 2022

shanemcd commented Jun 24, 2022

mick1627 commented Jun 24, 2022

shanemcd commented Jun 24, 2022

mick1627 commented Jun 30, 2022

stanislav-zaprudskiy commented Oct 5, 2022

shanemcd commented Nov 15, 2022

coudenysj commented Nov 30, 2023

swiedernix commented Jan 23, 2024

coudenysj commented Feb 7, 2024

blundey commented Apr 4, 2024

fosterseth commented Apr 5, 2024 • edited Loading

fosterseth commented Apr 5, 2024 •

edited

Loading