
EOF when many steps using same big input artifact #9525

Closed
tooptoop4 opened this issue Sep 6, 2022 · 8 comments · Fixed by #9921
Labels
area/artifacts (S3/GCP/OSS/Git/HDFS etc), area/docs (Incorrect, missing, or mistakes in docs), good first issue (Good for newcomers), hacktoberfest


@tooptoop4
Contributor

tooptoop4 commented Sep 6, 2022

I have a 1.2 GB artifact on S3.

Intermittently, some of the tasks that use the same artifact as input fail with the error below:


Error (exit code 1): tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

OR

Error (exit code 1): gzip: write: Out of memory
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

The workflow controller logs also report that the max duration limit was exceeded:

time="2022-09-06T02:08:48.942Z" level=info msg="Processing workflow" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.945Z" level=info msg="Task-result reconciliation" namespace=auth numObjs=3 workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message="Error (exit code 1): tar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now" new.phase=Error new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-3794695805 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node unchanged" nodeID=bitbucket-workflow-20220906t12032810007frlh-2620825050
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message="Error (exit code 1): tar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now" new.phase=Error new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-217183810 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message= new.phase=Running new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-1085538644 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message="Error (exit code 1): gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now" new.phase=Error new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-3703802753 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.945Z" level=info msg="node unchanged" nodeID=bitbucket-workflow-20220906t12032810007frlh-2444850475
time="2022-09-06T02:08:48.945Z" level=info msg="node unchanged" nodeID=bitbucket-workflow-20220906t12032810007frlh-2697090281
time="2022-09-06T02:08:48.945Z" level=info msg="node changed" new.message= new.phase=Running new.progress=0/1 nodeID=bitbucket-workflow-20220906t12032810007frlh-1264146474 old.message=PodInitializing old.phase=Pending old.progress=0/1
time="2022-09-06T02:08:48.946Z" level=info msg="SG Outbound nodes of bitbucket-workflow-20220906t12032810007frlh-3571636647 are [bitbucket-workflow-20220906t12032810007frlh-2620825050]" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.946Z" level=info msg="SG Outbound nodes of bitbucket-workflow-20220906t12032810007frlh-3332293106 are [bitbucket-workflow-20220906t12032810007frlh-2697090281]" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.946Z" level=info msg="SG Outbound nodes of bitbucket-workflow-20220906t12032810007frlh-213337688 are [bitbucket-workflow-20220906t12032810007frlh-2444850475]" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-610152142 phase Running -> Error" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-610152142 message: Max duration limit exceeded" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.947Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-610152142 finished: 2022-09-06 02:08:48.947745613 +0000 UTC" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-2763505215 phase Running -> Error" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-2763505215 message: Max duration limit exceeded" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.948Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-2763505215 finished: 2022-09-06 02:08:48.948248846 +0000 UTC" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="Max duration limit exceeded. Failing..." namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-3681608794 phase Running -> Error" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-3681608794 message: Max duration limit exceeded" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh
time="2022-09-06T02:08:48.949Z" level=info msg="node bitbucket-workflow-20220906t12032810007frlh-3681608794 finished: 2022-09-06 02:08:48.949245597 +0000 UTC" namespace=auth workflow=bitbucket-workflow-20220906t12032810007frlh

Init container logs from another run:

time="2022-09-06T02:52:23.726Z" level=info msg="Start loading input artifacts..."
time="2022-09-06T02:52:23.726Z" level=info msg="Downloading artifact: repo"
time="2022-09-06T02:52:23.726Z" level=info msg="S3 Load path: /argo/inputs/artifacts/repo.tmp, key: argo_wf_logs/2022/09/06/02/49/bitbucket-workflow-20220906t12490510007g2tv/bitbucket-workflow-20220906t12490510007g2tv-3635102771/repo.tgz"
time="2022-09-06T02:52:23.726Z" level=info msg="Creating minio client using AWS SDK credentials"
time="2022-09-06T02:52:25.220Z" level=info msg="Getting file from s3" bucket=myredactbucket endpoint=s3.amazonaws.com key=argo_wf_logs/2022/09/06/02/49/bitbucket-workflow-20220906t12490510007g2tv/bitbucket-workflow-20220906t12490510007g2tv-3635102771/repo.tgz path=/argo/inputs/artifacts/repo.tmp
time="2022-09-06T02:52:38.714Z" level=info msg="Detecting if /argo/inputs/artifacts/repo.tmp is a tarball"
time="2022-09-06T02:52:38.714Z" level=info msg="tar -xf /argo/inputs/artifacts/repo.tmp -C /argo/inputs/artifacts/repo.tmpdir"
time="2022-09-06T02:52:49.455Z" level=error msg="`tar -xf /argo/inputs/artifacts/repo.tmp -C /argo/inputs/artifacts/repo.tmpdir` failed: gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now\n"
time="2022-09-06T02:52:49.601Z" level=error msg="executor error: gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now"
time="2022-09-06T02:52:49.601Z" level=info msg="Alloc=9938 TotalAlloc=18507 Sys=29138 NumGC=5 Goroutines=4"
time="2022-09-06T02:52:49.601Z" level=fatal msg="gzip: write: Out of memory\ntar: Unexpected EOF in archive\ntar: Unexpected EOF in archive\ntar: Error is not recoverable: exiting now"

It's a fan-out workflow.

version 3.3.9

@hbrewster-splunk

Have you used podSpecPatch to increase the resource requests?

@tooptoop4
Contributor Author

@hbrewster-splunk That worked, but I think the docs should be updated to mention raising the init container's resources for steps that use a big input artifact:

                  - name: mystep
                    podSpecPatch: '{"initContainers":[{"name":"init", "resources":{"requests":{"memory": "2Gi", "cpu": "300m" },"limits":{"memory": "3Gi", "cpu": "900m" }}}]}'
                    inputs:
                      artifacts:
                        - name: repo
                          path: /repo
                    container:
                      image: blabla
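The same patch can also be written as a YAML block instead of a single-line JSON string, which is easier to read and avoids quoting mistakes. This is a sketch assuming podSpecPatch accepts YAML as well as JSON (Argo's pod spec patch examples show both forms); the template name, artifact, image, and resource values are copied from the snippet above and should be tuned to the artifact size.

```yaml
- name: mystep
  # Patch merged into the generated pod spec. The "init" entry is matched by
  # name against the executor's init container, which downloads and extracts
  # the input artifacts (the step that ran out of memory in the logs above).
  podSpecPatch: |
    initContainers:
      - name: init
        resources:
          requests:
            memory: 2Gi
            cpu: 300m
          limits:
            memory: 3Gi
            cpu: 900m
  inputs:
    artifacts:
      - name: repo
        path: /repo
  container:
    image: blabla
```

The memory values are the ones reported to work in this thread for a 1.2 GB artifact, not a general recommendation.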

@terrytangyuan
Member

Feel free to submit a PR to improve the docs.

@terrytangyuan added the good first issue and area/docs labels and removed the type/bug label Sep 7, 2022
@gkum99

gkum99 commented Sep 9, 2022

Hi @terrytangyuan, I am new to open source and want to work on this issue. Can you assign it to me?

@tooptoop4
Contributor Author

@2022H1030014G You can raise a PR to fix this even without being assigned.

@stale

stale bot commented Oct 1, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

@stale stale bot added and removed the problem/stale label Oct 1, 2022
@argoproj argoproj deleted a comment from juliusvonkohout Oct 6, 2022
@juliusvonkohout
Contributor

juliusvonkohout commented Oct 6, 2022

@2022H1030014G Are you still working on it?

@terrytangyuan
Member

If there's no PR, you can just start working on it.

sarabala1979 pushed a commit that referenced this issue Oct 29, 2022
Signed-off-by: awwwd <amitauddy94@gmail.com>
juchaosong pushed a commit to juchaosong/argo-workflows that referenced this issue Nov 3, 2022
 (argoproj#9921)

Signed-off-by: awwwd <amitauddy94@gmail.com>
Signed-off-by: juchao <juchao@coscene.io>
@agilgur5 added the area/artifacts label Oct 11, 2023

7 participants