AWS permission error unclear #232

Closed
mausch opened this issue Jun 25, 2020 · 3 comments · Fixed by #243

Comments

@mausch

mausch commented Jun 25, 2020

Got the following log from running a job on AWS Batch:

2020-06-25 11:48:12.924 Bootstrapping conda environment...(this could take a few minutes)
2020-06-25 11:48:15.650 Workflow starting (run-id 1593085694938901):
2020-06-25 11:48:15.765 [1593085694938901/start/1 (pid 13039)] Task is starting.
2020-06-25 11:48:16.772 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Task is starting (status SUBMITTED)...
2020-06-25 11:48:20.970 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Task is starting (status RUNNABLE)...
2020-06-25 11:48:51.102 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Task is starting (status RUNNABLE)...
2020-06-25 11:49:21.238 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Task is starting (status RUNNABLE)...
2020-06-25 11:49:25.868 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Task is starting (status STARTING)...
2020-06-25 11:49:52.043 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Task is starting (status RUNNING)...
2020-06-25 11:49:58.404 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Setting up task environment.
2020-06-25 11:50:11.631 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Downloading code package.
2020-06-25 11:50:11.631 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2020-06-25 11:50:21.414 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Downloading code package.
2020-06-25 11:50:21.414 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2020-06-25 11:50:31.419 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Downloading code package.
2020-06-25 11:50:35.731 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2020-06-25 11:50:46.324 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Downloading code package.
2020-06-25 11:50:46.324 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2020-06-25 11:50:56.977 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Downloading code package.
2020-06-25 11:50:56.978 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2020-06-25 11:51:06.525 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] Downloading code package.
2020-06-25 11:51:06.525 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
2020-06-25 11:51:12.959 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] tar: job.tar: Cannot open: No such file or directory
2020-06-25 11:51:12.959 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] tar: Error is not recoverable: exiting now
2020-06-25 11:51:15.041 [1593085694938901/start/1 (pid 13039)] Batch error:
2020-06-25 11:51:15.042 [1593085694938901/start/1 (pid 13039)] Essential container in task exited This could be a transient error. Use @retry to retry.

Metaflow should state exactly which operation got the 403 (and, if it is S3-related, which bucket and object were involved), so that the error is actionable.
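For illustration, here is a minimal sketch of the kind of message that would be actionable. It is not Metaflow's code: `split_s3_url` and `describe_s3_failure` are hypothetical helper names, and the example URL is made up. It only assumes that the code package location is an `s3://bucket/key` URL.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key). Hypothetical helper."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    # The bucket is the network location; the key is the path without the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


def describe_s3_failure(operation: str, status: int, url: str) -> str:
    """Build an error message that names the operation, bucket, and object."""
    bucket, key = split_s3_url(url)
    return (
        f"{operation} failed with HTTP {status} "
        f"for bucket={bucket!r}, key={key!r}. "
        "Check that the credentials used by the job are allowed to read this object."
    )


if __name__ == "__main__":
    # Made-up URL, standing in for whatever METAFLOW_CODE_URL contains.
    print(describe_s3_failure("HeadObject", 403, "s3://example-bucket/some/prefix/job.tar"))
```

With a message like this, the bucket and key can be checked against the job's permissions directly, without searching the source for where the download happens.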

As a workaround, I searched the GitHub repo for "Downloading code package" and found

"echo \'Downloading code package.\'; "

so I see this is related to METAFLOW_CODE_URL somehow:

.environment_variable('METAFLOW_CODE_URL', code_package_url) \

Needless to say, I shouldn't have to dig through the source code to understand what the problem is; the error should be clearer.

BTW this line: 2020-06-25 11:51:12.959 [1593085694938901/start/1 (pid 13039)] [1a261446-4a1b-46b6-a632-129a0db30abe] tar: job.tar: Cannot open: No such file or directory
seems to suggest that the code still tries to open that file even after the download failed. I think it should abort instead.
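The behavior being asked for can be sketched as follows. This is an assumption-laden illustration, not the actual bootstrap logic: `DownloadFailed`, `download_with_retries`, and `bootstrap` are hypothetical names, and the download and extract steps are passed in as plain callables. The default of six attempts simply mirrors the six "Downloading code package." lines in the log above.

```python
import time
from typing import Callable


class DownloadFailed(RuntimeError):
    """Raised when every download attempt has failed."""


def download_with_retries(
    download: Callable[[], bool], attempts: int = 6, delay_s: float = 0.0
) -> int:
    """Call `download` until it returns True; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        if download():
            return attempt
        if attempt < attempts:
            time.sleep(delay_s)  # wait before the next try
    raise DownloadFailed(f"code package download failed after {attempts} attempts")


def bootstrap(
    download: Callable[[], bool], extract: Callable[[], None], attempts: int = 6
) -> None:
    """Extract the package only if the download succeeded; otherwise abort."""
    download_with_retries(download, attempts=attempts)  # raises on failure
    extract()  # never reached when the download failed


if __name__ == "__main__":
    try:
        bootstrap(lambda: False, lambda: print("extracting job.tar"), attempts=3)
    except DownloadFailed as exc:
        print(f"aborting: {exc}")
```

Because the exception propagates before `extract` is called, the misleading `tar: job.tar: Cannot open` message would not appear after a failed download; the only error left is the one describing the download itself.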

@savingoyal
Collaborator

Good point. We will fix this.

@mausch
Author

mausch commented Jul 2, 2020

Thank you!

@smolendawid

I'm trying to run the first AWS tutorial and have the same problem. What is the solution? I can't work it out from reading this thread.
