You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I ran metaflow configure aws, configuring METAFLOW_DATASTORE_SYSROOT_S3 and METAFLOW_DATATOOLS_SYSROOT_S3 and setting METAFLOW_DEFAULT_DATASTORE to s3.
After, I ran python myflow.py run, and the process appeared to hang:
Metaflow 2.5.0 executing MyFlow for user:developer
Execute before run...
Validating your flow...
The graph looks good!
Running pylint...
Pylint is happy!
I added some debug logging to _one_boto_op. It looks like S3 access is silently failing and retrying for up to 5 minutes.
Retry #0
Sleeping for 7 seconds...
Retry #1
Sleeping for 2 seconds...
Retry #2
Sleeping for 11 seconds...
Retry #3
Sleeping for 18 seconds...
Retry #4
Sleeping for 18 seconds...
Retry #5
Sleeping for 34 seconds...
Retry #6
Sleeping for 65 seconds...
Retry #7
Sleeping for 133 seconds...
We ran out of attempts (and slept for 288 seconds)!
S3 access failed:
S3 operation failed.
Key requested: s3://metaflow-artifact-store/flows/MyFlow/1644904607105118/_parameters/0/0.attempt.json
On top of this, if I set METAFLOW_S3_RETRY_COUNT=0 to mitigate the retry behavior, I still need to wait for the sleep:
Retry #0
Sleeping for 8 seconds...
We ran out of attempts (and slept for 8 seconds)!
S3 access failed:
S3 operation failed.
Key requested: s3://metaflow-artifact-store/flows/MyFlow/1644905069513467/_parameters/0/0.attempt.json
Error: Unable to locate credentials
Expected Behavior
I was surprised that I had to wait almost 5 minutes for the process to fail and let me know I have an AWS configuration error.
I would expect setting the retry count to 0 (effectively turning it off) to also disable the sleep behavior.
The text was updated successfully, but these errors were encountered:
Problem
I ran
metaflow configure aws
, configuringMETAFLOW_DATASTORE_SYSROOT_S3
andMETAFLOW_DATATOOLS_SYSROOT_S3
and settingMETAFLOW_DEFAULT_DATASTORE
tos3
.After, I ran
python myflow.py run
, and the process appeared to hang:I added some debug logging to
_one_boto_op
. It looks like S3 access is silently failing and retrying for up to 5 minutes.On top of this, if I set
METAFLOW_S3_RETRY_COUNT=0
to mitigate the retry behavior, I still need to wait for the sleep:Expected Behavior
I was surprised that I had to wait almost 5 minutes for the process to fail and let me know I have an AWS configuration error.
I would expect setting the retry count to 0 (effectively turning it off) to also disable the sleep behavior.
The text was updated successfully, but these errors were encountered: