Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix s3_put_object retries in datastore/s3.py. Also fail gracefully if… #234

Closed
wants to merge 4 commits into from

Conversation

romain-intel
Copy link
Contributor

… boto3 not installed in datatools

Uploads were not retried correctly since the buffer status was
not reset after the first attempt which led to an "IO on closed file"
error or an empty file being written to S3

… boto3 not installed in datatools

Uploads were not retried correctly since the buffer status was
not reset after the first attempt which led to an "IO on closed file"
error or an empty file being written to S3
@romain-intel romain-intel requested a review from crk-codaio June 25, 2020 23:47
@crk-codaio crk-codaio requested a review from tuulos July 6, 2020 21:55
@crk-codaio
Copy link
Contributor

Also added @tuulos who was the original author of this fix (internally) IIRC.


from .datastore import MetaflowDataStore, DataException, only_if_not_done
from ..metadata import MetaDatum
from .util.s3util import aws_retry, get_s3_client
from .. import metaflow_config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this above L:20?

While on this subject do you know if there is a special format that python auto-format tools follow for organizing import statements (alphabetical, relative references last etc.)?

I usually follow alphabetical, and prefer absolute imports first then relative (like from .metadata import foo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is usually a preference. I think it is system ones first then user ones. Both in alphabetical order. Not sure about absolute vs relatives. I think once we run the auto format tools, this should reformat everything appropriately but can definitely move this one above if you want.

@@ -197,6 +203,9 @@ def __init__(self,
tmproot: (optional) Root path for temporary files (default: '.')
"""

if not boto_found:
MetaflowException("You need to install 'boto3' in order to use S3.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you missing a raise here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Good catch.

Copy link
Contributor

@crk-codaio crk-codaio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after addressing minor comments.

@tuulos
Copy link
Collaborator

tuulos commented Jul 23, 2020

lgtm

Copy link
Collaborator

@savingoyal savingoyal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Merge to master!

@romain-intel
Copy link
Contributor Author

Superseded by #288

@savingoyal savingoyal deleted the fix-s3-datatools branch August 19, 2020 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants