-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3 client refactor #2482
S3 client refactor #2482
Conversation
This reverts commit dfd7bc5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your first contribution @gardenunez !
While waiting on Travis, I thought I'd go ahead and provide some feedback and questions :)
luigi/contrib/s3.py
Outdated
return key if key[-1:] == '/' or key == '' else key + '/' | ||
|
||
@staticmethod | ||
def _check_deprecated_argument(arguments): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be best to make this _check_deprecated_argument(**kwargs)
? to keep with Python practice
luigi/contrib/s3.py
Outdated
|
||
# put the file | ||
self.put_multipart(local_path, destination_s3_path, **kwargs) | ||
|
||
def put_string(self, content, destination_s3_path, **kwargs): | ||
""" | ||
Put a string to an S3 path. | ||
:param content: Object data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should already be clear by the method name, but could we :param content: Data str
or something. Object data
just makes me hesitate.
luigi/contrib/s3.py
Outdated
@@ -188,7 +188,7 @@ def exists(self, path): | |||
logger.debug('Path %s does not exist', path) | |||
return False | |||
|
|||
def remove(self, path, recursive=True): | |||
def remove(self, path, recursive=True, skip_trash=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parent function in FileSystem class has this attribute. Thought will be good to keep compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be something added to FileSystem for hadoop/hive/webhdfs long ago. It serves no purpose for S3 given that there is no "trash" for s3.
It's also worth noting that LocalFileSystem did not include this arg either.
For that, i'm going to say to exclude skip_trash
for S3Client's remove()
.
But nice eye for checking its parent!
Restarting build due to Travis connection error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm going to leave this for a couple days to see if anyone else reviews. If not, then I'll merge. Thanks @gardenunez ! |
Agree on giving it a couple of days. Thank you for the review. |
Thanks @gardenunez |
* upstream-master: (82 commits) S3 client refactor (spotify#2482) Rename to rpc_log_retries, and make it apply to all the logging involved Factor log_exceptions into a configuration parameter Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478) Add a visiblity level for luigi.Parameters (spotify#2278) Add support for multiple requires and inherits arguments (spotify#2475) Add metadata columns to the RDBMS contrib (spotify#2440) Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477) tests: Use RunOnceTask where possible (spotify#2476) Optional TOML configs support (spotify#2457) Added default port behaviour for Redshift (spotify#2474) Add codeowners file with default and specific example (spotify#2465) Add Data Revenue to the `blogged` list (spotify#2472) Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469) Use task_id comparison in Task.__eq__. (spotify#2462) Add stale config Move github templates to .github dir Fix transfer config import (spotify#2458) Additions to provide support for the Load Sharing Facility (LSF) job scheduler (spotify#2373) Version 2.7.6 ...
* upstream-master: S3 client refactor (spotify#2482) Rename to rpc_log_retries, and make it apply to all the logging involved Factor log_exceptions into a configuration parameter Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478) Add a visiblity level for luigi.Parameters (spotify#2278) Add support for multiple requires and inherits arguments (spotify#2475) Add metadata columns to the RDBMS contrib (spotify#2440) Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477) tests: Use RunOnceTask where possible (spotify#2476) Optional TOML configs support (spotify#2457) Added default port behaviour for Redshift (spotify#2474) Add codeowners file with default and specific example (spotify#2465) Add Data Revenue to the `blogged` list (spotify#2472)
* upstream-master: Remove long-deprecated scheduler config variable alternatives (spotify#2491) Bump tornado milestone version (spotify#2490) Update moto to 1.x milestone version (spotify#2471) Use passed password when create a redis connection (spotify#2489) S3 client refactor (spotify#2482) Rename to rpc_log_retries, and make it apply to all the logging involved Factor log_exceptions into a configuration parameter Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478) Add a visiblity level for luigi.Parameters (spotify#2278) Add support for multiple requires and inherits arguments (spotify#2475) Add metadata columns to the RDBMS contrib (spotify#2440) Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477) tests: Use RunOnceTask where possible (spotify#2476) Optional TOML configs support (spotify#2457) Added default port behaviour for Redshift (spotify#2474) Add codeowners file with default and specific example (spotify#2465) Add Data Revenue to the `blogged` list (spotify#2472) Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469) Use task_id comparison in Task.__eq__. (spotify#2462)
Description
Small refactor over S3Client and s3_test module.
Motivation and Context
The motivation came because we are using Luigi actively in my organization and we recently use this module and I saw some small things I can update as a starting point to contribute to the project and get more deeply in near future.
Have you tested this? If so, how?
Since I haven't change any behaviour I didn't add new tests but I checked that all jobs passed in my custom TravisCI configuration.