Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 client refactor #2482

Merged
merged 11 commits into from
Aug 13, 2018
Merged

S3 client refactor #2482

merged 11 commits into from
Aug 13, 2018

Conversation

gardenunez
Copy link
Contributor

Description

Small refactor over S3Client and s3_test module.

Motivation and Context

The motivation came because we are using Luigi actively in my organization and we recently use this module and I saw some small things I can update as a starting point to contribute to the project and get more deeply in near future.

Have you tested this? If so, how?

Since I haven't change any behaviour I didn't add new tests but I checked that all jobs passed in my custom TravisCI configuration.

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your first contribution @gardenunez !

While waiting on Travis, I thought I'd go ahead and provide some feedback and questions :)

return key if key[-1:] == '/' or key == '' else key + '/'

@staticmethod
def _check_deprecated_argument(arguments):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be best to make this _check_deprecated_argument(**kwargs)? to keep with Python practice


# put the file
self.put_multipart(local_path, destination_s3_path, **kwargs)

def put_string(self, content, destination_s3_path, **kwargs):
"""
Put a string to an S3 path.
:param content: Object data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should already be clear by the method name, but could we :param content: Data str or something. Object data just makes me hesitate.

@@ -188,7 +188,7 @@ def exists(self, path):
logger.debug('Path %s does not exist', path)
return False

def remove(self, path, recursive=True):
def remove(self, path, recursive=True, skip_trash=True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parent function in FileSystem class has this attribute. Thought will be good to keep compatibility.

Copy link
Collaborator

@dlstadther dlstadther Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be something added to FileSystem for hadoop/hive/webhdfs long ago. It serves no purpose for S3 given that there is no "trash" for s3.

It's also worth noting that LocalFileSystem did not include this arg either.

For that, i'm going to say to exclude skip_trash for S3Client's remove().

But nice eye for checking its parent!

@dlstadther
Copy link
Collaborator

Restarting build due to Travis connection error

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dlstadther
Copy link
Collaborator

I'm going to leave this for a couple days to see if anyone else reviews. If not, then I'll merge.

Thanks @gardenunez !

@gardenunez
Copy link
Contributor Author

Agree on giving it a couple of days. Thank you for the review.

@dlstadther dlstadther merged commit f604ce4 into spotify:master Aug 13, 2018
@dlstadther
Copy link
Collaborator

Thanks @gardenunez

dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 14, 2018
* upstream-master: (82 commits)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
  Add stale config
  Move github templates to .github dir
  Fix transfer config import (spotify#2458)
  Additions to provide support for the Load Sharing Facility (LSF) job scheduler (spotify#2373)
  Version 2.7.6
  ...
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 14, 2018
* upstream-master:
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 16, 2018
* upstream-master:
  Remove long-deprecated scheduler config variable alternatives (spotify#2491)
  Bump tornado milestone version (spotify#2490)
  Update moto to 1.x milestone version (spotify#2471)
  Use passed password when create a redis connection (spotify#2489)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants