Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/s3 copy #2508

Merged
merged 14 commits into from
Sep 28, 2018
Merged

Refactor/s3 copy #2508

merged 14 commits into from
Sep 28, 2018

Conversation

gardenunez
Copy link
Contributor

Description

Extract copy directory and copy file logic out of copy function in S3 client.

Motivation and Context

Is just a small refactor over copy function for having more readability and reuse copy file logic also in copy directory.
The idea came by reviewing this PR: #2488

Have you tested this? If so, how?

I didn't change any behaviour so tests remain the same.

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While your objective is completed (to separate out copy functionality into helper methods), the current state of the helper methods becomes hard to use due to the massive amount of required args.

Can you see about making the scheme more flexible?

luigi/contrib/s3.py Outdated Show resolved Hide resolved
@dlstadther
Copy link
Collaborator

Build errors appear to have been a Travis issue as they have since resolved without change on our end.

@gardenunez
Copy link
Contributor Author

I have made the changes suggested by @dlstadther. Please take a look.

@gardenunez
Copy link
Contributor Author

gardenunez commented Sep 1, 2018

While I was doing the previous changes I that we are sending to _copy_file and _copy_dir the source_path, src_bucket and src_key args, this is redundant. Maybe sending copy_source ( AKA source_path would be enough and more close to meta.client.copy definition. What do you think?

@gardenunez
Copy link
Contributor Author

Another thing is that I think copy nested directories are not supported, would that be desirable?

@dlstadther
Copy link
Collaborator

I agree on supplying source-path to method as the other parts can be created from that.

_copy_dir() already supports nesteddir copying

@gardenunez
Copy link
Contributor Author

Changes committed. @dlstadther Let me know what do you think?

luigi/contrib/s3.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything else LGTM

Feel free to ignore my one last improvement request. They're both correct, one is just shorter and more clear the than other.


(src_bucket, src_key) = self._path_to_bucket_and_key(source_path)
(dst_bucket, dst_key) = self._path_to_bucket_and_key(destination_path)

# don't allow threads to be less than 3
threads = 3 if threads < 3 else threads
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this line long ago and have always wanted to update it to threads = max(threads, 3). This PR seems appropriate since you've already modified other code in this method

@dlstadther dlstadther merged commit c151a5b into spotify:master Sep 28, 2018
dlstadther added a commit to dlstadther/luigi that referenced this pull request Sep 28, 2018
* upstream-master:
  Add Python 3.7 compatibility (spotify#2466)
  Refactor s3 copy into sub-methods (spotify#2508)
  Remove s3 bucket validation prior to file upload (spotify#2528)
  Version 2.7.9
  set upper bound of python-daemon
  Update MockTarget mode to accept r* or w* (spotify#2519)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants