Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix S3Client.copy return value consistency #2488

Merged
merged 7 commits into from
Sep 1, 2018

Conversation

dlstadther
Copy link
Collaborator

@dlstadther dlstadther commented Aug 13, 2018

Description

Adds support for num/size copy response for single-file copies.

Motivation and Context

S3Client.copy doesn't return a tuple if not copying a directory. This patch adds support for num/size tuple copy response for non-directories.

Have you tested this? If so, how?

Adds assertions for minimum values to existing copy tests.
All s3_test.py tests pass locally.

@dlstadther
Copy link
Collaborator Author

Just tagging a couple other recent luigi/contrib/s3.py contributors to quicken the review process. Your review would be appreciated!

@gardenunez @ouanixi

Copy link
Contributor

@gardenunez gardenunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple of non-blocking comments.
Another suggestion would be to extract copy-dir and copy-file logic inside copy to a separate function. Maybe even to have copy_dir and copy_file functions instead of copy, this one I still not sure.

@@ -527,3 +511,11 @@ def _run_multipart_test(self, part_size, file_size, **kwargs):
key_size = s3_client.get_key(s3_path).size
self.assertEqual(file_size, key_size)
tmp_file.close()

def _run_copy_response_test(self, response):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd find the way to check that the size is strictly equal to the sum instead of using GreaterEqual. Same thing to the number of items.

@@ -454,15 +453,17 @@ def test_copy_dir(self):
self.assertTrue(s3_client.exists(file_path))

s3_dest = 's3://mybucket/copydir_new/'
s3_client.copy(s3_dir, s3_dest, threads=10, part_size=copy_part_size)
response = s3_client.copy(s3_dir, s3_dest, threads=10, part_size=copy_part_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have a test for empty directory and empty file cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already exists a test_copy_empty_file, but I'll add a test for empty directory

@dlstadther
Copy link
Collaborator Author

@gardenunez Thanks for the review and suggestions!

Another suggestion would be to extract copy-dir and copy-file logic inside copy to a separate function. Maybe even to have copy_dir and copy_file functions instead of copy, this one I still not sure.

I have no opposition to this refactor, but it is outside of the scope of this PR. Feel free to contribute it separately though :)

@@ -425,6 +424,26 @@ def test_copy_empty_file(self):
"""
self._run_copy_test(self.test_put_multipart_empty_file)

@mock_s3
@skipOnTravis('https://travis-ci.org/spotify/luigi/jobs/145895385')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. Copy/paste. Will remove to try to green build without it.

Copy link
Contributor

@gardenunez gardenunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


return 1, key_size

def _run_copy_response_test(self, response, expected_num=None, expected_size=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just skimmed this as I don't know the s3 module.

@@ -235,7 +235,7 @@ def test_put_multipart_less_than_split_size(self):
# 5MB is minimum part size
part_size = 8388608
file_size = 5000
self._run_multipart_test(part_size, file_size)
return self._run_multipart_test(part_size, file_size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you return the last thing you do in a test case??? I thought you only used self.assertBlah()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test_put_multipart... tests are used below as args to self._run_copy_test(...) and I need the output of self._run_multipart_test to verify if the correct num and size of files were copied

@gardenunez gardenunez mentioned this pull request Aug 28, 2018
@dlstadther dlstadther merged commit 30d97e4 into spotify:master Sep 1, 2018
@dlstadther dlstadther deleted the feature/s3-copy-size branch September 1, 2018 01:46
dlstadther added a commit to dlstadther/luigi that referenced this pull request Sep 1, 2018
* upstream-master:
  Fix S3Client.copy return value consistency (spotify#2488)
  s3client check for deprecated host keyword and raise error with the details (spotify#2493)
  Fix exception when toml lib is not installed (spotify#2506)
  Add Okko to companies that use luigi (spotify#2512)
  Added optional choice for hdfs clients  (spotify#2487)
  Version 2.7.8
  revert tornado upgrade
  Version 2.7.7
  added a new event 'progress' (spotify#2498)
  Add Uppsala University / pharmb.io as user (spotify#2496)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants