Make storage upload have no chunk size by default. #606
Conversation
It hasn't been merged into the mainline yet, but I implemented a chunksize=None approach for uploads in gsutil's fork of apitools: https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/third_party/storage_apitools/transfer.py#L773 However, because you're using httplib2, there are a number of other issues that you'll need to solve to get this to work. https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/gcs_json_media.py has some examples.
As for chunk size in general, the ideal thing is not to use chunks for uploads or downloads at all unless you have a specific reason to do so (for example, buffering a streaming transfer).
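To make that trade-off concrete, here is a minimal sketch of the two upload modes being discussed; the `send` callable is a made-up stand-in, not the apitools API:

```python
def upload(stream, send, chunksize=None):
    """Upload ``stream`` via ``send`` (a hypothetical callable).

    With ``chunksize=None`` the whole payload goes out in a single
    request; a chunk size is only worth setting when buffering,
    e.g. for a streaming transfer of unknown total size.
    """
    if chunksize is None:
        send(stream.read())  # one request, no per-chunk round trips
    else:
        while True:
            chunk = stream.read(chunksize)
            if not chunk:
                break
            send(chunk)  # one request per chunk

# Toy usage: collect the "requests" into a list.
import io
sent = []
upload(io.BytesIO(b'payload'), sent.append)               # one shot
upload(io.BytesIO(b'payload'), sent.append, chunksize=3)  # chunked
```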
Rather than passing in an …
Are we leaving the blob's own CHUNK_SIZE around, but just using it for downloads? Should we be making it an explicit parameter there, too?
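For illustration, a rough sketch of "explicit parameter with the class-level default as fallback" for downloads; the default value, `_fetch_chunks`, and the in-memory data are assumptions for the example, not the actual gcloud-python code:

```python
class Blob(object):
    # Hypothetical class-level default, kept for downloads only.
    CHUNK_SIZE = 1024 * 1024

    def __init__(self, data=b''):
        self._data = data  # stand-in for the remote object's bytes

    def _fetch_chunks(self, size):
        # Stand-in for ranged GET requests against the server.
        for start in range(0, len(self._data), size):
            yield self._data[start:start + size]

    def download_to_file(self, file_obj, chunk_size=None):
        # An explicit argument wins; otherwise fall back to the
        # class-level default.
        size = chunk_size if chunk_size is not None else self.CHUNK_SIZE
        for chunk in self._fetch_chunks(size):
            file_obj.write(chunk)
```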
I plan on implementing the default. I just wanted to make sure passing …
so i've finally got myself an official repo for apitools; i'm planning on finally doing a round-robin merge to get all the copies back in sync. in particular: …
@craigcitro w00t! Thanks for the news. The main question is the `chunksize=None` support, which @thobrla indicates is only in upstream (gsutil's fork of apitools).
Ping @craigcitro
sorry, i missed this question earlier in the thread. i believe the code you're looking for is still upstream in gsutil, but i'm working on getting the various versions of apitools unified.
@craigcitro Do you need help on the unification?
UPDATE: Chatting with @craigcitro today (March 2, 2015), it seems the …

@tseaver Do you want to vendor in the latest updates from apitools?
Yeah, I'll vendor it in.
@craigcitro I've done the re-vendoring, but now lack context to get the last set of tests passing:

```
======================================================================
ERROR: test_upload_from_file_resumable (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 418, in test_upload_from_file_resumable
blob.upload_from_file(fh, rewind=True)
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/blob.py", line 354, in upload_from_file
finish_callback=lambda *args: None)
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/_gcloud_vendor/apitools/base/py/transfer.py", line 804, in StreamInChunks
additional_headers=additional_headers)
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/_gcloud_vendor/apitools/base/py/transfer.py", line 764, in __StreamMedia
'%d' % self.progress)
CommunicationError: Failed to transfer all bytes in chunk, upload paused at byte 4
======================================================================
FAIL: test_download_as_string (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 328, in test_download_as_string
self.assertEqual(fetched, b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef
======================================================================
FAIL: test_download_to_file (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 272, in test_download_to_file
self.assertEqual(fh.getvalue(), b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef
======================================================================
FAIL: test_download_to_filename (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 307, in test_download_to_filename
self.assertEqual(wrote, b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef
----------------------------------------------------------------------
```

It looks as though the code which deals with explicitly-chunked responses from the server side is now broken.
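For anyone skimming the failures: the assertions show 3 of 6 bytes arriving ('abc' != 'abcdef'), which suggests the download loop stops after the first chunk instead of draining the stream. A condensed illustration (assumed, not the real test) of the behavior the tests expect:

```python
import io

def download_all(fetch_chunk):
    """Drain ``fetch_chunk`` (a hypothetical callable returning
    successive chunks, empty bytes at EOF) into a buffer."""
    buf = io.BytesIO()
    while True:
        chunk = fetch_chunk()
        if not chunk:
            break  # EOF reached; the broken loop exits one chunk early
        buf.write(chunk)
    return buf.getvalue()

chunks = iter([b'abc', b'def', b''])
assert download_all(lambda: next(chunks)) == b'abcdef'
```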
is the easiest way for me to play with it just to clone your branch?
Likely so: I don't want to make it a PR here until things pass. Just run …
@tseaver @craigcitro I'm happy to take up the vendoring-in process of apitools.
i think he was hitting test failures on upload/download; i'm intending to try it, but i've been believing that for 3+ weeks. 😉
I can babysit you if you like.
my issue is that it hasn't hit the top of my todo list -- currently heads-down on something else completely. gsutil is currently using the version of apitools at master, so i'm reasonably confident in that code. i just need to drop the new version in and see why it's failing for veneer.
Cool
#755 will lead to de-vendoring apitools. Once that lands, we can re-visit this one.
I agree. After hacking on #754 and a late-night chat with Craig, it seems that is the best way forward.
Yes, this PR is the one that made us realize #811 was "necessary" (more like, was an option).
Closing this PR and going to take the new approach of just making …
Fixes #546.
@tseaver @thobrla
I took a stab at the issue but am still unsure of a few things:

- Does `apitools` actually attempt a full upload when `chunksize=None`? (/cc @craigcitro)
- Should we pass `use_chunked=False` and then just use the chunk size on the `Blob`?

Also @thobrla I'm curious, RE: "We should allow the user to override the chunk size": do we need a default chunk size for downloads? In the case that someone creates an object with a custom chunk size, should we use it in this method, or still try the "all in one go" approach? A hedged sketch of that decision follows below.
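One way to read the "all in one go" question, as a sketch only: the helper names `iter_chunks` and `fetch_all`, and the `chunk_size` attribute, are made up for illustration and are not the final API.

```python
def download_to_file(blob, file_obj):
    if blob.chunk_size is not None:
        # The user opted in to chunking when configuring the Blob;
        # honor their choice.
        for chunk in blob.iter_chunks(blob.chunk_size):  # hypothetical
            file_obj.write(chunk)
    else:
        # Default: a single-request, "all in one go" download.
        file_obj.write(blob.fetch_all())  # hypothetical
```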