Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STORAGE: Send blob metadata in upload requests #754

Closed
dhermes opened this issue Mar 24, 2015 · 13 comments
Closed

STORAGE: Send blob metadata in upload requests #754

dhermes opened this issue Mar 24, 2015 · 13 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release.

Comments

@dhermes
Copy link
Contributor

dhermes commented Mar 24, 2015

When self._changes is not empty, we should upgrade a media request to a multipart upload:

If you have metadata that you want to send along with the data to upload, you can make a single multipart/related request. This is a good choice if the data you are sending is small enough to upload again in its entirety if the connection fails.

whereas for resumable uploads, we should

Make an initial request to the upload URI that includes the metadata, if any.

@craigcitro Is there a way to set the body of the initial request in a resumable upload?

H/T to @pdknsk for #536

@dhermes dhermes added the api: storage Issues related to the Cloud Storage API. label Mar 24, 2015
@dhermes dhermes added this to the Storage Stable milestone Mar 24, 2015
@craigcitro
Copy link
Contributor

yes -- if you call upload.ConfigureRequest with an http request that has a body, it'll go ahead and set up the multipart upload for you:
https://github.com/google/apitools/blob/3ea053daf8108da4d2f8d08243bad9c53656c375/apitools/base/py/transfer.py#L580 (edited link by DJH)

so i remember there was a reason that you guys didn't just want to use a generated storage client (which would handle these sorts of details for you); what was it?

@dhermes
Copy link
Contributor Author

dhermes commented Mar 24, 2015

Thanks for the pointer for sending a body.

As for using generated client, I don't remember there being a reason, but I'd guess it was Python 3 support and protorpc.

@dhermes
Copy link
Contributor Author

dhermes commented Mar 24, 2015

I expected

diff --git a/gcloud/storage/blob.py b/gcloud/storage/blob.py
index ac3f9ce..c7e1336 100644
--- a/gcloud/storage/blob.py
+++ b/gcloud/storage/blob.py
@@ -340,7 +340,9 @@ class Blob(_PropertyMixin):
                                         path=self.bucket.path + '/o')

         # Use apitools 'Upload' facility.
-        request = http_wrapper.Request(upload_url, 'POST', headers)
+        body = '' if not self._changes else json.dumps(self._properties)
+        request = http_wrapper.Request(url=upload_url, http_method='POST',
+                                       headers=headers, body=body)

         upload.ConfigureRequest(upload_config, request, url_builder)
         query_params = url_builder.query_params

to handle it, but am getting

{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "badContent",
    "message": "Unsupported content with type: application/octet-stream"
   }
  ],
  "code": 400,
  "message": "Unsupported content with type: application/octet-stream"
 }
}

when setting content_type in the upload. This seems to be a problem with the API, not the code. Am I missing something?


URI: 'https://www.googleapis.com/upload/storage/v1/b/{BUCKET}/o?uploadType=resumable&name=LargeFile'
METHOD: 'POST'
BODY: '{"metadata": "bar"}'
HEADERS: {
  "Accept": "application/json",
  "Accept-Encoding": "gzip, deflate",
  "Authorization": "Bearer {TOKEN}",
  "User-Agent": "gcloud-python/0.4.3",
  "X-Upload-Content-Length": "5254113",
  "X-Upload-Content-Type": "application/octet-stream",
  "content-length": "19"
}
REDIRECTIONS: 5
CONNECTION_TYPE: None

It also failed with a body of {"contentType": "application/zip"} which causes "X-Upload-Content-Type": "application/zip"

@dhermes
Copy link
Contributor Author

dhermes commented Mar 24, 2015

Also

upload.ConfigureRequest(upload_config, request, url_builder)

fails on a multipart upload because no content type is there:

.../gcloud/storage/blob.pyc in upload_from_file(self, file_obj, rewind, size, content_type, num_retri
es)
    345                                        headers=headers, body=body)
    346
--> 347         upload.ConfigureRequest(upload_config, request, url_builder)
    348         query_params = url_builder.query_params
    349         base_url = conn.API_BASE_URL + '/upload'

.../_gcloud_vendor/apitools/base/py/transfer.pyc in ConfigureRequest(self, upload_config, http_request, url_builder)
    533       if http_request.body:
    534         url_builder.query_params['uploadType'] = 'multipart'
--> 535         self.__ConfigureMultipartRequest(http_request)
    536       else:
    537         url_builder.query_params['uploadType'] = 'media'

.../_gcloud_vendor/apitools/base/py/transfer.pyc in __ConfigureMultipartRequest(self, http_request)
    556     # attach the body as one part
    557     msg = mime_nonmultipart.MIMENonMultipart(
--> 558         *http_request.headers['content-type'].split('/'))
    559     msg.set_payload(http_request.body)
    560     msg_root.attach(msg)

@craigcitro
Copy link
Contributor

is content-type in the request headers?

@dhermes
Copy link
Contributor Author

dhermes commented Mar 24, 2015

@craigcitro No. This is not an issue when upload.ConfigureRequest decides it should be media or resumable. Only an issue for multipart.

@craigcitro
Copy link
Contributor

so if we're doing a simple or resumable upload, we just let the content-type be in the headers for the request.

however, for multipart, we need the content-type for the body in order to be able to assemble the whole mime/multipart message. so it's looking in the headers for a content-type, which isn't there.

where are you currently setting the content-type? can we move that up?

@dhermes
Copy link
Contributor Author

dhermes commented Mar 24, 2015

It doesn't ever get put in headers. The only place it is used is in the creation of the Upload object:

        upload = transfer.Upload(file_obj, content_type, total_bytes,
                                 auto_transfer=False,
                                 chunksize=self.CHUNK_SIZE)

@craigcitro
Copy link
Contributor

ah -- right, i got confused about which content-type we were looking for.

the upload knows its content-type, and it'll set it. the one we're missing is the content type for the message body, which is just application/json. we just need to add that in the headers before we initialize.

(normally this is done when apitools converts the proto message body into json:
https://github.com/google/apitools/blob/3ea053daf8108da4d2f8d08243bad9c53656c375/apitools/base/py/base_api.py#L537
)

@dhermes
Copy link
Contributor Author

dhermes commented Mar 24, 2015

OK Cool. Pressing pause on wrestling this needlessly. Filed #755.

@cristicbz
Copy link

Now that the migration to apitools is complete, this should be unblocked, right?

@tarrencev
Copy link

Would love this functionality

@tseaver
Copy link
Contributor

tseaver commented Jun 5, 2017

#3362 includes existing blob properties with the initial (or only) upload request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. priority: p2 Moderately-important priority. Fix may not be included in next release.
Projects
None yet
Development

No branches or pull requests

6 participants