Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(perf): pull hashes without refreshing metadata #1419

Merged
merged 2 commits into from
Mar 23, 2021

Conversation

stephenplusplus
Copy link
Contributor

Fixes #1338

This PR narrows the conditions for when a metadata refresh is required during a file download.

Before:

  1. user calls file.createReadStream()
  2. file finishes downloading
  3. we call file.getMetadata() to get the remote file's hashes, even if the user disabled data integrity (validation: false)
  4. if validation wasn't explicitly disabled, we confirm data integrity, comparing locally computed hashes vs. hashes from the API

After:

  1. user calls file.createReadStream()
  2. our API call to begin the download grabs the remote file's hashes from the response headers (x-goog-hash)
  3. file finishes downloading
  4. if validation wasn't explicitly disabled, we confirm data integrity, comparing locally computed hashes vs. hashes from the first call to the API (step 2)

So, one less API call.

@stephenplusplus stephenplusplus requested a review from a team March 10, 2021 01:37
@stephenplusplus stephenplusplus requested a review from a team as a code owner March 10, 2021 01:37
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 10, 2021
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Mar 10, 2021
@stephenplusplus stephenplusplus force-pushed the spp--1338 branch 2 times, most recently from 9f35847 to 809acc2 Compare March 10, 2021 01:39
@codecov
Copy link

codecov bot commented Mar 10, 2021

Codecov Report

Merging #1419 (1d79a1b) into master (d0c165c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1419   +/-   ##
=======================================
  Coverage   99.10%   99.10%           
=======================================
  Files          14       14           
  Lines       12145    12155   +10     
  Branches      529      530    +1     
=======================================
+ Hits        12036    12046   +10     
  Misses        109      109           
Impacted Files Coverage Δ
src/file.ts 99.94% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0c165c...1d79a1b. Read the comment docs.

const delimiterIndex = hashKeyValPair.indexOf('=');
const hashType = hashKeyValPair.substr(0, delimiterIndex);
const hashValue = hashKeyValPair.substr(delimiterIndex + 1);
hashes[hashType as 'crc32c' | 'md5'] = hashValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed this PR with @shaffeeullah on md5 hashes she'll follow-up on remaining Node.js specific changes:

  1. Composed objects will not support md5. Could you add a warning / message to users that it can't be validated because it's a composed object (only md5: https://cloud.google.com/storage/docs/composite-objects#metadata).
  2. Could you also add a unit test for this case?

Example of a non-composed GCS Object response headers:

HTTP/1.1 200 OK
X-goog-metageneration: 53
Content-location: https://storage.googleapis.com/storage/v1/b/anima-frank/o/coderfrank?alt=media
Content-disposition: attachment
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Vary: Origin,X-Origin
Content-length: 0
Cache-control: no-cache, no-store, max-age=0, must-revalidate
Etag: CNerzaGOku8CEDU=
X-goog-generation: 1614705373435351
Pragma: no-cache
X-guploader-uploadid: ABg5-UzkhLZtBh_nh_MyJJUrxf2cptQxl5InjXd8jYRhUN31wS_DHiHBctrKCu7RLyi0q_5cZvAdHpj1D7nS9FsgsbI
Date: Mon, 15 Mar 2021 20:48:06 GMT
X-goog-hash: crc32c=AAAAAA==,md5=1B2M2Y8AsgTpgAmY7PhCfg==
Content-type: application/octet-stream
X-goog-storage-class: MULTI_REGIONAL

Example of a composed GCS Object response headers:

HTTP/1.1 200 OK
X-goog-metageneration: 1
Content-location: https://storage.googleapis.com/storage/v1/b/anima-frank/o/coderfrank-composed?alt=media
Content-disposition: attachment
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Vary: Origin,X-Origin
Content-length: 0
Cache-control: no-cache, no-store, max-age=0, must-revalidate
Etag: CNr8/deVs+8CEAE=
X-goog-generation: 1615841237892698
Pragma: no-cache
X-guploader-uploadid: ABg5-UzkOS0YokMzs44P_reufnM5l2idHXaVhKKbsjyQr1OzvgqZtOUIEkahxuFplJfkcylpFQ_4GyEq98JbWocEvt4
Date: Mon, 15 Mar 2021 20:48:59 GMT
X-goog-hash: crc32c=AAAAAA==
Content-type: application/octet-stream
X-goog-storage-class: MULTI_REGIONAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn’t mind tackling that in a follow-up PR, unless you feel strongly it should be included here. This change shouldn’t alter the behavior we currently have, as far as how that situation is handled.

I will add a test for if just one algorithm comes back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added.

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stephenplusplus, LGTM!

Pending @shaffeeullah approval as well.

@stephenplusplus stephenplusplus merged commit f3ec627 into googleapis:master Mar 23, 2021
gcf-owl-bot bot added a commit that referenced this pull request Apr 14, 2022
* build: add sync-repo-settings and change branch protection
Source-Link: googleapis/synthtool@ed8079c
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-nodejs:latest@sha256:80bfa0c67226453b37b501be7748b2fa2a2676cfeec0012e79e3a1a8f1cbe6a3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/nodejs-storage API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sequential requests when downloading file
3 participants