Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequential requests when downloading file #1338

Closed
mattzeunert opened this issue Nov 14, 2020 · 10 comments · Fixed by #1419
Closed

Sequential requests when downloading file #1338

mattzeunert opened this issue Nov 14, 2020 · 10 comments · Fixed by #1419
Assignees
Labels
api: storage Issues related to the googleapis/nodejs-storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@mattzeunert
Copy link

I'm using the File.download method and was wondering if its performance could be improved.

Currently it makes two requests sequentially:

  1. /storage/.../key?alt=media to download the file
  2. /storage/.../key to download file metadata

Would it be possible to make those requests in parallel? The second request adds another 20 - 250ms to the download.

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Nov 14, 2020
@stephenplusplus
Copy link
Contributor

I believe this is happening because of our check for data integrity after the download is completed. With the latest metadata, we get the most recent hash of the remote object. It does indeed seem that we could make those requests in parallel, getting the metadata at the same time the download request starts. Thank you for the suggestion, I'll take a look at this soon!

@stephenplusplus stephenplusplus self-assigned this Nov 14, 2020
@stephenplusplus stephenplusplus added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Nov 14, 2020
@danvk
Copy link

danvk commented Dec 11, 2020

Even better, could we make this second request optional? When you download a file, you just get the Buffer, you don't get the metadata. I'd rather not pay the cost of getting the metadata just on the off chance I might use it.

@stephenplusplus
Copy link
Contributor

@danvk we do the metadata request for the sake of data integrity-- did we give you the exact file the server is storing. However, we should definitely stop making that request if you disable validation (file.createReadStream({validation: false}).

@danvk
Copy link

danvk commented Dec 11, 2020

Yes, I appreciate your concern for my data integrity :) I'd just like an option to let you play fast and loose.

Here's my hack to disable this second RPC. ⚠️ Use at your own peril! ⚠️

/**
 * This is a hack to avoid issuing a second RPC to cloud storage for file metadata that we don't
 * use. This RPC is done in serial after the content is downloaded and is unconditional.
 * Whatever you do, don't use the file object after you patch it, other than to call
 * createReadStream or download!
 */
export function patchFileForSpeedyDownload(file: File) {
  // See https://github.com/googleapis/nodejs-storage/issues/1338
  (file.getMetadata as any) = function (
    this: File,
    options: any,
    callback?: (...args: any[]) => void,
  ) {
    this.metadata = {};
    if (callback) {
      callback(null, null, null);
    } else {
      return Promise.resolve([null, null]);
    }
  };
}

Make sure to set {validation: false} if you use this.

@danvk
Copy link

danvk commented Feb 5, 2021

I was following through some @google-cloud/storage code in my debugger today and noticed that the ?alt=media request already returns the crc/md5 hashes in response headers (x-goog-hash):

image

So even when you are doing validation, the second metadata request isn't needed.

@stephenplusplus
Copy link
Contributor

Great find. PR sent: #1419

@danvk
Copy link

danvk commented Mar 24, 2021

And this is now released in 5.8.2. Thanks for pushing this through @stephenplusplus, I'm very excited to delete my patchFileForSpeedyDownload hack :)

@mattzeunert
Copy link
Author

Thanks @stephenplusplus for implementing this, and thanks @danvk for pointing out the second request isn't necessary!

Just deployed this earlier and reads are now 50% faster.

Screenshot 2021-03-25 at 09 09 01

Also seems like some metadata requests were really slow, and now the max duration is down significantly.

Screenshot 2021-03-25 at 09 08 50

@danvk
Copy link

danvk commented Mar 26, 2021

@mattzeunert what monitoring tool is that? I'd like something similar for my site!

@mattzeunert
Copy link
Author

@danvk It's a Grafana dashboard running on Hosted Metrics. I think Grafana also has their own Cloud offering too.

gcf-owl-bot bot added a commit that referenced this issue Feb 3, 2022
Source-Link: googleapis/synthtool@0b08c9e
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-nodejs:latest@sha256:57c5621efda41e5d0a66b2ed0177ac9a45e50bbaaf17d7b0e9b4989885898fd6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/nodejs-storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants