Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(perf): pull hashes without refreshing metadata #1419

Merged
merged 2 commits into from
Mar 23, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 39 additions & 29 deletions src/file.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1211,8 +1211,6 @@ class File extends ServiceObject<File> {
let crc32c = true;
let md5 = false;

let refreshedMetadata = false;

if (typeof options.validation === 'string') {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
(options as any).validation = (options.validation as string).toLowerCase();
Expand All @@ -1222,6 +1220,8 @@ class File extends ServiceObject<File> {
crc32c = false;
}

const shouldRunValidation = !rangeRequest && (crc32c || md5);

if (rangeRequest) {
if (
typeof options.validation === 'string' ||
Expand Down Expand Up @@ -1268,6 +1268,8 @@ class File extends ServiceObject<File> {
qs: query,
};

const hashes: {crc32c?: string; md5?: string} = {};

this.requestStream(reqOpts)
.on('error', err => {
throughStream.destroy(err);
Expand Down Expand Up @@ -1307,10 +1309,22 @@ class File extends ServiceObject<File> {

const headers = rawResponseStream.toJSON().headers;
isServedCompressed = headers['content-encoding'] === 'gzip';
const shouldRunValidation = !rangeRequest && (crc32c || md5);
const throughStreams: Writable[] = [];

if (shouldRunValidation) {
// The x-goog-hash header should be set with a crc32c and md5 hash.
// ex: headers['x-goog-hash'] = 'crc32c=xxxx,md5=xxxx'
if (typeof headers['x-goog-hash'] === 'string') {
headers['x-goog-hash']
.split(',')
.forEach((hashKeyValPair: string) => {
const delimiterIndex = hashKeyValPair.indexOf('=');
const hashType = hashKeyValPair.substr(0, delimiterIndex);
const hashValue = hashKeyValPair.substr(delimiterIndex + 1);
hashes[hashType as 'crc32c' | 'md5'] = hashValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed this PR with @shaffeeullah on md5 hashes she'll follow-up on remaining Node.js specific changes:

  1. Composed objects will not support md5. Could you add a warning / message to users that it can't be validated because it's a composed object (only md5: https://cloud.google.com/storage/docs/composite-objects#metadata).
  2. Could you also add a unit test for this case?

Example of a non-composed GCS Object response headers:

HTTP/1.1 200 OK
X-goog-metageneration: 53
Content-location: https://storage.googleapis.com/storage/v1/b/anima-frank/o/coderfrank?alt=media
Content-disposition: attachment
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Vary: Origin,X-Origin
Content-length: 0
Cache-control: no-cache, no-store, max-age=0, must-revalidate
Etag: CNerzaGOku8CEDU=
X-goog-generation: 1614705373435351
Pragma: no-cache
X-guploader-uploadid: ABg5-UzkhLZtBh_nh_MyJJUrxf2cptQxl5InjXd8jYRhUN31wS_DHiHBctrKCu7RLyi0q_5cZvAdHpj1D7nS9FsgsbI
Date: Mon, 15 Mar 2021 20:48:06 GMT
X-goog-hash: crc32c=AAAAAA==,md5=1B2M2Y8AsgTpgAmY7PhCfg==
Content-type: application/octet-stream
X-goog-storage-class: MULTI_REGIONAL

Example of a composed GCS Object response headers:

HTTP/1.1 200 OK
X-goog-metageneration: 1
Content-location: https://storage.googleapis.com/storage/v1/b/anima-frank/o/coderfrank-composed?alt=media
Content-disposition: attachment
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Vary: Origin,X-Origin
Content-length: 0
Cache-control: no-cache, no-store, max-age=0, must-revalidate
Etag: CNr8/deVs+8CEAE=
X-goog-generation: 1615841237892698
Pragma: no-cache
X-guploader-uploadid: ABg5-UzkOS0YokMzs44P_reufnM5l2idHXaVhKKbsjyQr1OzvgqZtOUIEkahxuFplJfkcylpFQ_4GyEq98JbWocEvt4
Date: Mon, 15 Mar 2021 20:48:59 GMT
X-goog-hash: crc32c=AAAAAA==
Content-type: application/octet-stream
X-goog-storage-class: MULTI_REGIONAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn’t mind tackling that in a follow-up PR, unless you feel strongly it should be included here. This change shouldn’t alter the behavior we currently have, as far as how that situation is handled.

I will add a test for if just one algorithm comes back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added.

});
}

validateStream = hashStreamValidation({crc32c, md5});
throughStreams.push(validateStream);
}
Expand Down Expand Up @@ -1338,46 +1352,42 @@ class File extends ServiceObject<File> {
// our chance to validate the data and let the user know if anything went
// wrong.
let onCompleteCalled = false;
const onComplete = (err: Error | null) => {
if (err) {
onCompleteCalled = true;
throughStream.destroy(err);
const onComplete = async (err: Error | null) => {
if (onCompleteCalled) {
return;
}

if (rangeRequest) {
onCompleteCalled = true;
throughStream.end();
return;
}
onCompleteCalled = true;

if (!refreshedMetadata) {
refreshedMetadata = true;
this.getMetadata({userProject: options.userProject}, onComplete);
if (err) {
throughStream.destroy(err);
return;
}

if (onCompleteCalled) {
if (rangeRequest || !shouldRunValidation) {
throughStream.end();
return;
}

onCompleteCalled = true;

// TODO(https://github.com/googleapis/nodejs-storage/issues/709):
// Remove once the backend issue is fixed.
// If object is stored compressed (having metadata.contentEncoding === 'gzip')
// and was served decompressed, then skip checksum validation because the
// remote checksum is computed against the compressed version of the object.
if (this.metadata.contentEncoding === 'gzip' && !isServedCompressed) {
throughStream.end();
return;
// If object is stored compressed (having
// metadata.contentEncoding === 'gzip') and was served decompressed,
// then skip checksum validation because the remote checksum is computed
// against the compressed version of the object.
if (!isServedCompressed) {
try {
await this.getMetadata({userProject: options.userProject});
} catch (e) {
throughStream.destroy(e);
return;
}
if (this.metadata.contentEncoding === 'gzip') {
throughStream.end();
return;
}
}

const hashes = {
crc32c: this.metadata.crc32c,
md5: this.metadata.md5Hash,
};

// If we're doing validation, assume the worst-- a data integrity
// mismatch. If not, these tests won't be performed, and we can assume
// the best.
Expand Down
Loading