Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: optional checksum algorithm for upload #13849

Merged

Conversation

wuuxigh
Copy link

@wuuxigh wuuxigh commented Sep 24, 2024

Description of changes

  • Added optional checksumAlgorithm option for uploadData API to enable CRC-32 checksum calculation.
  • CRC-32 now works within react native environment upon enablement with this new option.
  • Upload cache key now includes a UploadDataOptions CRC-32 hash to avoid collision.

Issue #, if available

Description of how you validated changes

  • Updated unit tests utilizing the checksumAlgorithm option for CRC-32.
  • Local run of react native e2e test.

Checklist

  • PR description included
  • yarn test passes
  • Unit Tests are changed or added
  • Relevant documentation is changed or added (and PR referenced)

Checklist for repo maintainers

  • Verify E2E tests for existing workflows are working as expected or add E2E tests for newly added workflows
  • New source file paths included in this PR have been added to CODEOWNERS, if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@wuuxigh wuuxigh requested review from a team as code owners September 24, 2024 23:46
@wuuxigh wuuxigh changed the title Feat: opt in checksum with optional checksum algorithm for upload feat: optional checksum algorithm for upload Sep 24, 2024
Comment on lines +114 to +116
const optionsHash = (
await calculateContentCRC32(JSON.stringify(uploadDataOptions))
).checksum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making this change 👍 @wuuxigh it does fix a sharp edge with current implementation.

On the other hand, we are going to add an option to disable this upload caching feature completely. Partially because of this sharp edge, and there are more sharp edges. We plan to have a more thorough refactor to how we cache the uploads.

For now I think this change is OK to be merged in, even though it will not be used by StorageBrowser.

cc @jimblanc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, when that other change goes in we'll probably need to go through and make optionsHash optional on interfaces.

Comment on lines 110 to 113
const finalCrc32 =
checksumAlgorithm === 'crc-32'
? await getCombinedCrc32(data, size)
: undefined;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check: do we have special error message for client-side hashing errors?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new error messages were added in this PR.

Comment on lines 229 to 232
/**
* Indicates the algorithm used to create the checksum for the object.
* This checksum can be used as a data integrity check to verify that the data received is the same data that was originally sent.
* @default undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the decription is a bit verbose since integrity check means to verify that the data received is the same data that was originally sent.

Suggested change
/**
* Indicates the algorithm used to create the checksum for the object.
* This checksum can be used as a data integrity check to verify that the data received is the same data that was originally sent.
* @default undefined
/**
* Indicates the algorithm used to create the checksum for the object.
* The checksum created is for data integrity check by service-side.
* By default this checksum is not created.
* @default undefined

packages/storage/src/providers/s3/utils/crc32.ts Outdated Show resolved Hide resolved
new Uint8Array((hexString.match(/\w{2}/g)! ?? []).map(h => parseInt(h, 16)))
.buffer;

const hexToBase64 = (hexString: string) =>
export const hexToBase64 = (hexString: string) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, move this to own file.

@@ -126,7 +128,7 @@ export const getUploadsCacheKey = ({
levelStr = accessLevel === 'guest' ? 'public' : accessLevel;
}

const baseId = `${size}_${resolvedContentType}_${bucket}_${levelStr}_${key}`;
const baseId = `${optionsHash}_${size}_${resolvedContentType}_${bucket}_${levelStr}_${key}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check: What will happen here for any ongoing uploads when this change is released? Is the change backwards compatible?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not be backwards compatible and in progress uploads will need to start from scratch.

Comment on lines +114 to +116
const optionsHash = (
await calculateContentCRC32(JSON.stringify(uploadDataOptions))
).checksum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, when that other change goes in we'll probably need to go through and make optionsHash optional on interfaces.

Comment on lines 229 to 232
/**
* Indicates the algorithm used to create the checksum for the object.
* This checksum can be used as a data integrity check to verify that the data received is the same data that was originally sent.
* @default undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To build on what Allan suggested, I'd make it even less verbose:

Suggested change
/**
* Indicates the algorithm used to create the checksum for the object.
* This checksum can be used as a data integrity check to verify that the data received is the same data that was originally sent.
* @default undefined
/**
* The algorithm used to compute a checksum for the object. Used to verify that the data received by S3
* matches what was originally sent. Disabled by default.
*
* @default undefined

offset += crc32Hash.length;
}

return `${(await calculateContentCRC32(combinedArray.buffer)).checksum}-${crc32List.length}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a stupid question, but why is the length appended to the hash?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Length is needed to indicate that this is a combined checksum expected by s3.

AllanZhengYP
AllanZhengYP previously approved these changes Oct 3, 2024
Copy link
Member

@AllanZhengYP AllanZhengYP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

jimblanc
jimblanc previously approved these changes Oct 8, 2024
AllanZhengYP
AllanZhengYP previously approved these changes Oct 9, 2024
Copy link
Member

@AllanZhengYP AllanZhengYP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you for adding this.

@AllanZhengYP AllanZhengYP dismissed stale reviews from jimblanc and themself via 1c4034d October 9, 2024 17:38
@AllanZhengYP AllanZhengYP merged commit 02cb08a into aws-amplify:storage-browser/integrity Oct 9, 2024
28 checks passed
AllanZhengYP added a commit to AllanZhengYP/amplify-js that referenced this pull request Oct 10, 2024
AllanZhengYP added a commit that referenced this pull request Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants