feat(core): cache fingerprints of large assets #21321

bdonlan · 2022-07-25T20:30:52Z

When fingerprinting large assets, hashing the asset can take quite a
long time - over a second for a 300MB asset, for example. This adds
up, particularly when generating multiple stacks in a single build or
when running test suites that bundle assets multiple times, and asset
caching cannot avoid it (since the hash is what computes the cache key).

This change caches the result of digesting individual files based on the
inode, mtime, and size of the input file.
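
As a rough illustration of the idea, here is a minimal sketch of a digest cache keyed on stat data. It is not the code from this PR: the names `digestCache`, `hashFile`, and `cachedContentFingerprint` are hypothetical, and the choice of SHA-256 and of reading the whole file into memory are assumptions made to keep the example short.

```typescript
import * as crypto from 'crypto';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical in-memory cache: stat-derived key -> hex digest.
const digestCache = new Map<string, string>();

// Hash the file contents; this is the expensive step for large files.
// (Assumption: SHA-256 over the whole file, read in one go for brevity.)
function hashFile(file: string): string {
  return crypto.createHash('sha256').update(fs.readFileSync(file)).digest('hex');
}

// Return the cached digest when inode, mtime, and size are all unchanged;
// otherwise hash the file and remember the result.
function cachedContentFingerprint(file: string): string {
  const stats = fs.statSync(file);
  const cacheKey = JSON.stringify({
    inode: stats.ino,
    mtimeMs: stats.mtimeMs, // full modification time in milliseconds
    size: stats.size,
  });
  let digest = digestCache.get(cacheKey);
  if (digest === undefined) {
    digest = hashFile(file);
    digestCache.set(cacheKey, digest);
  }
  return digest;
}

// Usage: the second call is answered from the cache without re-reading the file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fingerprint-'));
const sample = path.join(dir, 'asset.bin');
fs.writeFileSync(sample, 'hello');
const first = cachedContentFingerprint(sample);
const second = cachedContentFingerprint(sample);
```

The cache only saves work when the stat call is much cheaper than the hash, which is the case for large files. Any change to the inode, modification time, or size produces a different key and forces a re-hash.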

This feature improved the runtime of one of our slowest tests by ~10%.

closes: #21297

Note: No README entries were added, because this subsystem was not documented in the README to begin with.

All Submissions:

* [x] Have you followed the guidelines in our Contributing guide?

New Features

* [ ] Have you added the new feature to an integration test? N/A

gitpod-io · 2022-07-25T20:30:59Z

bdonlan · 2022-07-25T21:08:13Z

Changed to chore as the PR linter doesn't like a feature that doesn't touch the README. Let me know if feat would still be preferable.

jogold · 2022-07-27T08:19:51Z

packages/@aws-cdk/core/test/fs/fs-fingerprint.test.ts

+      const hash2 = FileSystem.fingerprint(largefile, {});
+
+      expect(hash1).toEqual(hash2);
+      expect(openSyncSpy).toHaveBeenCalledTimes(1);


You should restore the mock or clear all mocks in a beforeEach(). Because it is not cleared, you end up with 3 calls in your "considers mtime" test. If this test is moved up in the file for some reason, it will fail (tests should not rely on execution order).

Also, is there a reason to create a large file for this test? It is the caching mechanism that is tested so it should work on files of any size?

Suggested change

-      expect(openSyncSpy).toHaveBeenCalledTimes(1);
+      expect(openSyncSpy).toHaveBeenCalledTimes(1);
+      openSyncSpy.mockRestore()

Same for the second test.

Good point, we don't need to use a large file now that this is not sensitive to timing or filesize thresholds. I'll update that as well as adding a mock restore call.

rix0rrr · 2022-07-27T12:21:54Z

packages/@aws-cdk/core/lib/fs/fingerprint.ts

@@ -84,6 +96,13 @@ export function fingerprint(fileOrDirectory: string, options: FingerprintOptions
 }

 export function contentFingerprint(file: string): string {
+  const stats = fs.statSync(file);
+  const cacheKey = JSON.stringify({ mtime: stats.mtime, inode: stats.ino, size: stats.size });


Put a comment here explaining why this is safe on Windows

Pull request has been modified.

mergify · 2022-07-28T13:46:28Z

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

aws-cdk-automation · 2022-07-28T14:22:14Z

AWS CodeBuild CI Report

CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
Commit ID: 78d6af0
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

mergify · 2022-07-28T14:22:56Z

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

RomainMuller · 2022-07-29T10:51:09Z

packages/@aws-cdk/core/lib/fs/fingerprint.ts

+    mtime_unix: stats.mtime.getUTCDate(),
+    mtime_ms: stats.mtime.getUTCMilliseconds(),


This is not doing what is intended:

getUTCDate() returns the UTC day of the month of the Date object (1 to 31)

getUTCMilliseconds() returns the UTC milliseconds component of the Date object (i.e., the fractional-second part, 0 to 999)

Thanks for the catch (and fix!)
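
To make the problem concrete, the sketch below builds a key from the two getters quoted above and compares it with a key built from the full timestamp. The helper names `componentKey` and `fullKey` and the two sample dates are hypothetical, and `getTime()` is used only as one standard way to obtain the complete timestamp; the exact expression used in the follow-up fix is not shown in this thread.

```typescript
// Two hypothetical modification times exactly one month apart.
const mtimeA = new Date(Date.UTC(2022, 6, 29, 10, 51, 9, 250)); // 2022-07-29T10:51:09.250Z
const mtimeB = new Date(Date.UTC(2022, 7, 29, 10, 51, 9, 250)); // 2022-08-29T10:51:09.250Z

// Key built from the component getters quoted in the review:
// day of the month plus the millisecond fraction only.
function componentKey(mtime: Date): string {
  return JSON.stringify({
    mtime_unix: mtime.getUTCDate(),
    mtime_ms: mtime.getUTCMilliseconds(),
  });
}

// Key built from the full timestamp (milliseconds since the Unix epoch).
function fullKey(mtime: Date): string {
  return JSON.stringify({ mtime_ms: mtime.getTime() });
}

// Both dates fall on day 29 with a 250 ms fraction, so the component key collides.
const collides = componentKey(mtimeA) === componentKey(mtimeB);

// The full timestamps differ, so the full key tells the two apart.
const distinct = fullKey(mtimeA) !== fullKey(mtimeB);
```

With the component-based key, a file modified a month later at the same second fraction would be treated as unchanged if its inode and size also matched, so a stale digest could be served.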

fix(core): asset fingerprint cache invalidation incorrectly uses mtime (#21374): Instead of using the complete mtime value, the cache key only accounted for the day-of-month and fractional-seconds parts of the timestamp, which was not the intention. The issue was introduced in #21321.

github-actions bot added the p2 label Jul 25, 2022

aws-cdk-automation requested a review from a team July 25, 2022 20:31

bdonlan force-pushed the inode-fingerprint branch from c6142f8 to 3664ad6 Compare July 25, 2022 21:07

bdonlan changed the title ~~feat(core): use inode data to cache fingerprints~~ chore(core): use inode data to cache fingerprints Jul 25, 2022

jogold reviewed Jul 26, 2022

bdonlan force-pushed the inode-fingerprint branch 2 times, most recently from de789cf to d858559 Compare July 26, 2022 18:56

jogold reviewed Jul 27, 2022

rix0rrr previously requested changes Jul 27, 2022

bdonlan force-pushed the inode-fingerprint branch from d858559 to 4b8ecd9 Compare July 27, 2022 20:13

bdonlan force-pushed the inode-fingerprint branch from 4b8ecd9 to fc2be6c Compare July 27, 2022 20:28

rix0rrr changed the title ~~chore(core): use inode data to cache fingerprints~~ feat(core): cache fingerprints of large assets Jul 28, 2022

rix0rrr approved these changes Jul 28, 2022

rix0rrr added the pr-linter/exempt-readme (the PR linter will not require README changes) and pr-linter/exempt-integ-test (the PR linter will not require integ test changes) labels Jul 28, 2022

Merge branch 'main' into inode-fingerprint

78d6af0

mergify bot merged commit 17f1ec8 into aws:main Jul 28, 2022

RomainMuller reviewed Jul 29, 2022

RomainMuller mentioned this pull request Jul 29, 2022

fix(core): asset fingerprint cache invalidation incorrectly uses mtime #21374

Merged

bdonlan deleted the inode-fingerprint branch August 1, 2022 16:28
