Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): cache fingerprints of large assets #21321

Merged
merged 2 commits into from
Jul 28, 2022

Conversation

bdonlan
Copy link
Contributor

@bdonlan bdonlan commented Jul 25, 2022

When fingerprinting large assets, hashing the asset can take quite a
long time - over a second for a 300MB asset, for example. This can add
up, particularly when generating multiple stacks in a single build, or
when running test suites that bundle assets multiple times, and is not
avoidable by asset caching (since it's computing the cache key).

This change caches the result of digesting individual files based on the
inode, mtime, and size of the input file.

This feature improved the runtime of one of our slowest tests by ~10%.

closes: #21297

Note: No README entries were added, because this sub-subsystem was already not documented in the README.


All Submissions:

New Features

@gitpod-io
Copy link

gitpod-io bot commented Jul 25, 2022

@github-actions github-actions bot added the p2 label Jul 25, 2022
@aws-cdk-automation aws-cdk-automation requested a review from a team July 25, 2022 20:31
@bdonlan bdonlan force-pushed the inode-fingerprint branch from c6142f8 to 3664ad6 Compare July 25, 2022 21:07
@bdonlan bdonlan changed the title feat(core): use inode data to cache fingerprints chore(core): use inode data to cache fingerprints Jul 25, 2022
@bdonlan
Copy link
Contributor Author

bdonlan commented Jul 25, 2022

Changed to chore as the PR linter doesn't like a feature that doesn't touch the README. Let me know if feat would still be preferable.

@bdonlan bdonlan force-pushed the inode-fingerprint branch 2 times, most recently from de789cf to d858559 Compare July 26, 2022 18:56
const hash2 = FileSystem.fingerprint(largefile, {});

expect(hash1).toEqual(hash2);
expect(openSyncSpy).toHaveBeenCalledTimes(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should restore the mock or clear all mocks in a beforeEach(). Because it is not cleared you end up with 3 calls in your considers mtime test. If this test is moved up in the file for some reason it will fail (test should not rely on testing order).

Also, is there a reason to create a large file for this test? It is the caching mechanism that is tested so it should work on files of any size?

Suggested change
expect(openSyncSpy).toHaveBeenCalledTimes(1);
expect(openSyncSpy).toHaveBeenCalledTimes(1);
openSyncSpy.mockRestore()

same for the second test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we don't need to use a large file now that this is not sensitive to timing or filesize thresholds. I'll update that as well as adding a mock restore call.

@@ -84,6 +96,13 @@ export function fingerprint(fileOrDirectory: string, options: FingerprintOptions
}

export function contentFingerprint(file: string): string {
const stats = fs.statSync(file);
const cacheKey = JSON.stringify({ mtime: stats.mtime, inode: stats.ino, size: stats.size });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put a comment here explaining why this is safe on Windows

@bdonlan bdonlan force-pushed the inode-fingerprint branch from d858559 to 4b8ecd9 Compare July 27, 2022 20:13
@mergify mergify bot dismissed rix0rrr’s stale review July 27, 2022 20:13

Pull request has been modified.

When fingerprinting large assets, hashing the asset can take quite a
long time - over a second for a 300MB asset, for example. This can add
up, particularly when generating multiple stacks in a single build, or
when running test suites that bundle assets multiple times, and is not
avoidable by asset caching (since it's computing the cache key).

This change caches the result of digesting individual files based on the
inode, mtime, and size of the input file.

This feature improved the runtime of one of our slowest tests by ~10%.

closes: aws#21297
@bdonlan bdonlan force-pushed the inode-fingerprint branch from 4b8ecd9 to fc2be6c Compare July 27, 2022 20:28
@rix0rrr rix0rrr changed the title chore(core): use inode data to cache fingerprints feat(core): cache fingerprints of large assets Jul 28, 2022
@rix0rrr rix0rrr added pr-linter/exempt-readme The PR linter will not require README changes pr-linter/exempt-integ-test The PR linter will not require integ test changes labels Jul 28, 2022
@mergify
Copy link
Contributor

mergify bot commented Jul 28, 2022

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 78d6af0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 17f1ec8 into aws:main Jul 28, 2022
@mergify
Copy link
Contributor

mergify bot commented Jul 28, 2022

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

Comment on lines +119 to +120
mtime_unix: stats.mtime.getUTCDate(),
mtime_ms: stats.mtime.getUTCMilliseconds(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not doing what is intended:

  • getUTCDate() returns the UTC day-of-the-month of the Date object
  • getUTCMilliseconds() returns the UTC milliseconds component of the Date object (i.e: fractional seconds)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch (and fix!)

RomainMuller added a commit that referenced this pull request Jul 29, 2022
#21374)

Instead of using the complete mtime value, it only accounted for the day-of-month
and fractional seconds part of the timestamp, which is not the intention.

The issue was introduced in #21321
@bdonlan bdonlan deleted the inode-fingerprint branch August 1, 2022 16:28
josephedward pushed a commit to josephedward/aws-cdk that referenced this pull request Aug 30, 2022
When fingerprinting large assets, hashing the asset can take quite a
long time - over a second for a 300MB asset, for example. This can add
up, particularly when generating multiple stacks in a single build, or
when running test suites that bundle assets multiple times, and is not
avoidable by asset caching (since it's computing the cache key).

This change caches the result of digesting individual files based on the
inode, mtime, and size of the input file.

This feature improved the runtime of one of our slowest tests by ~10%.

closes: aws#21297

Note: No README entries were added, because this sub-subsystem was already not documented in the README.


----

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)? N/A
josephedward pushed a commit to josephedward/aws-cdk that referenced this pull request Aug 30, 2022
aws#21374)

Instead of using the complete mtime value, it only accounted for the day-of-month
and fractional seconds part of the timestamp, which is not the intention.

The issue was introduced in aws#21321
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p2 pr-linter/exempt-integ-test The PR linter will not require integ test changes pr-linter/exempt-readme The PR linter will not require README changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core: Use file stat data to fingerprint assets
5 participants