Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gatsby): limit node manifest creation limit #35359

Merged
merged 8 commits into from
Apr 14, 2022

Conversation

veryspry
Copy link
Contributor

@veryspry veryspry commented Apr 6, 2022

Description

Context

Node Manifest files are used to tie a unique revision state of a piece of content in a data source to a gatsby page generated from that exact data. In some cases, this is causing upwards of 150K plus node manifest files to get written to disk which can cause performance degradation in certain cases.

How is it done?

This helps PR introduces a limit to the number of node manifest files written to disk during a gatsby build which can be controlled with the env var NODE_MANIFEST_FILE_LIMIT.

If the number of node manifests that are queued to get written to disk is greater than NODE_MANIFEST_FILE_LIMIT, then the node manifest "candidates" are sorted by their updatedAtUTC field timestamp. That sorted list of node manifests is used to queue up file writes file writes are then stopped as soon as the NODE_MANIFEST_FILE_LIMIT has been reached

⚠️ Notes

  • The current limit is set to default at 10,000 which is still quite high. We might want to tweak this before merging. Feel free to discuss 🙂
  • The integration tests for this were acting a little funny. I still need to spend a little time making sure everything is okay there.

Node Manifest Sorting

I spent some time testing some different sorting approaches and it turns out that the native array sort is almost certainly adequate.

Sorting 1,000,000 node manifests took ~11156 ms while sorting 100,000 took ~914 ms. These numbers seem well within reason in context of a Gatsby build.

@veryspry veryspry requested a review from TylerBarnes April 6, 2022 23:52
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Apr 6, 2022
@veryspry veryspry requested a review from DanielSLew April 6, 2022 23:52
@LekoArts LekoArts added topic: core Relates to Gatsby's core (e.g. page loading, reporter, state machine) and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Apr 7, 2022
Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have enough context but I fear that this might break in subtle ways or I might not understand the feature well enough

packages/gatsby/src/utils/node-manifest.ts Outdated Show resolved Hide resolved
packages/gatsby/src/utils/node-manifest.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@TylerBarnes TylerBarnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! 🙌 Just a couple nits

packages/gatsby/src/utils/node-manifest.ts Outdated Show resolved Hide resolved
packages/gatsby/src/utils/node-manifest.ts Outdated Show resolved Hide resolved
DanielSLew
DanielSLew previously approved these changes Apr 11, 2022
Copy link
Contributor

@DanielSLew DanielSLew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! This is out of the scope of this PR but maybe we need a way to clear out old manifests as well? As it is now we only clear them on cache clears. But maybe there's more we could be doing to clear them so they don't keep accumulating?

@TylerBarnes
Copy link
Contributor

@DanielSLew it's a good idea but I think out of scope for this PR. We have the manifest ID as part of each page so perhaps when a page is deleted we could delete the manifest 🤔

@veryspry
Copy link
Contributor Author

Just updated the tests! I realized that they would fail if running all of the integration tests for this (I was only running the one new test while adding new tests and didn't check to make sure that everything would still pass if the whole test file got ran)

@veryspry veryspry force-pushed the veryspry/node-manifest-limit branch from 6aa1358 to 2550939 Compare April 13, 2022 14:34
@veryspry
Copy link
Contributor Author

I published a canary with these changes:

gatsby@4.13.0-alpha-node-manifest-limit.15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: core Relates to Gatsby's core (e.g. page loading, reporter, state machine)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants