fix(gatsby): don't block event loop during inference #37780
Merged
Conversation
gatsbot (bot) added the status: triage needed label (Issue or pull request that needs to be triaged and assigned to a reviewer) on Mar 24, 2023
TylerBarnes force-pushed the fix/inference-event-loop branch from 7ff48c7 to 06f9aab on March 24, 2023 18:05
TylerBarnes removed the status: triage needed label on Mar 24, 2023
TylerBarnes force-pushed the fix/inference-event-loop branch from 06f9aab to 71e7e31 on March 24, 2023 18:08
TylerBarnes added the topic: GraphQL (Related to Gatsby's GraphQL layer) and topic: data (Relates to source-nodes, internal-data-bridge, and node creation) labels on Mar 25, 2023
wardpeet approved these changes on Mar 27, 2023
pieh pushed a commit that referenced this pull request on Mar 29, 2023: Don't block the event loop during inference (cherry picked from commit c08048d)
pieh pushed a commit that referenced this pull request on Mar 29, 2023
pieh added a commit that referenced this pull request on Mar 29, 2023
Labels
topic: data (Relates to source-nodes, internal-data-bridge, and node creation)
topic: GraphQL (Related to Gatsby's GraphQL layer)
These changes drop gatsbyjs.com schema building time from 16s to 7s.
For a Contentful site with 4.9M nodes, schema building time drops from 520s to 380s. The 4.9M-node site also builds with significantly less memory: previously it needed 64Gi, and with these changes (plus some other changes I'll be PRing soon) it can build with 24Gi.
Vlad previously added code in https://github.com/gatsbyjs/gatsby/pull/19781/files#diff-d380fd3fbf5adf3933e07c737228eb75e520cdc7a5050d4d6b710acd5256d40cR48 that lets the event loop breathe between inferring each node type, which was good.
The problem with that, though, is that every node of a given type is loaded into memory before Node has an opportunity to garbage collect. Enough memory is needed to hold in memory all nodes of whichever type has the greatest count.
For large sites this means huge amounts of memory are required. For small and medium sites it still means more memory is needed than should be, and inference is also slower than it could be, because the event loop is blocked for large chunks of time.
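The per-type yielding described above can be sketched roughly like this (a simplified illustration, not Gatsby's actual internals; `getNodesOfType` and `inferType` are hypothetical stand-ins):

```javascript
// Rough sketch of the earlier behaviour: the event loop is yielded only
// between whole node types. `getNodesOfType` and `inferType` are
// hypothetical names, not Gatsby APIs.
async function inferAllTypes(typeNames, getNodesOfType, inferType) {
  const results = {}
  for (const typeName of typeNames) {
    // Every node of this type is materialized and processed in one
    // synchronous pass, so memory for the whole type is held at once...
    results[typeName] = inferType(getNodesOfType(typeName))
    // ...and only between types does the event loop get a chance to breathe.
    await new Promise(resolve => setImmediate(resolve))
  }
  return results
}
```

The synchronous inner pass is exactly where memory grows: nothing processed so far can be collected until the whole type is done.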
With this code, Node can throw away the last inferred nodes, if it needs to, before moving on to the next chunk of 1000.
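A minimal sketch of that chunking idea, assuming a hypothetical `inferNode` reducer (the chunk size of 1000 matches the description above):

```javascript
// Hypothetical sketch of chunked inference: process nodes in batches of
// 1000 and yield to the event loop between batches, so Node can GC the
// previous chunk and other parallel work can make progress.
// `inferNode` is an illustrative name, not a Gatsby API.
const CHUNK_SIZE = 1000

async function inferTypeInChunks(nodes, inferNode) {
  const inferred = {}
  let processed = 0
  for (const node of nodes) {
    inferNode(inferred, node)
    if (++processed % CHUNK_SIZE === 0) {
      // Give the event loop (and the garbage collector) room between chunks.
      await new Promise(resolve => setImmediate(resolve))
    }
  }
  return inferred
}
```

Yielding via `setImmediate` (rather than `process.nextTick` or a resolved promise) lets pending I/O callbacks run between chunks, which is what actually unblocks the rest of the build.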
Apparently V8 can GC whenever it wants to, but from what I saw here that was not the case: memory grew linearly until each type finished inferring. With these changes, memory usage stays relatively flat during inference. Possibly that's because not blocking the event loop allows other parallel code to complete, letting Node GC elsewhere.