feat(gatsby): Optimize creating many child nodes from one parent #35504
base: master
I have updated the test repo https://github.com/pogo19/gatsby4-slow-json to better reflect real-world usage. With the "gatsby" package at the latest 3.x version, the "source and transform nodes" phase takes less than 10 seconds (after clean). With 4.x the process gets killed by the OS ... in the real-world webapp that doesn't happen (probably because the data entries' size is a bit different; I had to change it due to some data anonymization), even though the data files are even bigger in real-world usage. In the real-world app it takes approx. 800-1200 seconds. All times are from my development notebook (4-year-old i7 with 16GB RAM and SSD).
Also as mentioned in discord thread https://discord.com/channels/484383807575687178/966709737968250910 ... When testing this PR there were also differences in reporting and result:
Also, the build with the updated transformer-json failed later (with the released version of transformer-json it worked):
Ok, I fixed node creation, so the right number of nodes is now being created and without memory problems. On my machine it now takes ~184s to create the nodes. But I discovered something very interesting. If you disable the The reason is that when I tried debouncing this write in 7ceceea and that kept us around 6.5-7s for a cool ~25x speedup.
…rray is updated at least once before schema creation
The failing tests mean the key for debouncing needs to combine both the node id and the type of the child.
We need to guarantee that the parent node is written when sourcing is finished. Still pondering best way to do that. Any ideas appreciated! |
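One way to read the two comments above: writes are coalesced under a key built from the node id and the child type, and a forced flush at the end of sourcing guarantees nothing stays pending. The following is only a minimal sketch under that reading; `createKeyedWriteQueue`, `schedule` and `flush` are hypothetical names, not taken from the Gatsby codebase.

```javascript
// Hypothetical sketch, not Gatsby's actual implementation.
// Coalesces repeated writes per (node id, child type) and lets the
// caller force every pending write out once sourcing has finished.
function createKeyedWriteQueue(write) {
  const pending = new Map()

  return {
    // A later call with the same key replaces the earlier pending payload.
    schedule(nodeId, childType, payload) {
      pending.set(`${nodeId}::${childType}`, payload)
    },
    // Call when sourcing ends so no parent node is left unwritten.
    flush() {
      const payloads = [...pending.values()]
      pending.clear()
      payloads.forEach(payload => write(payload))
      return payloads.length
    },
  }
}

// Usage: three updates but only two distinct keys, so two writes happen.
const written = []
const queue = createKeyedWriteQueue(payload => written.push(payload))
queue.schedule(`file-1`, `DataJson`, { children: [`a`] })
queue.schedule(`file-1`, `DataJson`, { children: [`a`, `b`] })
queue.schedule(`file-1`, `OtherJson`, { children: [`c`] })
const flushedCount = queue.flush()
```

Keying on the id alone would let an update for one child type overwrite a pending update for another, which is presumably what the failing tests caught.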
Yeah, works like a charm. Thanks! In my real-world app the "source and transform nodes" phase takes 8.732s now on my notebook, so it is on par with 3.x. Also no other problems and the app works OK. Thanks again! Looking forward to a release with this merged ;) |
Great! Thanks for continuing to test! Always best to get more real-world validation. |
Is there any chance to get this merged and released? Can I help in any way? |
Any news? |
Is it possible to have a beta version of these released that could make testing easier? |
Sorry about the delay here — you can try out this PR by installing |
Can confirm this has shown a massive improvement in my case.
Setup
I have a number of JSON files (one per locale) that are similar in format to this file, but around 80 MB.
Logs
`gatsby@next`
`gatsby@alpha-transformer-json`
I also rolled back to the current gatsby@4 version I'm running to provide the timing increase from v4 -> v5.
gatsby@4
```
success load gatsby config - 0.060s
success load plugins - 1.119s
warn gatsby-plugin-react-helmet: Gatsby now has built-in support for modifying the document head. Learn more at https://gatsby.dev/gatsby-head
success onPreInit - 0.011s
success initialize cache - 0.183s
success copy gatsby files - 0.944s
success Compiling Gatsby Functions - 0.289s
success onPreBootstrap - 0.331s
success createSchemaCustomization - 0.020s
success gatsby-plugin-react-i18next: create node: de-de/translation - 0.122s
success gatsby-plugin-react-i18next: create node: en-us/translation - 0.137s
success gatsby-plugin-react-i18next: create node: es-ar/translation - 0.151s
success gatsby-plugin-react-i18next: create node: el-gr/translation - 0.165s
success gatsby-plugin-react-i18next: create node: es-es/translation - 0.177s
success gatsby-plugin-react-i18next: create node: es-mx/translation - 0.190s
success gatsby-plugin-react-i18next: create node: cs-cz/translation - 0.201s
success gatsby-plugin-react-i18next: create node: it-it/translation - 0.209s
success gatsby-plugin-react-i18next: create node: ja-jp/translation - 0.215s
success gatsby-plugin-react-i18next: create node: fr-fr/translation - 0.223s
success gatsby-plugin-react-i18next: create node: hu-hu/translation - 0.229s
success gatsby-plugin-react-i18next: create node: ko-kr/translation - 0.236s
success gatsby-plugin-react-i18next: create node: ms-my/translation - 0.241s
success gatsby-plugin-react-i18next: create node: pt-br/translation - 0.244s
success gatsby-plugin-react-i18next: create node: ro-ro/translation - 0.251s
success gatsby-plugin-react-i18next: create node: pl-pl/translation - 0.254s
success gatsby-plugin-react-i18next: create node: zh-tw/translation - 0.255s
success gatsby-plugin-react-i18next: create node: ru-ru/translation - 0.257s
success gatsby-plugin-react-i18next: create node: th-th/translation - 0.255s
success gatsby-plugin-react-i18next: create node: zh-my/translation - 0.256s
success gatsby-plugin-react-i18next: create node: tr-tr/translation - 0.256s
success gatsby-plugin-react-i18next: create node: vn-vn/translation - 0.254s
success gatsby-plugin-react-i18next: create node: id-id/translation - 0.253s
success Checking for changed pages - 0.001s
success source and transform nodes - 212.683s
success building schema - 0.846s
success Building champion pages - 1.570s
success Building champion skin pages - 8.572s
success Building league of legends pages - 8.586s
success Building Creator redirects - 0.018s
success createPages - 10.185s
warn Warn: updating default matchPath for all pages /en-us/404/
success createPagesStatefully - 0.136s
info Total nodes: 66570, SitePage nodes: 44115 (use --verbose for breakdown)
```
What's next for this PR?
if (!parent.children.includes(child.id)) {
  parent.children.push(child.id)
}
createParentChildLinkBatcher.add({ parent, child })

return {
  type: `ADD_CHILD_NODE_TO_PARENT_NODE`,
  plugin,
  payload: parent,
}
return []
We force flush the batcher just after the `sourceNodes` lifecycle, and while this is where we expect most `createParentChildLink` calls to happen, they will not always happen just there (for example, editing files while the dev server is running will randomly update `File` nodes and run `onCreateNode` for them outside of `sourceNodes`). This makes it possible for the actual link not to be created until the batch grows large enough to be processed or `sourceNodes` is triggered for a different reason.
Is it possible to only do batching during `sourceNodes` and use the previous setup for anything else?
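The suggestion in this review comment could look roughly like the sketch below: links are queued only while sourcing is in progress and applied immediately otherwise. Everything here (`createLinker`, `startSourcing`, `endSourcing`, `link`) is a hypothetical illustration, not the API of the batcher in this PR.

```javascript
// Hypothetical sketch of the reviewer's suggestion, not Gatsby's code.
// Links are batched only while sourcing is in progress; outside of it,
// each link is applied immediately, as in the pre-batching behaviour.
function createLinker() {
  let sourcing = false
  const queued = []

  const apply = ({ parent, child }) => {
    if (!parent.children.includes(child.id)) {
      parent.children.push(child.id)
    }
  }

  return {
    startSourcing() {
      sourcing = true
    },
    // Ending sourcing drains the queue so no link stays pending.
    endSourcing() {
      sourcing = false
      queued.splice(0).forEach(apply)
    },
    link(pair) {
      if (sourcing) {
        queued.push(pair)
      } else {
        apply(pair)
      }
    },
  }
}

const linker = createLinker()
const parentNode = { id: `file-1`, children: [] }

linker.startSourcing()
linker.link({ parent: parentNode, child: { id: `a` } })
const duringSourcing = parentNode.children.length // still 0: link is queued
linker.endSourcing()
const afterSourcing = parentNode.children.length // 1: queue was drained

// Outside sourcing (e.g. a file edited during develop) the link is immediate.
linker.link({ parent: parentNode, child: { id: `b` } })
const afterEdit = parentNode.children.length // 2
```

With this split, a `File` node updated by a file edit during development never waits on a batch threshold, while bulk sourcing still benefits from batching.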
Are there any updates on this PR?
Description
A user migrating from v3 to v4 said their JSON transforming got a lot slower, and provided this test repo https://github.com/pogo19/gatsby4-slow-json
Creating nodes was always going to get slower with the introduction of LMDB, but the slowdown seemed far larger than expected.
Investigating, I saw a couple of issues.
This speeds up creating nodes by ~3.5x (~36s -> 10s roughly on the test repo).
I also added a type cache for generating type names which saves roughly 1s / 10k nodes.
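As an illustration of the type cache idea, the sketch below memoizes type-name generation so the string manipulation runs once per distinct input rather than once per node. Both `toTypeName` and `createTypeNameCache` are hypothetical stand-ins; the actual name-generation logic in the transformer is not shown in this thread.

```javascript
// Hypothetical sketch, not the code from this PR.
// Derives a type name from a raw string (e.g. a directory name) and
// caches the result, so the string work runs once per distinct input.
function toTypeName(raw) {
  return raw
    .split(/[^a-zA-Z0-9]+/)
    .filter(Boolean)
    .map(part => part[0].toUpperCase() + part.slice(1))
    .join(``)
}

function createTypeNameCache(generate) {
  const cache = new Map()
  let misses = 0

  return {
    typeNameFor(raw) {
      if (!cache.has(raw)) {
        misses += 1
        cache.set(raw, generate(raw))
      }
      return cache.get(raw)
    },
    // Number of times the generator actually ran.
    misses() {
      return misses
    },
  }
}

const typeNames = createTypeNameCache(toTypeName)
const firstLookup = typeNames.typeNameFor(`my-data json`)
const secondLookup = typeNames.typeNameFor(`my-data json`)
```

When many child nodes come from the same parent, almost every lookup after the first is a cache hit, which is where a saving on the order described above would come from.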
Additional changes
PR #36610 added the same optimization as #34084 to the CSV transformer, so this PR needs to handle that, too.