feat(gatsby): Optimize creating many child nodes from one parent #35504

Status: Open. Wants to merge 21 commits into master.

Commits (21):
7a45d61  feat(gatsby-transformer-json): Speed up creating nodes for arrays (KyleAMathews, Apr 27, 2022)
f85f07b  100 is about 33% faster than 1000 (KyleAMathews, Apr 27, 2022)
c409697  Add type cache (KyleAMathews, Apr 27, 2022)
a761e9c  Merge branch 'master' into faster-transformer-json (KyleAMathews, May 31, 2022)
880951e  Fix creating IDs for nodes (KyleAMathews, May 31, 2022)
7ceceea  Debounce writing updated parent node (KyleAMathews, May 31, 2022)
cbfe597  Invoke on leading and trailing to ensure the parent node's children a… (KyleAMathews, May 31, 2022)
74e12dc  Fix key for debounce fn (KyleAMathews, Jun 1, 2022)
6fa9892  batch actions instead of timeout on writing (KyleAMathews, Jun 1, 2022)
27ff437  Keep old behavior for tests (KyleAMathews, Jun 1, 2022)
81e96b6  Setting batch count of 1 seems to work (KyleAMathews, Jun 1, 2022)
bc76325  Merge branch 'master' into faster-transformer-json (KyleAMathews, Jun 13, 2022)
cc04b88  Merge branch 'master' into faster-transformer-json (KyleAMathews, Jun 14, 2022)
ce78af4  Merge branch 'master' into faster-transformer-json (KyleAMathews, Jun 29, 2022)
62f56da  Merge branch 'master' into faster-transformer-json (KyleAMathews, Jul 28, 2022)
e579e74  Merge branch 'master' into faster-transformer-json (KyleAMathews, Oct 12, 2022)
80b449f  Merge branch 'master' into faster-transformer-json (LekoArts, Dec 9, 2022)
f2dd1ff  fix csv transformer (LekoArts, Dec 9, 2022)
f8dcb3c  remove unnecessary await (LekoArts, Dec 9, 2022)
9485ac0  correct typescript (LekoArts, Dec 9, 2022)
8e9b755  fix csv tests (LekoArts, Dec 9, 2022)
47 changes: 37 additions & 10 deletions packages/gatsby-transformer-csv/src/gatsby-node.js
@@ -1,6 +1,5 @@
const csv = require(`csvtojson`)
const _ = require(`lodash`)

const { typeNameFromFile } = require(`./index`)

const convertToJson = (data, options) =>
@@ -15,6 +14,8 @@ function shouldOnCreateNode({ node }, pluginOptions = {}) {
return extensions ? extensions.includes(extension) : extension === `csv`
}

const typeCache = new Map()

async function onCreateNode(
{ node, actions, loadNodeContent, createNodeId, createContentDigest },
pluginOptions
@@ -35,14 +36,26 @@ async function onCreateNode(
if (pluginOptions && _.isFunction(typeName)) {
return pluginOptions.typeName({ node, object })
} else if (pluginOptions && _.isString(typeName)) {
return _.upperFirst(_.camelCase(typeName))
if (typeCache.has(node.internal.type)) {
return typeCache.get(node.internal.type)
} else {
const type = _.upperFirst(_.camelCase(typeName))
typeCache.set(node.internal.type, type)
return type
}
} else {
return typeNameFromFile({ node })
if (typeCache.has(node.internal.type)) {
return typeCache.get(node.internal.type)
} else {
const type = typeNameFromFile({ node })
typeCache.set(node.internal.type, type)
return type
}
}
}

// Generate the new node
async function transformObject(obj, i) {
function transformObject(obj, i) {
const csvNode = {
...obj,
id:
@@ -59,21 +72,35 @@ async function onCreateNode(
},
}

await createNode(csvNode)
createNode(csvNode)
createParentChildLink({ parent: node, child: csvNode })
}

async function transformArrayChunk({ chunk, startCount }) {
for (let i = 0, l = chunk.length; i < l; i++) {
const obj = chunk[i]
transformObject(obj, i + startCount)
await new Promise(resolve =>
setImmediate(() => {
resolve()
})
)
}
}

if (_.isArray(parsedContent)) {
if (pluginOptions && nodePerFile) {
if (pluginOptions && _.isString(nodePerFile)) {
await transformObject({ [nodePerFile]: parsedContent }, 0)
transformObject({ [nodePerFile]: parsedContent }, 0)
} else {
await transformObject({ items: parsedContent }, 0)
transformObject({ items: parsedContent }, 0)
}
} else {
for (let i = 0, l = parsedContent.length; i < l; i++) {
const obj = parsedContent[i]
await transformObject(obj, i)
const chunks = _.chunk(parsedContent, 100)
let count = 0
for (const chunk of chunks) {
await transformArrayChunk({ chunk, startCount: count })
count += chunk.length
}
}
}
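The chunking added above keeps large CSV arrays from monopolizing the event loop: nodes are created for a fixed-size chunk, then the loop yields via `setImmediate` so timers and I/O callbacks can run. A minimal standalone sketch of the same pattern (the chunk size and `processItem` callback are illustrative, not part of the plugin API):

```javascript
// Split an array into fixed-size chunks.
function chunk(array, size) {
  const chunks = []
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size))
  }
  return chunks
}

// Process items chunk by chunk, yielding to the event loop between chunks
// so other queued callbacks are not starved during a long import.
async function processInChunks(items, processItem, chunkSize = 100) {
  let count = 0
  for (const piece of chunk(items, chunkSize)) {
    for (let i = 0; i < piece.length; i++) {
      // count + i reproduces the original array index, keeping IDs stable.
      processItem(piece[i], count + i)
    }
    count += piece.length
    // Yield so the event loop can service other work between chunks.
    await new Promise(resolve => setImmediate(resolve))
  }
  return count
}
```

The commit message "100 is about 33% faster than 1000" suggests the chunk size was chosen empirically; smaller chunks yield more often at the cost of more `setImmediate` round-trips.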
65 changes: 52 additions & 13 deletions packages/gatsby-transformer-json/src/gatsby-node.js
@@ -6,6 +6,8 @@ function shouldOnCreateNode({ node }) {
return node.internal.mediaType === `application/json`
}

const typeCache = new Map()

async function onCreateNode(
{ node, actions, loadNodeContent, createNodeId, createContentDigest },
pluginOptions
@@ -16,15 +18,35 @@ async function onCreateNode(
} else if (pluginOptions && _.isString(pluginOptions.typeName)) {
return pluginOptions.typeName
} else if (node.internal.type !== `File`) {
return _.upperFirst(_.camelCase(`${node.internal.type} Json`))
if (typeCache.has(node.internal.type)) {
return typeCache.get(node.internal.type)
} else {
const type = _.upperFirst(_.camelCase(`${node.internal.type} Json`))
typeCache.set(node.internal.type, type)
return type
}
} else if (isArray) {
return _.upperFirst(_.camelCase(`${node.name} Json`))
if (typeCache.has(node.name)) {
return typeCache.get(node.name)
} else {
const type = _.upperFirst(_.camelCase(`${node.name} Json`))
typeCache.set(node.name, type)
return type
}
} else {
return _.upperFirst(_.camelCase(`${path.basename(node.dir)} Json`))
if (typeCache.has(node.dir)) {
return typeCache.get(node.dir)
} else {
const type = _.upperFirst(
_.camelCase(`${path.basename(node.dir)} Json`)
)
typeCache.set(node.dir, type)
return type
}
}
}

async function transformObject(obj, id, type) {
function transformObject(obj, id, type) {
const jsonNode = {
...obj,
id,
@@ -38,7 +60,7 @@ async function onCreateNode(
if (obj.id) {
jsonNode[`jsonId`] = obj.id
}
await createNode(jsonNode)
createNode(jsonNode)
createParentChildLink({ parent: node, child: jsonNode })
}

@@ -55,18 +77,35 @@ async function onCreateNode(
throw new Error(`Unable to parse JSON: ${hint}`)
}

if (_.isArray(parsedContent)) {
for (let i = 0, l = parsedContent.length; i < l; i++) {
const obj = parsedContent[i]

await transformObject(
async function transformArrayChunk({ chunk, startCount }) {
for (let i = 0, l = chunk.length; i < l; i++) {
const obj = chunk[i]
transformObject(
obj,
createNodeId(`${node.id} [${i}] >>> JSON`),
getType({ node, object: obj, isArray: true })
createNodeId(`${node.id} [${i + startCount}] >>> JSON`),
getType({
node,
object: obj,
isArray: true,
})
)
await new Promise(resolve =>
setImmediate(() => {
resolve()
})
)
}
}

if (_.isArray(parsedContent)) {
const chunks = _.chunk(parsedContent, 100)
let count = 0
for (const chunk of chunks) {
await transformArrayChunk({ chunk, startCount: count })
count += chunk.length
}
} else if (_.isPlainObject(parsedContent)) {
await transformObject(
transformObject(
parsedContent,
createNodeId(`${node.id} >>> JSON`),
getType({ node, object: parsedContent, isArray: false })
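The repeated `typeCache` lookups in both transformers are a hand-rolled memoization of the pascal-cased type name, so lodash's `camelCase`/`upperFirst` pipeline runs once per node type instead of once per node. The same idea as a small reusable helper (`memoizeByKey` and the toy `pascalCase` are illustrative; the PR inlines the cache instead):

```javascript
// Cache a computed value per key so repeated calls for nodes of the same
// type skip the relatively expensive string transformation.
function memoizeByKey(compute) {
  const cache = new Map()
  return key => {
    if (cache.has(key)) {
      return cache.get(key)
    }
    const value = compute(key)
    cache.set(key, value)
    return value
  }
}

// Toy stand-in for lodash's _.upperFirst(_.camelCase(...)) pipeline.
const pascalCase = str =>
  str
    .split(/[^a-zA-Z0-9]+/)
    .filter(Boolean)
    .map(word => word[0].toUpperCase() + word.slice(1).toLowerCase())
    .join("")

// e.g. typeNameFor("File") -> "FileJson", computed only once per type.
const typeNameFor = memoizeByKey(internalType => pascalCase(`${internalType} Json`))
```

This only pays off because transformers see many nodes of few distinct types; the cache key (`node.internal.type`, `node.name`, or `node.dir`) must uniquely determine the resulting type name, which is why the diff uses a different key per branch.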
49 changes: 49 additions & 0 deletions packages/gatsby/src/redux/actions/create-parent-child-link.ts
@@ -0,0 +1,49 @@
import type { IGatsbyNode, IAddChildNodeToParentNodeAction } from "../types"
import Batcher from "../../utils/batcher"
import { store } from "../"
import { getNode } from "../../datastore"

const isTestEnv = process.env.NODE_ENV === `test`
const batchCount = isTestEnv ? 1 : 1000

type CreateParentChildLinkFn = (
payload: {
parent: IGatsbyNode
child: IGatsbyNode
},
plugin?: string
) => IAddChildNodeToParentNodeAction

export const createParentChildLinkBatcher =
new Batcher<CreateParentChildLinkFn>(batchCount)

createParentChildLinkBatcher.bulkCall(createCalls => {
const nodesMap = new Map()

// Add children to parent node(s) and dispatch.
createCalls.forEach(call => {
const { parent, child } = call[0]
const parentId = parent.id

    let parentNode
    if (nodesMap.has(parentId)) {
      parentNode = nodesMap.get(parentId)
    } else {
      parentNode = getNode(parentId)
      nodesMap.set(parentId, parentNode)
    }

    if (!parentNode.children.includes(child.id)) {
      parentNode.children.push(child.id)
    }
})

nodesMap.forEach(parentNode => {
const payload: IAddChildNodeToParentNodeAction = {
type: `ADD_CHILD_NODE_TO_PARENT_NODE`,
payload: parentNode,
}
store.dispatch(payload)
})
})
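The `Batcher` utility imported above is not shown in this diff. A minimal sketch consistent with how it is used here — a `bulkCall` subscriber, `add` to enqueue an argument list (hence the `call[0]` access), an automatic flush at the threshold, and an explicit `flush()`; the real implementation in `gatsby/src/utils/batcher.ts` may differ:

```javascript
// Minimal batcher sketch: collects calls and hands them to bulk callbacks
// once the queue reaches `threshold` entries or flush() is invoked.
class Batcher {
  constructor(threshold) {
    this.threshold = threshold
    this.queue = []
    this.callbacks = []
  }

  // Register a callback that receives all queued calls at once.
  bulkCall(callback) {
    this.callbacks.push(callback)
  }

  // Queue one call; each entry is the argument list, matching the
  // `call[0]` destructuring in the bulkCall handler above.
  add(...args) {
    this.queue.push(args)
    if (this.queue.length >= this.threshold) {
      this.flush()
    }
  }

  flush() {
    if (this.queue.length === 0) return
    const calls = this.queue
    this.queue = []
    this.callbacks.forEach(cb => cb(calls))
  }
}
```

With `batchCount` forced to 1 in tests (see `isTestEnv` above), every `add` flushes immediately, which preserves the old per-call dispatch ordering that existing tests depend on.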
11 changes: 3 additions & 8 deletions packages/gatsby/src/redux/actions/public.js
@@ -35,6 +35,7 @@ import { createJobV2FromInternalJob } from "./internal"
import { maybeSendJobToMainProcess } from "../../utils/jobs/worker-messaging"
import { reportOnce } from "../../utils/report-once"
import { wrapNode } from "../../utils/detect-node-mutations"
import { createParentChildLinkBatcher } from "./create-parent-child-link"

const isNotTestEnv = process.env.NODE_ENV !== `test`
const isTestEnv = process.env.NODE_ENV === `test`
@@ -1040,15 +1041,9 @@ actions.createParentChildLink = (
{ parent, child }: { parent: any, child: any },
plugin?: Plugin
) => {
if (!parent.children.includes(child.id)) {
parent.children.push(child.id)
}
createParentChildLinkBatcher.add({ parent, child })

return {
type: `ADD_CHILD_NODE_TO_PARENT_NODE`,
plugin,
payload: parent,
}
return []
Review comment on lines -1043 to +1046, by @pieh (Contributor), Dec 9, 2022:
We force-flush the batcher just after the sourceNodes lifecycle, and while that is where we expect most createParentChildLink calls to happen, they will not always be confined to it (for example, editing files while the dev server is running will update File nodes and run onCreateNode for them outside of sourceNodes). This makes it possible for the actual link not to be created until the batch grows large enough to be processed, or until sourceNodes is triggered for some other reason.

Is it possible to only do batching during sourceNodes and use the previous setup for everything else?

}

/**
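One way to address pieh's concern is to batch only while sourceNodes is running and fall back to the previous direct dispatch otherwise. A sketch under that assumption — the `inSourceNodes` flag and the function wiring are illustrative, not code from this PR:

```javascript
// Sketch: batch parent-child links only during the sourceNodes lifecycle;
// outside that window (e.g. file watching in `gatsby develop`), dispatch
// each link immediately so it never sits in a partially filled batch.
let inSourceNodes = false // assumed flag, toggled by the sourceNodes runner

function createParentChildLink(dispatch, batcher, { parent, child }) {
  if (inSourceNodes) {
    batcher.add({ parent, child })
    return []
  }
  // Previous (unbatched) behavior: mutate the parent and dispatch directly.
  if (!parent.children.includes(child.id)) {
    parent.children.push(child.id)
  }
  dispatch({ type: "ADD_CHILD_NODE_TO_PARENT_NODE", payload: parent })
  return []
}
```

The trade-off is two code paths for the same action; the batched path also changes the action's return value from a single action object to an empty array, which any caller relying on the returned action would need to accommodate.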
4 changes: 4 additions & 0 deletions packages/gatsby/src/utils/source-nodes.ts
@@ -5,6 +5,7 @@ import { store } from "../redux"
import { getDataStore, getNode } from "../datastore"
import { actions } from "../redux/actions"
import { IGatsbyState, IGatsbyNode } from "../redux/types"
import { createParentChildLinkBatcher } from "../redux/actions/create-parent-child-link"
import type { GatsbyIterable } from "../datastore/common/iterable"

const { deleteNode } = actions
@@ -114,6 +115,9 @@
pluginName,
})

// Flush createParentChildLinkBatcher to ensure parent nodes are written with their children dependencies.
createParentChildLinkBatcher.flush()

await getDataStore().ready()

// We only warn for plugins w/o nodes and delete stale nodes on the first sourcing.