More work to prevent queries from running when there's in-progress node processing #8859

Merged
merged 4 commits into from
Oct 9, 2018
51 changes: 1 addition & 50 deletions packages/gatsby-source-filesystem/src/gatsby-node.js
@@ -44,27 +44,6 @@ const createFSMachine = () =>
},
},
},
PROCESSING: {
initial: `BOOTSTRAPPING`,
states: {
BOOTSTRAPPING: {
on: {
BOOTSTRAP_FINISHED: `IDLE`,
},
},
IDLE: {
on: {
EMIT_FS_EVENT: `PROCESSING`,
},
},
PROCESSING: {
on: {
QUERY_QUEUE_DRAINED: `IDLE`,
TOUCH_NODE: `IDLE`,
},
},
},
},
},
})

@@ -95,7 +74,6 @@ See docs here - https://www.gatsbyjs.org/packages/gatsby-source-filesystem/

const fsMachine = createFSMachine()
let currentState = fsMachine.initialState
let fileNodeQueue = new Map()

// Once bootstrap is finished, we only let one File node update go through
// the system at a time.
@@ -105,26 +83,6 @@ See docs here - https://www.gatsbyjs.org/packages/gatsby-source-filesystem/
`BOOTSTRAP_FINISHED`
)
})
emitter.on(`TOUCH_NODE`, () => {
// If we create a node which is the same as the previous version, createNode
// returns TOUCH_NODE and then nothing else happens so we listen to that
// to return the state back to IDLE.
currentState = fsMachine.transition(currentState.value, `TOUCH_NODE`)
})

emitter.on(`QUERY_QUEUE_DRAINED`, () => {
currentState = fsMachine.transition(
currentState.value,
`QUERY_QUEUE_DRAINED`
)
// If we have any updates queued, run one of them now.
if (fileNodeQueue.size > 0) {
const toProcess = fileNodeQueue.get(Array.from(fileNodeQueue.keys())[0])
fileNodeQueue.delete(toProcess.id)
currentState = fsMachine.transition(currentState.value, `EMIT_FS_EVENT`)
createNode(toProcess)
}
})

const watcher = chokidar.watch(pluginOptions.path, {
ignored: [
@@ -147,13 +105,7 @@ See docs here - https://www.gatsbyjs.org/packages/gatsby-source-filesystem/
createNodeId,
pluginOptions
).then(fileNode => {
if (currentState.value.PROCESSING === `PROCESSING`) {
fileNodeQueue.set(fileNode.id, fileNode)
} else {
currentState = fsMachine.transition(currentState.value, `EMIT_FS_EVENT`)
createNode(fileNode)
}

createNode(fileNode)
return null
})
return fileNodePromise
@@ -201,7 +153,6 @@ See docs here - https://www.gatsbyjs.org/packages/gatsby-source-filesystem/
// It's possible the file node was never created as sometimes tools will
// write and then immediately delete temporary files to the file system.
if (node) {
currentState = fsMachine.transition(currentState.value, `EMIT_FS_EVENT`)
deleteNode({ node })
}
})
14 changes: 14 additions & 0 deletions packages/gatsby/src/internal-plugins/query-runner/query-queue.js
@@ -61,8 +61,22 @@ const queue = new Queue((plObj, callback) => {
)
}, queueOptions)

// Pause running queries when new nodes are added (processing starts).
emitter.on(`CREATE_NODE`, () => {
queue.pause()
Contributor Author

This could also be converted to use xstate

Contributor

This can't pause a task that is already in progress, right? It will just stop any queued tasks from executing? So there is a case where we will still be executing a query in parallel with CREATE_NODE processing?

Contributor Author

Correct. Though in practice this is fine, as most queries are sync and what's not sync is background jobs or fetching data from elsewhere, neither of which would be affected by node transformations.
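
To make the point above concrete, here is a minimal sketch of pause semantics, assuming a hand-rolled queue. `TinyQueue`, `finishers`, and the task names are hypothetical and exist only for illustration; this is not the queue library used in query-queue.js. It shows that pausing stops queued tasks from starting but does not interrupt a task that is already running.

```javascript
// Hypothetical minimal queue, for illustration only.
class TinyQueue {
  constructor(worker) {
    this.worker = worker
    this.pending = []
    this.running = false
    this.paused = false
  }
  push(task) {
    this.pending.push(task)
    this._next()
  }
  pause() {
    // Only blocks future starts; an in-flight task keeps going.
    this.paused = true
  }
  resume() {
    this.paused = false
    this._next()
  }
  _next() {
    if (this.paused || this.running || this.pending.length === 0) return
    this.running = true
    const task = this.pending.shift()
    this.worker(task, () => {
      this.running = false
      this._next()
    })
  }
}

const log = []
const finishers = {}
const queue = new TinyQueue((name, done) => {
  log.push(`start ${name}`)
  // Keep the completion callback so the example can finish tasks by hand.
  finishers[name] = () => {
    log.push(`finish ${name}`)
    done()
  }
})

queue.push(`query-a`) // starts immediately
queue.push(`query-b`) // waits behind query-a
queue.pause() // stands in for a CREATE_NODE event
finishers[`query-a`]() // the in-flight query still completes
const logWhilePaused = [...log]
queue.resume() // stands in for API_RUNNING_QUEUE_EMPTY
```

While paused, `query-a` finishes but `query-b` never starts; it starts only after `resume()`.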

Contributor

Right, but markdown processing is async (and most of the reports about the problem this PR solves are about markdown). That's why I was asking if we should delay dispatching the CREATE_NODE action while queries are executing.

Contributor Author

If a query of a markdown node is in progress, it gets the markdown node syncly (this should be a word) at the start so even if a new file node immediately causes the markdown remark node to be deleted, it won't affect the in-progress query.

That being said, querying will eventually be fully async, so we'll need a better solution for that. I think the current PR is fine as it's simple and solves the problem. The next step would be to add tests and then ensure we have a solution for when we go async, e.g. delaying API/redux work until the current queries are finished.
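
One possible shape for "delaying API/redux work until the current queries are finished" is sketched below. This is an assumption about how such a gate might look, not something this PR implements; `createGate`, `queryStarted`, `queryFinished`, and `runWhenIdle` are hypothetical names.

```javascript
// Hypothetical gate: buffers node-mutating work while queries are in flight.
const createGate = () => {
  let activeQueries = 0
  const deferred = []

  // Run buffered work, but only while no query is active.
  const flush = () => {
    while (activeQueries === 0 && deferred.length > 0) {
      deferred.shift()()
    }
  }

  return {
    queryStarted() {
      activeQueries += 1
    },
    queryFinished() {
      activeQueries -= 1
      flush()
    },
    runWhenIdle(work) {
      deferred.push(work)
      flush()
    },
  }
}

const gate = createGate()
const applied = []

gate.queryStarted()
gate.runWhenIdle(() => applied.push(`CREATE_NODE file-1`))
const appliedDuringQuery = applied.length // work is held back for now
gate.queryFinished() // last query done, buffered work runs
```

This is the inverse of what the PR does: instead of pausing queries while nodes are processed, node work waits for queries.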

Contributor

Good point, I was looking in the wrong spot. At the transformer level we are already shielded from this issue, as the node has already been passed in. But before that we use run-sift to actually get the node, and that is async as well.
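
The two cases in this exchange can be sketched with a toy node store. Everything here is hypothetical (the `nodes` map, `startWithNode`, `startWithDeferredLookup`) and the deferred steps are modelled as plain callbacks rather than real async code; it is not how run-sift or the remark transformer is implemented.

```javascript
// Toy node store; all names are illustrative, not Gatsby internals.
const nodes = new Map([[`md-1`, { id: `md-1`, rawBody: `# Hello` }]])

// Case 1: the node is read synchronously when the query starts, so the
// later part of the work holds its own reference to it.
const startWithNode = id => {
  const node = nodes.get(id)
  return () => node.rawBody.toUpperCase()
}

// Case 2: the lookup itself is deferred, so it sees whatever the store
// contains at the time it finally runs.
const startWithDeferredLookup = id => () => nodes.get(id)

const finishTransform = startWithNode(`md-1`)
const finishLookup = startWithDeferredLookup(`md-1`)

// A new File node arrives and the old markdown node is deleted.
nodes.delete(`md-1`)

const transformed = finishTransform() // still has the captured node
const lookedUp = finishLookup() // the node is gone by now
```

The first case is unaffected by the deletion; the second returns `undefined`, which is the gap the last comment points at.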

Contributor

I think this PR seems good as-is, just trying to see if we need to pursue further

})

// Resume running queries as soon as the api queue is empty.
emitter.on(`API_RUNNING_QUEUE_EMPTY`, () => {
queue.resume()
})

queue.on(`drain`, () => {
emitter.emit(`QUERY_QUEUE_DRAINED`)
})

queue.on(`task_queued`, () => {
emitter.emit(`QUERY_ENQUEUED`)
})

module.exports = queue