diff --git a/docs/docs/behind-the-scenes-terminology.md b/docs/docs/behind-the-scenes-terminology.md new file mode 100644 index 0000000000000..60cdb950b1c03 --- /dev/null +++ b/docs/docs/behind-the-scenes-terminology.md @@ -0,0 +1,221 @@ +--- +title: Terminology +--- + +Throughout the Gatsby code, you'll see the below object fields and variables mentioned. Their definitions and reasons for existence are explained below.

## Page

### Page Object

Created by calls to [createPage](/docs/actions/#createPage) (see [Page Creation](/docs/page-creation)).

- [path](#path)
- [matchPath](#matchpath)
- [jsonName](#jsonname)
- [component](#component)
- [componentChunkName](#componentchunkname)
- [internalComponentName](#internalcomponentname) (unused)
- [context](#pagecontext)
- updatedAt

The above fields are explained below.

### path

The publicly accessible path in the web URL to access the page in question. E.g

`/blog/2018-07-17-announcing-gatsby-preview/`.

It is created when the page object is created (see [Page Creation](/docs/page-creation/)).

### Redux `pages` namespace

Contains a map of Page [path](#path) -> [Page object](#page-object).

### matchPath

Think of this instead as `client matchPath`. It is ignored when creating pages during the build. But on the frontend, when resolving the page from the path ([find-path.js]()), it is used (via [reach router](https://github.com/reach/router/blob/master/src/lib/utils.js)) to find the matching page. Note that the [pages are sorted](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/pages-writer.js#L38) so that those with matchPaths are at the end, ensuring that explicit paths are matched first.

This is also used by [gatsby-plugin-create-client-paths](/packages/gatsby-plugin-create-client-paths/?=client). It duplicates pages whose path matches some client-only prefix (e.g `/app/`). The duplicated page has a `matchPath` so that it is resolved first on the front end.

It is also used by [gatsby-plugin-netlify](/packages/gatsby-plugin-netlify/?=netlify) when creating `_redirects`.

### jsonName

The logical name for the query result of a page. Created during [createPage](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/actions.js#L229). The name is constructed using a kebab hash of the page path. E.g, for the above path it is:

`blog-2018-07-17-announcing-gatsby-preview-995`

### component

The path on disk to the javascript file containing the React component. E.g

`/src/templates/template-blog-post.js`

Think of this as `componentPath` instead.

### Redux `components` namespace

Mapping from `component` (path on disk) to its [Page object](#page-object). It is created every time a page is created (by listening to `CREATE_PAGE`).

```javascript
{
  `/src/templates/template-blog-post.js`: {
    query: ``,
    path: `/blog/2018-07-17-announcing-gatsby-preview/`,
    jsonName: `blog-2018-07-17-announcing-gatsby-preview-995`,
    componentPath: `/src/templates/template-blog-post.js`,
    ...restOfPage
  }
}
```

The query starts off empty, but is set during the extractQueries phase by [query-watcher/handleQuery](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js#L68), once the query has been compiled by relay (see [Query Extraction](/docs/query-extraction/)).

### componentChunkName

The [page.component](#component) (path on disk), as above, but kebab hashed.
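As a rough sketch (not Gatsby's actual implementation, which may differ in detail), the transformation could look like this, assuming lodash's `kebabCase`:

```javascript
// Sketch only: derive a chunk name from a component path by kebab-casing it
// and prefixing `component---`.
const { kebabCase } = require(`lodash`)

const generateComponentChunkName = componentPath =>
  `component---${kebabCase(componentPath)}`

console.log(generateComponentChunkName(`/src/templates/template-blog-post.js`))
// -> component---src-templates-template-blog-post-js
```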
E.g, the componentChunkName for component

`/src/templates/template-blog-post.js`

is

`component---src-templates-template-blog-post-js`

TODO: Mention how used by webpack

### internalComponentName

If the path is `/`, internalComponentName = `ComponentIndex`. Otherwise, for a path of `/blog/foo`, it would be `ComponentBlogFoo`.

Created as part of the page, but currently unused.

### page.context

This is [merged with the page itself](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L153) and then is [passed to graphql](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js#L40) queries as the `context` parameter.

## Query

### dataPath

Path to the page's query result. Relative to `/public/static/d/{modInt}`. The name is a kebab hash on `path--${jsonName}`, followed by the result's sha1 hash (base64 encoded). E.g

`621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`

Set after [Query Execution](/docs/query-execution/#save-query-results-to-redux-and-disk) has finished.

### Redux `jsonDataPaths` namespace (dataPaths)

Map of page [jsonName](#jsonname) to [dataPath](#datapath). Updated after [Query Execution](/docs/query-execution/#save-query-results-to-redux-and-disk). E.g

```
{
  // jsonName -> dataPath
  "blog-2018-07-17-announcing-gatsby-preview-995": "621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc"
}
```

This map is also referred to via the `dataPaths` variable.

### Query result file

`/public/static/d/621/${dataPath}`

E.g

`/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json`

This is the actual result of the GraphQL query that was run for the page `/blog/2018-07-17-announcing-gatsby-preview/`. The contents would look something like:

```javascript
{
  "data": {
    "markdownRemark": {
      "html": "Today we....",
      "timeToRead": 2,
      "fields": {
        "slug": "/blog/2018-07-17-announcing-gatsby-preview/"
      },
      "frontmatter": {
        "title": "Announcing Gatsby Preview",
        "date": "July 17th 2018",
        ...
      },
      ...
    }
  },
  "pageContext": {
    "slug": "/blog/2018-07-17-announcing-gatsby-preview/",
    "prev": {
      ...
    },
    "next": null
  }
}
```

For a query such as:

```javascript
export const pageQuery = graphql`
  query($slug: String!) {
    markdownRemark(fields: { slug: { eq: $slug } }) {
      html
      timeToRead
      fields {
        slug
      }
      frontmatter {
        title
        date(formatString: "MMMM Do YYYY")
        ...
      }
      ...
    }
  }
`
```

## Webpack stuff

### /public/${componentChunkName}-[chunkhash].js

The final webpack js bundle for the blog template page. E.g

`/public/component---src-templates-template-blog-post-js-2df3a086e8d2cdf690aa.js`

### /.cache/async-requires.js

Generated javascript file that exports `components` and `data` fields.

`components` is a mapping from `componentChunkName` to a function that imports the component's original source file path. This is used for code splitting. The import statement is a hint to webpack that the javascript file can be loaded later, and the `webpackChunkName` comment tells webpack to name the resulting chunk after the `componentChunkName`.

`data` is a function that imports `/.cache/data.json`, which is code split in the same way.

E.g

```js
exports.components = {
  "component---src-templates-template-blog-post-js": () =>
    import("/Users/amarcar/dev/gatsbyjs/gatsby/www/src/templates/template-blog-post.js" /* webpackChunkName: "component---src-templates-template-blog-post-js" */),
}

exports.data = () =>
  import("/Users/amarcar/dev/gatsbyjs/gatsby/www/.cache/data.json")
```

### .cache/data.json

During the `pagesWriter` bootstrap phase (last phase), `pages-writer.js` writes this file to disk. It contains `dataPaths` and `pages`.

`dataPaths` is the same as the definition above.

`pages` is a dump of the redux `pages` namespace state. Each page contains:

- componentChunkName
- jsonName
- path

diff --git a/docs/docs/behind-the-scenes.md b/docs/docs/behind-the-scenes.md new file mode 100644 index 0000000000000..e4e3c4312d246 --- /dev/null +++ b/docs/docs/behind-the-scenes.md @@ -0,0 +1,13 @@ +--- +title: Behind the Scenes +--- + +Curious how Gatsby works under the hood? The pages in this section describe how a Gatsby build works from an internal code/architecture point of view. It should be useful for anyone who needs to work on the internals of Gatsby, for those who are simply curious how it all works, or for plugin authors who need to understand how core works to track down a bug. Come one, come all!

If you're looking for information on how to _use_ Gatsby to write your own site, or create a plugin, check out the rest of the Gatsby docs. This section is quite low level.

These docs aren't supposed to be definitive, or tell you everything there is to know. But as you're exploring the Gatsby codebase, you might find yourself wondering what a concept means, or which part of the codebase implements a particular idea. These docs aim to answer those kinds of questions.

A few more things: these docs are mostly focused on `gatsby build`; operations specific to `gatsby develop` are mostly ignored, though this may change in the future. Also, they mostly focus on the happy path, rather than getting bogged down in details of error handling.

Ready? Dive in by exploring how [APIs/Plugins](/docs/how-plugins-apis-are-run/) work.
diff --git a/docs/docs/build-caching.md b/docs/docs/build-caching.md new file mode 100644 index 0000000000000..73d2bdf319941 --- /dev/null +++ b/docs/docs/build-caching.md @@ -0,0 +1,8 @@ +--- +title: Build Caching +--- + +This is a stub. Help our community expand it. + +Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your +pull request gets accepted. diff --git a/docs/docs/data-storage-redux.md b/docs/docs/data-storage-redux.md new file mode 100644 index 0000000000000..e5eb099dc2a0b --- /dev/null +++ b/docs/docs/data-storage-redux.md @@ -0,0 +1,8 @@ +--- +title: Data Storage (Redux) +--- + +This is a stub. Help our community expand it. + +Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your +pull request gets accepted. diff --git a/docs/docs/how-plugins-apis-are-run.md b/docs/docs/how-plugins-apis-are-run.md new file mode 100644 index 0000000000000..640f0d7da2b82 --- /dev/null +++ b/docs/docs/how-plugins-apis-are-run.md @@ -0,0 +1,110 @@ +--- +title: How APIs/plugins are run +--- + +For most sites, plugins take up the majority of the build time. So what's really happening when APIs are called? + +_Note: this section only explains how `gatsby-node` plugins are run. Not browser or ssr plugins_ + +## Early in the build + +Early in the bootstrap phase, we [load all the configured plugins](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/bootstrap/load-plugins/index.js#L40) (and internal plugins) for the site. These are saved into redux under the `flattenedPlugins` namespace. Each plugin in redux contains the following fields: + +- **resolve**: absolute path to the plugin's directory +- **id**: String concatenation of 'Plugin ' and the name of the plugin. E.g `Plugin query-runner` +- **name**: The name of the plugin. E.g `query-runner` +- **version**: The version as per the package.json. Or if it is a site plugin, one is generated from the file's hash +- **pluginOptions**: Plugin options as specified in [gatsby-config.js](/docs/gatsby-config/) +- **nodeAPIs**: A list of node APIs that this plugin implements. E.g `[ 'sourceNodes', ...]` +- **browserAPIs**: List of browser APIs that this plugin implements +- **ssrAPIs**: List of SSR APIs that this plugin implements + +In addition, we also create a lookup from api to the plugins that implement it and save this to redux as `api-to-plugins`. This is implemented in [load-plugins/validate.js](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/bootstrap/load-plugins/validate.js#L106) + +## apiRunInstance + +Some API calls can take a while to finish. So every time an API is run, we create an object called [apiRunInstance](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L179) to track it. It contains the following notable fields: + +- **id**: Unique identifier generated based on type of API +- **api**: The API we're running. E.g `onCreateNode` +- **args**: Any arguments passed to `api-runner-node`. 
E.g a node object
- **pluginSource**: optional name of the plugin that initiated the original call
- **resolve**: promise resolve callback to be called when the API has finished running
- **startTime**: time that the API run was started
- **span**: opentracing span for tracing builds
- **traceId**: optional args.traceId, provided if the API will result in further API calls ([see below](#using-traceid-to-await-downstream-api-calls))

We immediately place this object into an `apisRunningById` Map, where we track its execution.

## Running each plugin

Next, we filter all `flattenedPlugins` down to those that implement the API we're trying to run. For each plugin, we require its `gatsby-node.js` and call its exported API function. E.g if the API was `sourceNodes`, it would result in a call to `gatsbyNode['sourceNodes'](...apiCallargs)`.

## Injected arguments

API implementations are passed a variety of useful [actions](/docs/actions/) and other interesting functions/objects. These arguments are [created](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L94) each time a plugin is run for an API, which allows us to rebind actions with default information.

All actions take 3 arguments:

1. The core information required by the action. E.g for [createNode](/docs/actions/#createNode), we must pass a node
2. The plugin that is calling this action. E.g `createNode` uses this to assign the owner of the new node
3. An object with misc action options:
   - **traceId**: [See below](#using-traceid-to-await-downstream-api-calls)
   - **parentSpan**: opentracing span (see [tracing docs](/docs/performance-tracing/))

Passing the plugin and action options on every single action call would be extremely painful for plugin/site authors. Since we know the plugin, traceId and parentSpan when we're running our API, we can rebind injected actions so these arguments are already provided. This is done in the [doubleBind](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L14) step.

## Waiting for all plugins to run

Each plugin is run inside a [map-series](https://www.npmjs.com/package/map-series) promise, which ensures they are executed in series (one after another). Once all plugins have finished running, we remove them from [apisRunningById](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L246) and fire an `API_RUNNING_QUEUE_EMPTY` event. This, in turn, results in any dirty pages being recreated, as well as their queries. Finally, the results are returned.

## Using traceID to await downstream API calls

The majority of API calls result in one or more implementing plugins being called. We then wait for them all to complete, and return. But some plugins (e.g [sourceNodes](/docs/node-apis/#sourceNodes)) result in calls to actions that themselves call APIs. We need some way of tracing whether an API call originated from another API call, so that we can wait on all child calls to complete. The mechanism for this is the `traceId`.
+ +```dot +digraph { + node [ shape="box" ]; + + "initialCall" [ label="apiRunner(`sourceNodes`, {\l traceId: `initial-sourceNodes`,\l waitForCascadingActions: true,\l parentSpan: parentSpan\l})\l " ]; + "apiRunner1" [ label="api-runner-node.js" ]; + "sourceNodes" [ label="plugin.SourceNodes()" ]; + "createNode" [ label="createNode(node)" ]; + "apisRunning" [ label="apisRunningByTraceId[traceId]" ]; + "createNodeReducer" [ label="CREATE_NODE reducer" ]; + "CREATE_NODE" [ label="CREATE_NODE event" ]; + "pluginRunner" [ label="plugin-runner.js" ]; + "onCreateNode" [ label="plugin.onCreateNode()" ]; + "apiRunnerOnCreateNode" [ label="apiRunner(`onCreateNode`, {\l node,\l traceId: action.traceId\l})\l "; ]; + "apiRunner2" [ label="api-runner-node.js" ]; + + "initialCall" -> "apiRunner1"; + "apiRunner1" -> "apisRunning" [ label="set to 1" ]; + "apiRunner1" -> "sourceNodes" [ label="call" ]; + "sourceNodes" -> "createNode" [ label="call (traceID passed via doubleBind)" ]; + "createNode" -> "createNodeReducer" [ label="triggers (action has traceId)" ]; + "createNodeReducer" -> "CREATE_NODE" [ label="emits (event has traceId)" ]; + "CREATE_NODE" -> "pluginRunner" [ label="handled by (event has traceId)" ]; + "pluginRunner" -> "apiRunnerOnCreateNode"; + "apiRunnerOnCreateNode" -> "apiRunner2"; + "apiRunner2" -> "onCreateNode" [ label="call" ]; + "apiRunner2" -> "apisRunning" [ label="increment" ]; +} +``` + +1. The traceID is passed as an argument to the original API runner. E.g + + ```javascript + apiRunner(`sourceNodes`, { + traceId: `initial-sourceNodes`, + waitForCascadingActions: true, + parentSpan: parentSpan, + }) + ``` + +1. We keep track of the number of API calls with this traceId in the [apisRunningByTraceId](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L139) Map. On this first invocation, it will be set to `1`. +1. Using the action rebinding mentioned [above](#injected-arguments), the traceId is passed through to all action calls via the `actionOptions` object. +1. After reducing the Action, a global event is [emitted](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/redux/index.js#L93) which includes the action information +1. For the `CREATE_NODE` and `CREATE_PAGE` events, we need to call the `onCreateNode` and `onCreatePage` APIs respectively. The [plugin-runner](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/redux/plugin-runner.js) takes care of this. It also passes on the traceId from the Action back into the API call. +1. We're back in `api-runner-node.js` and can tie this new API call back to its original. So we increment the value of [apisRunningByTraceId](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L218) for this traceId. +1. Now, whenever an API finishes running (when all its implementing plugins have finished), we decrement `apisRunningByTraceId[traceId]`. If the original API call included the `waitForCascadingActions` option, then we wait until `apisRunningByTraceId[traceId]` == 0 before resolving. 
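To make the bookkeeping concrete, here is a minimal runnable sketch of the counting logic described above. The names (`apisRunningByTraceId`, the cascade resolution) mirror the docs, but this is an illustration rather than Gatsby's actual source:

```javascript
// Illustration only: toy version of the traceId counting used to await
// cascading API calls. Not Gatsby's actual implementation.
const apisRunningByTraceId = new Map()
const waitingForCascadeToFinish = new Map()

function apiStarted(traceId) {
  apisRunningByTraceId.set(traceId, (apisRunningByTraceId.get(traceId) || 0) + 1)
}

function apiFinished(traceId) {
  const stillRunning = apisRunningByTraceId.get(traceId) - 1
  apisRunningByTraceId.set(traceId, stillRunning)
  // Resolve the original apiRunner call only once every API call that shares
  // this traceId has finished.
  if (stillRunning === 0 && waitingForCascadeToFinish.has(traceId)) {
    waitingForCascadeToFinish.get(traceId)()
    waitingForCascadeToFinish.delete(traceId)
  }
}

function runWithCascade(traceId) {
  return new Promise(resolve => {
    waitingForCascadeToFinish.set(traceId, resolve)
    apiStarted(traceId) // the initial `sourceNodes` run
    apiStarted(traceId) // a cascading `onCreateNode` run it triggered
    apiFinished(traceId) // the cascaded call finishes...
    apiFinished(traceId) // ...then the original call finishes, resolving
  })
}

runWithCascade(`initial-sourceNodes`).then(() => console.log(`cascade done`))
```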
diff --git a/docs/docs/internal-data-bridge.md b/docs/docs/internal-data-bridge.md new file mode 100644 index 0000000000000..7f6c5143e5a21 --- /dev/null +++ b/docs/docs/internal-data-bridge.md @@ -0,0 +1,51 @@ +--- +title: Internal Data Bridge +--- + +The Internal Data Bridge is an internal Gatsby plugin located at [internal-plugins/internal-data-bridge](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/internal-data-bridge). Its purpose is to create nodes representing pages, plugins, and site config so that they can be introspected for arbitrary purposes. As of writing, the only usage of this is by [gatsby-plugin-sitemap](/packages/gatsby-plugin-sitemap), which uses it to... yes, you guessed it, create a site map of your site.

## Example usage

As a site developer, you can use it to write queries that introspect your site's information. E.g get all the jsonNames of your pages:

```graphql
{
  allSitePage(limit: 10) {
    edges {
      node {
        jsonName
      }
    }
  }
}
```

Or, get a list of all gatsby plugins that you're using:

```graphql
{
  allSitePlugin(limit: 10) {
    edges {
      node {
        name
      }
    }
  }
}
```

## Internal types

The internal data bridge creates 3 types of nodes that can be introspected.

### Site

This is a node that contains fields from your site's `gatsby-config.js`, as well as program information such as host and port for gatsby develop.

### SitePlugin

A Node for each plugin in your `gatsby-config.js` that contains the full contents of the plugin's `package.json`.

### SitePage

Internal Data Bridge implements [onCreatePage](/docs/node-apis/#onCreatePage) and creates a node of type `SitePage` that represents the created Page, which allows you to introspect all pages created for your site. diff --git a/docs/docs/node-creation.md b/docs/docs/node-creation.md new file mode 100644 index 0000000000000..5720d5e5de989 --- /dev/null +++ b/docs/docs/node-creation.md @@ -0,0 +1,76 @@ +--- +title: Node Creation +--- + +Nodes are created by calling the [createNode](/docs/actions/#createNode) action. Nodes can be any object.

A node is stored in redux under the `nodes` namespace, whose state is a map of the node ID to the actual node object.

## Sourcing Nodes

Nodes are created in Gatsby by calling [createNode](/docs/actions/#createNode). This happens primarily in the [sourceNodes](/docs/node-apis/#sourceNodes) bootstrap phase. Nodes created during this phase are top level nodes. I.e, they have no parent. This is represented by source plugins setting the node's `parent` field to `___SOURCE___`. Nodes created via transformer plugins (which implement [onCreateNode](/docs/node-apis/#onCreateNode)) will have source nodes or other transformed nodes as their parents. For a rough overview of what happens when source nodes run, see the [traceID illustration](/docs/how-plugins-apis-are-run/#using-traceid-to-await-downstream-api-calls).

## Parent/Child/Refs

There are a few different scenarios for creating parent/child relationships.

### Node relationship storage model

All nodes in Gatsby are stored in a flat structure in the redux `nodes` namespace. A node's `children` field is an array of node IDs, whose nodes are also at the top level of the redux namespace. Here's an example of the `nodes` namespace:
+ +```javascript +{ + `id1`: { type: `File`, children: [`id2`, `id3`], ...other_fields }, + `id2`: { type: `markdownRemark`, ...other_fields }, + `id3`: { type: `postsJson`, ...other_fields } +} +``` + +An important note here is that we do not store a distinct collection of each type of child. Rather we store a single collection that they're all packed into. This has some implications on [child field inference](/docs/schema-gql-type/#child-fields-creation) in the Schema Generation phase. + +### Explicitly recording a parent/child relationship + +This occurs when a transformer plugin implements [onCreateNode](/docs/node-apis/#onCreateNode) in order to create some child of the originally created node. In this case, the transformer plugin will call [createParentChildLink](/docs/actions/#createParentChildLink), with the original node, and the newly created node. All this does is push the child's node ID onto the parent's `children` collection and resave the parent to redux. + +This does **not** automatically create a `parent` field on the child node. If a plugin author wishes to allow child nodes to navigate to their parents in GraphQL queries, they must explicitly set `childNode.parent: 'parent.id'` when creating the child node. + +### Foreign Key reference (`___NODE`) + +We've established that child nodes are stored at the top level in redux, and are referenced via ids in their parent's `children` collection. The same mechanism drives foreign key relationships. Foreign key fields have a `___NODE` suffix on the field name. At query time, Gatsby will take the field's value as an ID, and search redux for a matching node. This is explained in more detail in [schema gqlTypes](/docs/schema-gql-type#foreign-key-reference-___node). + +### Plain objects at creation time + +Let's say you create the following node by passing it to `createNode` + +```javascript +{ + foo: 'bar', + baz: { + car: 10 + } +} +``` + +The value for `baz` is itself an object. That value's parent is the top level object. In this case, Gatsby simply saves the top level node as is to redux. It doesn't attempt to extract `baz` into its own node. It does however track the subobject's root NodeID using [Node Tracking](/docs/node-tracking/) + +During schema compilation, Gatsby will infer the sub object's type while [creating the gqlType](/docs/schema-gql-type#plain-object-or-value-field). + +## Fresh/stale nodes + +Every time a build is re-run, there is a chance that a node that exists in the redux store no longer exists in the original data source. E.g a file might be deleted from disk between runs. We need a way to indicate that fact to Gatsby. + +To track this, there is a redux `nodesTouched` namespace that tracks whether a particular node ID has been touched. This occurs whenever a node is created (handled by [CREATE_NODE](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/reducers/nodes-touched.js)), or an explicit call to [touchNode](/docs/actions/#touchNode). + +When a `source-nodes` plugin runs again, it generally recreates nodes (which automatically touches them too). But in some cases, such as [transformer-screenshot](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-transformer-screenshot/src/gatsby-node.js#L56), a node might not change, but we still want to keep it around for the build. In these cases, we must explicitly call `touchNode`. + +Any nodes that aren't touched by the end of the `source-nodes` phase, are deleted. 
This is performed via a diff between the `nodesTouched` and `nodes` redux namespaces, in [source-nodes.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/utils/source-nodes.js).

## Changing a node's fields

From a site developer's point of view, nodes are immutable, in the sense that if you simply change a node object, those changes will not be seen by other parts of Gatsby. To make a change to a node, it must be persisted to redux via an action.

So, how do you add a field to an existing node? E.g perhaps in onCreateNode, you want to add a transformer-specific field. You can call [createNodeField]() and this will simply add your field to the node's `node.fields` object and then persist it to redux. This can then be referenced by other parts of your plugin at later stages of the build.

## Node Tracking

When a node is created, `createNode` will track all its fields against its nodeId. See [Node Tracking Docs](/docs/node-tracking/) for more. diff --git a/docs/docs/node-tracking.md b/docs/docs/node-tracking.md new file mode 100644 index 0000000000000..ee9e7b3a7317d --- /dev/null +++ b/docs/docs/node-tracking.md @@ -0,0 +1,62 @@ +--- +title: Node Tracking +--- + +## Track Nodes

You may see calls to `trackInlineObjectsInRootNode()` and `findRootNodeAncestor()` in some parts of the code. These are both defined in [schema/node-tracking.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/node-tracking.js). Node tracking is the tracking of relationships between a node's object values (not children), and the node's ID. E.g take the following node:

```javascript
let nodeA = {
  id: `id2`,
  internal: {
    type: `footype`
  },
  foo: {
    myfile: "blog/my-blog.md",
    b: 2
  },
  bar: 7,
  parent: `id1`,
  baz: [ { x: 8 }, 9 ]
}
```

Its sub objects are `foo` (value = `{ myfile: "blog/my-blog.md", b: 2 }`), and those in the `baz` array (`{ x: 8 }`). Node tracking will track those back to the top level node's ID (`id2` in this case). The [trackInlineObjectsInRootNode()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/node-tracking.js#L32) function takes care of this and records those relationships in the [rootNodeMap](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/node-tracking.js#L9) WeakMap. E.g after calling `trackInlineObjectsInRootNode(nodeA)`, `rootNodeMap` would contain the following records:

```javascript
// rootNodeMap:
{
  { myfile: "blog/my-blog.md", b: 2 } => "id2", // from `foo` field
  { x: 8 } => "id2", // from `baz` array
  { // top level object is tracked too
    id: `id2`,
    internal: { // internal is not mapped
      type: `footype`
    },
    foo: {
      myfile: "blog/my-blog.md",
      b: 2
    },
    bar: 7,
    parent: `id1`,
    baz: [ { x: 8 }, 9 ]
  } => "id2"
}
```

## Find Root Nodes

To access this information, `node-tracking.js` provides the [findRootNodeAncestor()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/node-tracking.js#L52) function. It takes an object, and looks up the ID of the node that contains it in `rootNodeMap`. It then finds the actual node in redux, gets that node's `parent` ID, fetches the parent node from redux, and continues in this way until the root node is found.

In the above example, `nodeA` has parent `id1`. So `findRootNodeAncestor({ myfile: "blog/my-blog.md", b: 2 })` would return the node for `id1` (the parent).

## Why/Where?

Where is node-tracking used? First up, nodes are tracked in 2 places.
Firstly, in [createNode](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/actions.js#L539), every time a node is created, we link all its sub objects to the new NodeID. Nodes are also tracked whenever they are resolved in [run-sift](/docs/schema-sift/#3-resolve-inner-query-fields-on-all-nodes). This is necessary because [custom plugin fields](/docs/schema-input-gql/#inferring-input-filters-from-plugin-fields/) might return new objects that weren't created when the node was initially made.

Now, where do we use this information? In 2 places.

1. In the `File` type resolver. It is used to look up the node's root, which should be of type `File`. We can then use that root node's base directory attribute to create the full path of the resolved field's value, and therefore find the actual `File` node that the string value is describing. See [File GqlType inference](/docs/schema-gql-type/#file-types) for more info.
1. To recursively look up node descriptions in [type-conflict-reporter.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/type-conflict-reporter.js)

diff --git a/docs/docs/page-creation.md b/docs/docs/page-creation.md new file mode 100644 index 0000000000000..3a145cdf25363 --- /dev/null +++ b/docs/docs/page-creation.md @@ -0,0 +1,21 @@ +--- +title: Page Creation +--- + +A page is created by calling the [createPage](/docs/actions/#createPage) action. There are three main side effects that occur when a page is created.

1. The `pages` redux namespace is updated
1. The `components` redux namespace is updated
1. `onCreatePage` API is executed

## Update Pages redux namespace

The `pages` redux namespace is a map of page `path` to page object. The [pages reducer](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/reducers/pages.js) takes care of updating this on a `CREATE_PAGE` action. It also creates a [Foreign Key Reference](/docs/schema-gql-type/#foreign-key-reference-___node) to the plugin that created the page by adding a `pluginCreator___NODE` field.

## Update Components redux namespace

The `components` redux namespace is a map of [componentPath](/docs/behind-the-scenes-terminology/#component) (file with React component) to the Component object. A Component object is simply the Page object but with an empty query string (that will be set during [Query Extraction](/docs/query-extraction/#store-queries-in-redux)).

## onCreatePage API

Every time a page is created, plugins have the opportunity to handle its [onCreatePage](/docs/node-apis/#onCreatePage) event. This is used for things like creating `SitePage` nodes in [Internal Data Bridge](/docs/internal-data-bridge/), and for "path" related plugins such as [gatsby-plugin-create-client-paths](/packages/gatsby-plugin-create-client-paths/) and [gatsby-plugin-remove-trailing-slashes](/packages/gatsby-plugin-remove-trailing-slashes/). diff --git a/docs/docs/page-node-dependencies.md b/docs/docs/page-node-dependencies.md new file mode 100644 index 0000000000000..f00d1a3c53c3b --- /dev/null +++ b/docs/docs/page-node-dependencies.md @@ -0,0 +1,66 @@ +--- +title: Page -> Node Dependency Tracking +--- + +In almost every GraphQL Resolver, you'll see the [createPageDependency](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/actions.js#L788) or [getNodeAndSavePathDependency](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/index.js#L198) functions.
These are responsible for recording which nodes are depended on by which pages. In `gatsby develop` mode, if a node's content changes, we re-run the queries of the pages that depend on that node. This is one of the things that makes `gatsby develop` so awesome.

## How dependencies are recorded

Recording of Page -> Node dependencies is handled by the [createPageDependency](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/actions.js#L788) action. It takes the page (in the form of its `path`), and either a `nodeId` or a `connection`.

If a `nodeId` is passed, we're telling Gatsby that the page depends specifically on this node. So, if the node is changed, then the page's query needs to be re-executed.

`connection` is a Type string. E.g `MarkdownRemark` or `File`. If `createPageDependency` is called with a page path and a `connection`, we are telling Gatsby that this page depends on all nodes of this type. Therefore, if any node of this type changes (e.g a change to a markdown node), then this page must be rebuilt. This variant is only called from [run-sift.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/run-sift.js#L264) when we're running a query such as `allFile` or `allMarkdownRemark`. See [Schema Connections](/docs/schema-connections/) for more info.

## How dependencies are stored

Page -> Node dependencies are tracked via the `componentDataDependencies` redux namespace. `createPageDependency` is the only way to mutate it. The namespace is made up of two sub structures:

```javascript
{
  nodes: { ... }, // mapping nodeId -> pages
  connections: { ... } // mapping of type names -> pages
}
```

**Nodes** is a map of nodeID to the set of pages that depend on that node. E.g

```javascript
// state.componentDataDependencies.nodes
{
  `ID of Some MarkdownRemark node`: [
    `blogs/my-blog1`,
    `blogs/my-blog2`
  ],
  `otherId`: [ `more pages`, ... ],
  ...
}
```

**Connections** is a map of type name to the set of pages that depend on that type. E.g

```javascript
// state.componentDataDependencies.connections
{
  `MarkdownRemark`: [
    `blogs/my-blog1`,
    `blogs/my-blog2`
  ],
  `File`: [ `more pages`, ... ],
  ...
}
```

## How dependency information is used

Page -> Node dependencies are used entirely during query execution to figure out which nodes are "dirty", and therefore which pages' queries need to be re-executed. This occurs in `page-query-runner.js` in the [findIdsWithoutDataDependencies](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L89) and [findDirtyIds](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L171) functions. This is described in greater detail in the [Query Execution](/docs/query-execution/) docs.

## Other forms

### add-page-dependency.js

[redux/actions/add-page-dependency.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/actions/add-page-dependency.js) is a wrapper around the `createPageDependency` action that performs some additional performance optimizations. It should be used instead of the raw action.

### getNodeAndSavePathDependency action

The [getNodeAndSavePathDependency](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/index.js#L198) action simply calls `getNode`, and then calls `createPageDependency` using that result. It is a programmer convenience.
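To make the recording side concrete, here is a toy version of the two `createPageDependency` variants described above. It only illustrates the shapes involved; it is not Gatsby's actual redux action:

```javascript
// Illustration only: a toy createPageDependency mirroring the shapes of
// state.componentDataDependencies described above.
const componentDataDependencies = { nodes: {}, connections: {} }

function createPageDependency({ path, nodeId, connection }) {
  if (nodeId) {
    // This page depends on one specific node.
    const pages = componentDataDependencies.nodes[nodeId] || new Set()
    pages.add(path)
    componentDataDependencies.nodes[nodeId] = pages
  }
  if (connection) {
    // This page depends on every node of a given type.
    const pages = componentDataDependencies.connections[connection] || new Set()
    pages.add(path)
    componentDataDependencies.connections[connection] = pages
  }
}

// A single-node dependency (e.g. recorded by a field resolver)...
createPageDependency({ path: `blogs/my-blog1`, nodeId: `some-markdown-id` })
// ...and a whole-type dependency (e.g. recorded when running `allFile`).
createPageDependency({ path: `blogs/my-blog1`, connection: `File` })

console.log(componentDataDependencies)
```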
diff --git a/docs/docs/query-behind-the-scenes.md b/docs/docs/query-behind-the-scenes.md new file mode 100644 index 0000000000000..ef078cf2b729f --- /dev/null +++ b/docs/docs/query-behind-the-scenes.md @@ -0,0 +1,13 @@ +--- +title: How Queries Work +--- + +We're talking about GraphQL queries here. These can be tagged graphql expressions at the bottom of your component source file (e.g [query for Gatsby frontpage](https://github.com/gatsbyjs/gatsby/blob/master/www/src/pages/index.js#L225)), StaticQueries within your components (e.g [showcase site details](https://github.com/gatsbyjs/gatsby/blob/master/www/src/components/showcase-details.js#L103)), or fragments created by plugins (e.g [gatsby-source-contentful](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-source-contentful/src/fragments.js)).

Note that we are NOT talking about queries involved in the creation of your pages, which is usually performed in your site's gatsby-node.js (e.g [Gatsby's website](https://github.com/gatsbyjs/gatsby/blob/master/www/gatsby-node.js#L85)). We're only talking about queries that are tied to particular pages or templates.

Almost all logic to do with queries is in the internal-plugin [query-runner](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner). There are two steps involved in a query's lifetime. The first is extracting it, and the second is running it. These are separated into two bootstrap phases.

1. [Query Extraction](/docs/query-extraction/)
2. [Query Execution](/docs/query-execution/)

diff --git a/docs/docs/query-execution.md b/docs/docs/query-execution.md new file mode 100644 index 0000000000000..0145bd3d16657 --- /dev/null +++ b/docs/docs/query-execution.md @@ -0,0 +1,135 @@ +--- +title: Query Execution +--- + +### Query Execution

Query Execution is kicked off by bootstrap by calling [page-query-runner.js runInitialQuerys()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L29).
The main files involved in this step are:

- [page-query-runner.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js)
- [query-queue.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js)
- [query-runner.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js)

Here's an overview of how it all relates:

```dot
digraph {
  compound = true;

  subgraph cluster_other {
    style = invis;
    extractQueries [ label = "query-watcher.js", shape = box ];
    componentsDD [ label = "componentDataDependencies\l(redux)", shape = cylinder ];
    components [ label = "components\l (redux)", shape = cylinder ];
    createNode [ label = "CREATE_NODE action", shape = box ];
  }

  subgraph cluster_pageQueryRunner {
    label = "page-query-runner.js"

    dirtyActions [ label = "dirtyActions", shape = cylinder ];
    extractedQueryQ [ label = "queueQueryForPathname()", shape = box ];
    findIdsWithoutDD [ label = "findIdsWithoutDataDependencies()", shape = box ];
    findDirtyActions [ label = "findDirtyActions()", shape = box ];
    queryJobs [ label = "runQueriesForPathnames()", shape = box ];

    extractedQueryQ -> queryJobs;
    findIdsWithoutDD -> queryJobs;
    dirtyActions -> findDirtyActions [ weight = 100 ];
    findDirtyActions -> queryJobs;
  }

  subgraph cluster_queryQueue {
    label = "query-queue.js";
    queryQ [ label = "better-queue", shape = box ];
  }

  subgraph cluster_queryRunner {
    label = "query-runner.js"
    graphqlJs [ label = "graphqlJs(schema, query, context, ...)" ];
    result [ label = "Query Result" ];

    graphqlJs -> result;
  }

  diskResult [ label = "/public/static/d/${dataPath}", shape = cylinder ];
  jsonDataPaths [ label = "jsonDataPaths\l(redux)", shape = cylinder ];

  result -> diskResult;
  result -> jsonDataPaths;

  extractQueries -> extractedQueryQ;
  componentsDD -> findIdsWithoutDD;
  components -> findIdsWithoutDD;
  createNode -> dirtyActions;

  queryJobs -> queryQ [ lhead = cluster_queryQueue ];

  queryQ -> graphqlJs [ lhead = cluster_queryRunner ];
}
```

#### Figuring out which queries need to be executed

The first thing this step does is figure out which queries even need to be run. You would think this would simply be a matter of running the queries that were enqueued in [Extract Queries](/docs/query-extraction/), but matters are complicated by support for `gatsby develop`. Below is the logic for figuring out which queries need to be executed (code is in [runQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L36)).

##### Already queued queries

All queries queued after being extracted (from `query-watcher.js`).

##### Queries without node dependencies

All queries whose component path isn't listed in `componentDataDependencies`. As a recap, in [Schema Generation](/docs/schema-generation/), we showed that all Type resolvers record a dependency between the page whose query we're running and any nodes that were successfully resolved. So, if a component is declared in the `components` redux namespace (occurs during [Page Creation](/docs/page-creation/)), but is *not* contained in `componentDataDependencies`, then by definition, the query has not been run. Therefore we need to run it. Check out [Page -> Node Dependencies](/docs/page-node-dependencies/) for more info.
The code for this step is in [findIdsWithoutDataDependencies](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L89).

##### Pages that depend on dirty nodes

In `gatsby develop` mode, every time a node is created or updated (e.g via editing a markdown file), we add that node to the [enqueuedDirtyActions](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L61) collection. When we execute our queries, we can look up all nodes in this collection and map them to pages that depend on them (as described above). These pages' queries must also be executed. In addition, this step also handles dirty `connections` (see [Schema Connections](/docs/schema-connections/)). Connections depend on a node's type, so if a node is dirty, we mark all connection nodes of that type dirty as well. The code for this step is in [findDirtyIds](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L171). _Note: dirty ids really refers to dirty paths_.

#### Queue Queries for Execution

We now have the list of all pages whose queries need to be executed (linked to their Query information). Let's queue them for execution (for realz this time). A call to [runQueriesForPathnames](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L21) kicks off this step. For each page or static query, we create a Query Job that looks something like:

```javascript
{
  id: // page path, or static query hash
  hash: // only for static queries
  jsonName: // jsonName of static query or page
  query: // raw query text
  componentPath: // path to file where query is declared
  isPage: // true if not static query
  context: {
    path: // if staticQuery, is jsonName of component
    ...page // page object. Not for static queries
    ...page.context // not for static queries
  }
}
```

This Query Job contains everything we need to execute the query (and do things like recording dependencies between pages and nodes). So, we push it onto the queue in [query-queue.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js) and then wait for the queue to empty. Let's see how `query-queue` works.

#### Query Queue Execution

[query-queue.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js) creates a [better-queue](https://www.npmjs.com/package/better-queue) queue that offers advanced features like parallel execution, which is handy since queries do not depend on each other, so we can take advantage of this. Every time an item is consumed from the queue, we call [query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js) where we finally actually execute the query!

Query execution involves calling the [graphql-js](https://graphql.org/graphql-js/) library with 3 pieces of information:

1. The Gatsby schema that was inferred during [Schema Generation](/docs/schema-generation/).
1. The raw query text, obtained from the Query Job.
1. The context, also from the Query Job. It has the page's `path`, amongst other things, so that we can record [Page -> Node Dependencies](/docs/page-node-dependencies/).

Graphql-js will parse the query and execute the top level query.
E.g `allMarkdownRemark( limit: 10 )` or `file( relativePath: { eq: "blog/my-blog.md" } )`. These will invoke the resolvers defined in [Schema Connections](/docs/schema-connections/) or [GQL Type](/docs/schema-gql-type/), which both use sift to query over all nodes of the type in redux. The result will be passed through the inner part of the graphql query where each type's resolver will be invoked. The vast majority of these will be `identity` functions that just return the field value. Some however could call a [custom plugin field](/docs/schema-gql-type/#plugin-fields) resolver. These in turn might perform side effects such as generating images. This is why the query execution phase of bootstrap often takes the longest. + +Finally, a result is returned. + +#### Save Query results to redux and disk + +As queries are consumed from the queue and executed, their results are saved to redux and disk for consumption later on. This involves converting the result to pure JSON, and then saving it to its [dataPath](/docs/behind-the-scenes-terminology/#datapath). Which is relative to `public/static/d`. The data path includes the jsonName and hash. E.g: for the page `/blog/2018-07-17-announcing-gatsby-preview/`, the queries results would be saved to disk as something like: + +``` +/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json +``` + +For static queries, instead of using the page's jsonName, we just use a hash of the query. + +Now we need to store the association of the page -> the query result in redux so we can recall it later. This is accomplished via the [json-data-paths](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/reducers/json-data-paths.js) reducer which we invoke by creating a `SET_JSON_DATA_PATH` action with the page's jsonName and the saved dataPath. + diff --git a/docs/docs/query-extraction.md b/docs/docs/query-extraction.md new file mode 100644 index 0000000000000..32e68ed23dfd6 --- /dev/null +++ b/docs/docs/query-extraction.md @@ -0,0 +1,110 @@ +--- +title: Query Extraction +--- + +### Extracting Queries from Files + +Up until now, we have [sourced all nodes](/docs/node-creation/) into redux, [inferred a schema](/docs/schema-generation/) from them, and [created all pages](/docs/page-creation/). The next step is to extract and compile all graphql queries from our source files. The entrypoint to this phase is [query-watcher extractQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js), which immediately compiles all graphql queries by calling into [query-compiler.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-compiler.js). + +#### Query Compilation + +The first thing it does is use [babylon-traverse](https://babeljs.io/docs/en/next/babel-traverse.html) to load all javascript files in the site that have graphql queries in them. This produces AST results that are passed to the [relay-compiler](https://facebook.github.io/relay/docs/en/compiler-architecture.html). This accomplishes a couple of things: + +1. It informs us of any malformed queries, which are promptly reported back to the user. +1. It builds a tree of queries and fragments they depend on. And outputs a single optimized query string with the fragments. 
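To make that output concrete, here is a rough sketch of what "a single optimized query string with the fragments" means. The query and fragment names below are made up for illustration; the real compiled text comes from the relay-compiler:

```javascript
// Illustration only: a page query that spreads a fragment...
const pageQuery = `
  query BlogPostBySlug($slug: String!) {
    markdownRemark(fields: { slug: { eq: $slug } }) {
      ...PostDetails
    }
  }
`

// ...and the fragment it depends on, perhaps defined by a plugin.
const postDetailsFragment = `
  fragment PostDetails on MarkdownRemark {
    html
    timeToRead
  }
`

// Conceptually, the compiler walks the dependency tree and emits one query
// text containing everything needed to execute it.
const compiledQueryText = `${pageQuery}\n${postDetailsFragment}`
console.log(compiledQueryText)
```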
After this step, we will have a map of file paths (of site files with queries in them) to Query Objects, which contain the raw optimized query text, as well as other metadata such as the component path and page `jsonName`. The following diagram shows the flow involved during query compilation:

```dot
digraph {
  fragments [ label = "fragments. e.g\l.cache/fragments/fragment1.js", shape = cylinder ];
  srcFiles [ label = "source files. e.g\lsrc/pages/my-page.js", shape = cylinder ];
  components [ label = "redux.state.components\l(via createPage)", shape = cylinder ];
  schema [ label = "Gatsby schema", shape = cylinder, URL = "/docs/schema-generation/" ];

  subgraph cluster_compiler {
    label = "query-compiler.js";
    fileQueries [ label = "files containing queries", shape = box ];
    babylon [ label = "parse files with babylon\lfilter those with queries" ];
    queryAst [ label = "QueryASTs", shape = box ];
    relayCompiler [ label = "Relay Compiler" ];
    queries [ label = "{ Queries | { filePath | query } }", shape = record ];
    query [ label = "{\l name: filePath,\l text: rawQueryText,\l originalText: original text from file,\l path: filePath,\l isStaticQuery: if it is,\l hash: hash of query\l}\l ", shape = box ];

  }

  fileQueries -> babylon;
  babylon -> queryAst;
  queryAst -> relayCompiler;
  relayCompiler -> queries;
  queries:query -> query;
  fragments -> fileQueries;
  srcFiles -> fileQueries;
  components -> fileQueries;
  schema -> relayCompiler;

  fragments -> srcFiles [ style = invis ];
  fragments -> components [ style = invis ];
}
```

#### Store Queries in Redux

We're now in the [handleQuery](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js#L68) function.

If the query is a `StaticQuery`, we call the `replaceStaticQuery` action to save it to the `staticQueryComponents` namespace, which is a mapping from a component's path to an object that contains the raw GraphQL Query amongst other things. More details in [Static Queries](/docs/static-vs-normal-queries/). We also remove the component's `jsonName` from the `components` redux namespace. See [Page -> Node Dependencies](/docs/page-node-dependencies/).

If the query is just a normal everyday query (not StaticQuery), then we update its component's `query` in the redux `components` namespace via the [replaceComponentQuery](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/actions.js#L827) action.
+ +```dot +digraph { + compound = true; + + compiler [ label = "query-compiler.js" ]; + + subgraph cluster_watcher { + label = "query-watcher.js:handleQuery()" + query [ label = "{\l name: filePath,\l text: rawQueryText,\l originalText: original text from file,\l path: filePath,\l isStaticQuery: if it is,\l hash: hash of query\l}\l ", shape = box ]; + replaceStaticQuery [ label = "replaceStaticQuery()" ]; + staticQueryComponents [ label = "staticQueryComponents\l (redux)", shape = cylinder ]; + replaceComponentQuery [ label = "replaceComponentQuery()" ]; + components [ label = "components\l (redux)", shape = cylinder ]; + + query -> replaceStaticQuery [ label = "if static query" ]; + query -> replaceComponentQuery [ label = "if not static" ]; + replaceStaticQuery -> staticQueryComponents; + replaceComponentQuery -> components [ label = "set `query` attribute" ]; + } + + compiler -> query [ label = "for each compiled query", lhead = cluster_watcher ]; +} +``` + + +#### Queue for execution + +Now that we've saved our query, we're ready to queue it for execution. Query execution is mainly handled by [page-query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js), so we accomplish this by passing the component's path to `queueQueryForPathname` function. + + +```dot +digraph { + compound = true; + compiler [ label = "query-compiler.js" ]; + + subgraph cluster_watcher { + label = "query-watcher.js:handleQuery()" + query [ label = "{\l name: filePath,\l text: rawQueryText,\l originalText: original text from file,\l path: filePath,\l isStaticQuery: if it is,\l hash: hash of query\l}\l ", shape = box ]; + + } + + subgraph cluster_pageQueryRunner { + label = "page-query-runner.js" + queueQueryForPathname [ label = "queueQueryForPathname()" ]; + } + + compiler -> query [ label = "for each compiled query", lhead = cluster_watcher ]; + query -> queueQueryForPathname [ label = "queue for execution" ]; +} +``` + +Now let's learn about [Query Execution](/docs/query-execution/). diff --git a/docs/docs/recipes.md b/docs/docs/recipes.md index 7ed446fb491f8..408d46ab984d5 100644 --- a/docs/docs/recipes.md +++ b/docs/docs/recipes.md @@ -103,3 +103,4 @@ Transforming data in Gatsby is also plugin-driven; Transformer plugins take data - Walk through an example using the `gatsby-transformer-remark` plugin to transform markdown files [tutorial part six](/tutorial/part-six/#transformer-plugins) - Search available transformer plugins in the [Gatsby library](/plugins/?=transformer) + diff --git a/docs/docs/schema-connections.md b/docs/docs/schema-connections.md new file mode 100644 index 0000000000000..fbc9d9a06638c --- /dev/null +++ b/docs/docs/schema-connections.md @@ -0,0 +1,98 @@ +--- +title: Schema connections +--- + +## What are schema connections? + +So far in schema generation, we have covered how [GraphQL types are inferred](/docs/schema-gql-type), how [query arguments for types](/docs/schema-input-gql) are created, and how [sift resolvers](/docs/schema-sift) work. But all of these only allow querying down to a single node of a type. Schema connections is the ability to query over **collections** of nodes of a type. For example, if we want to query all markdown nodes by some criteria, it will allow us to write queries such as: + +```graphql +{ + allMarkdownRemark(filter: {frontmatter: {tags: {in: "wordpress"}}}) { + edges { + node { + ... 
+ } + } + } +} +``` + +Other features covered by schema connections are aggregators and reducers such as `distinct`, `group` and `totalCount`, `edges`, `skip`, `limit`, and more. + +### Connection/Edge + +A connection is an abstraction that describes a collection of nodes of a type, and how to query and navigate through them. In the above example query, `allMarkdownRemark` is a Connection Type. Its field `edges` is analagous to `results`. Each Edge points at a `node` (in the collection of all markdownRemark nodes), but it also points to the logical `next` and `previous` nodes, relative to the `node` in the collection (meaningful if you provided a `sort` arg). + +_Fun Fact: This stuff is all based on [relay connections](https://facebook.github.io/relay/graphql/connections.htm) concepts_ + +The ConnectionType also defines input args to perform paging using the `skip/limit` pattern. The actual logic for paging is defined in the [graphql-skip-limit](https://www.npmjs.com/package/graphql-skip-limit) library in [arrayconnection.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/graphql-skip-limit/src/connection/arrayconnection.js). It is invoked as the last part of the [run-sift](/docs/schema-sift#5-run-sift-query-on-all-nodes) function. To aid in paging, the ConnectionType also defines a `pageInfo` field with a `hasNextPage` field. + +The ConnectionType is defined in the [graphql-skip-limit connection.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/graphql-skip-limit/src/connection/connection.js) file. Its construction function takes a Type, and uses it to create a connectionType. E.g passing in `MarkdownRemark` Type would result in a `MarkdownRemarkConnection` type whose `edges` field would be of type `MarkdownRemarkEdge`. + +### GroupConnection + +A GroupConnection is a Connection with extended functionality. Instead of simply providing the means to access nodes in a collection, it allows you to group those nodes by one of its fields. It _is_ a `Connection` Type itself, but with 3 new fields: `field`, `fieldValue`, and `totalCount`. It adds a new input argument to `ConnectionType` whose value can be any (possibly nested) field on the original type. + +The creation of the GroupConnection is handled in [build-connection-fields.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-connection-fields.js#L57). It's added as the `group` field to the top level type connection. This is most easily shown in the below diagram. + +```dot +digraph structs { + node [shape=Mrecord]; + mdConn [ label = "{ MarkdownRemarkConnection\l (allMarkdownRemark) | pageInfo | edges | group | distinct | totalCount }" ]; + mdEdge [ label = "{ MarkdownRemarkEdge | node | next | previous }" ]; + mdGroupConn [ label = "{ MarkdownRemarkGroupConnectionConnection | pageInfo | edges | field | fieldValue | totalCount }" ]; + mdGroupConnEdge [ label = "{ MarkdownRemarkGroupConnectionEdge | node | next | previous }" ]; + mdConn:group -> mdGroupConn; + mdConn:edges -> mdEdge; + mdGroupConn:edges -> mdGroupConnEdge; +} +``` + +Let's see this in practice. Say we were trying to group all markdown nodes by their author. We would query the top level `MarkdownRemarkConnection` (`allMarkdownRemark`) which would return a `MarkdownRemarkConnection` with this new group input argument, which would return a `MarkdownRemarkGroupConnectionConnection` field. 
E.g:

```graphql
{
  allMarkdownRemark {
    group(field: frontmatter___author) {
      fieldValue
      edges {
        node {
          frontmatter {
            title
          }
        }
      }
    }
  }
}
```

#### Field enum value

The `frontmatter___author` value is interesting. It describes a nested field, i.e. we want to group all markdown nodes by the `author` field of their `frontmatter` sub-object. So why not use a period? The problem is that GraphQL doesn't allow periods in field names, so we instead use `___`, and then in the [resolver](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-connection-fields.js#L69), we convert it back to a period.

The second interesting thing is that `frontmatter___author` is not a string, but rather a GraphQL enum. You can verify this by using intellisense in GraphiQL to see all possible values. This implies that Gatsby has generated all possible field names, which is true! To do this, we create an [exampleValue](/docs/schema-gql-type#gqltype-creation) and then use the [flat](https://www.npmjs.com/package/flat) library to flatten the nested object into string keys, using `___` delimiters. This is handled by the [data-tree-utils.js/buildFieldEnumValues](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/data-tree-utils.js#L277) function.

Note that the same enum mechanism is used to create the `distinct` fields.

#### Group Resolver

The resolver for the Group type is created in [build-connection-fields.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-connection-fields.js#L57). It operates on the result of the core connection query (e.g. `allMarkdownRemark`), which is a `Connection` object with edges. From these edges, we retrieve all the nodes (each edge has a `node` field), and then use lodash to group those nodes by the field name argument (e.g. `field: frontmatter___author`).

If sorting was specified ([see below](#sorting)), we sort the groups by field name, and then apply any `skip`/`limit` arguments using the [graphql-skip-limit](https://www.npmjs.com/package/graphql-skip-limit) library. Finally, we are ready to fill in our `field`, `fieldValue`, and `totalCount` fields on each group, and we can return our resolved node.

### Input filter creation

Just like in [gql type input filters](/docs/schema-input-gql), we must generate standard input filters on our connection type arguments. As a reminder, these allow us to query any field by predicates such as `{ eq: "value" }` or `{ glob: "foo*" }`. This is covered by the same functions (in [infer-graphql-input-fields.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-input-fields.js)), except that we're passing in Connection types instead of basic types. The only difference is that we use the `sort` field ([see below](#sorting)).

### Sorting

A `sort` argument can be added to the `Connection` type (not the `GroupConnection` type). You can sort by any (possibly nested) field in the connection results. These are enums that are created via the same mechanism described in [enum fields](#field-enum-value), except that the inference of these enums occurs in [infer-graphql-input-fields.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-input-fields.js#L302).
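To make that mechanism concrete, here is a minimal sketch (not the actual Gatsby implementation) of how a nested example value could be flattened into `___`-delimited enum keys with the [flat](https://www.npmjs.com/package/flat) library; the `exampleValue` below is made up:

```javascript
const flatten = require(`flat`)

// Hypothetical exampleValue merged from all MarkdownRemark nodes
const exampleValue = {
  frontmatter: {
    author: `F. Scott Fitzgerald`,
    title: `The Great Gatsby`,
  },
  wordCount: { paragraphs: 4 },
}

// Flatten nested keys with `___` as the delimiter, since GraphQL names
// cannot contain periods
const enumFieldNames = Object.keys(flatten(exampleValue, { delimiter: `___` }))

console.log(enumFieldNames)
// [ 'frontmatter___author', 'frontmatter___title', 'wordCount___paragraphs' ]
```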
+ +The Sort Input Type itself is created in [build-node-connections.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-connections.js#L49) and implemented by [create-sort-field.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/create-sort-field.js). The actual sorting occurs in run-sift (below). + +### Connection Resolver (sift) + +Finally, we're ready to define the resolver for our Connection type (in [build-node-connections.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-connections.js#L65)). This is where we come up with the name `all${type}` (e.g `allMarkdownRemark`) that is so common in Gatsby queries. The resolver is fairly simple. It uses the [sift.js](https://www.npmjs.com/package/sift) library to query across all nodes of the same type in redux. The big difference is that we supply the `connection: true` parameter to `run-sift.js` which is where sorting, and pagination is actually executed. See [Querying with Sift](/docs/schema-sift) for how this actually works. + diff --git a/docs/docs/schema-generation.md b/docs/docs/schema-generation.md new file mode 100644 index 0000000000000..9c81c1a34096a --- /dev/null +++ b/docs/docs/schema-generation.md @@ -0,0 +1,65 @@ +--- +title: Schema Generation +--- + +Once the nodes have been sourced and transformed, the next step is to generate the GraphQL Schema. This is one of the more complex parts of the Gatsby code base. In fact, as of writing, it accounts for a third of the lines of code in core Gatsby. It involves inferring a GraphQL schema from all the nodes that have been sourced and transformed so far. Read on to find out how it's done. + +### Group all nodes by type + +Each sourced or transformed node has a `node.internal.type`, which is set by the plugin that created it. E.g, the `source-filesystem` plugin [sets the type to File](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-source-filesystem/src/create-file-node.js#L46). The `transformer-json` plugin creates a dynamic type [based on the parent node](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-transformer-json/src/gatsby-node.js#L48). E.g `PostsJson` for a `posts.json` file. + +During the schema generation phase, we must generate what's called a `ProcessedNodeType` in Gatsby. This is a simple structure that builds on top of a [graphql-js GraphQLObjectType](https://graphql.org/graphql-js/type/#graphqlobjecttype). Our goal in the below steps is to infer and construct this object for each unique node type in redux. + +The flow is summarized by the below graph. It shows the intermediate transformations or relevant parts of the user's graphql query that are performed by code in the Gatsby [schema folder](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/schema), finally resulting in the [ProcessedNodeType](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L182). It uses the example of building a `File` GraphQL type. + +```dot +digraph graphname { + "pluginFields" [ label = "custom plugin fields\l{\l publicURL: {\l type: GraphQLString,\l resolve(file, a, c) { ... 
}\l }\l}\l ", shape = box ]; + "typeNodes" [ label = "all redux nodes of type\le.g internal.type === `File`", shape = "box" ]; + "exampleValue" [ label = "exampleValue\l{\l relativePath: `blogs/my-blog.md`,\l accessTime: 8292387234\l}\l ", shape = "box" ]; + "resolve" [ label = "ProcessedNodeType\l including final resolve()", shape = box ]; + "gqlType" [ label = "gqlType (GraphQLObjectType)\l{\l fields,\l name: `File`\l}\l ", shape = box ]; + "parentChild" [ label = "Parent/Children fields\lnode {\l childMarkdownRemark { html }\l parent { id }\l}\l ", shape = "box" ]; + "objectFields" [ label = "Object node fields\l node {\l relativePath,\l accessTime\l}\l ", shape = "box" ]; + "inputFilters" [ label = "InputFilters\lfile({\l relativePath: {\l eq: `blogs/my-blog.md`\l }\l})\l ", shape = box ] + + "pluginFields" -> "inputFilters"; + "pluginFields" -> "gqlType"; + "objectFields" -> "gqlType"; + "parentChild" -> "gqlType" + "gqlType" -> "inputFilters"; + "typeNodes" -> "exampleValue"; + "typeNodes" -> "parentChild"; + "typeNodes" -> "resolve"; + "exampleValue" -> "objectFields"; + "inputFilters" -> "resolve"; + "gqlType" -> "resolve"; +} +``` + +### For each unique Type + +The majority of schema generation code kicks off in [build-node-types.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js). The below steps will be executed for each unique type. + +#### 1. Plugins create custom fields + +Gatsby infers GraphQL Types from the fields on the sourced and transformed nodes. But before that, we allow plugins to create their own custom fields. For example, `source-filesystem` creates a [publicURL](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-source-filesystem/src/extend-file-node.js#L11) field that when resolved, will copy the file into the `public/static` directory and return the new path. + +To declare custom fields, plugins implement the [setFieldsOnGraphQLNodeType](/docs/node-apis/#setFieldsOnGraphQLNodeType) API and apply the change only to types that they care about (e.g source-filesystem [only proceeds if type.name = `File`](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-source-filesystem/src/extend-file-node.js#L6). During schema generation, Gatsby will call this API, allowing the plugin to declare these custom fields, [which are returned](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L151) to the main schema process. + +#### 2. Create a "GQLType" + +This step is quite complex, but at its most basic, it infers GraphQL Fields by constructing an `exampleObject` that merges all fields of the type in Redux. It uses this to infer all possible fields and their types, and construct GraphQL versions of them. It does the same for fields created by plugins (like in step 1). This step is explained in detail in [GraphQL Node Types Creation](/docs/schema-gql-type). + +#### 3. Create Input filters + +This step creates GraphQL input filters for each field so the objects can be queried by them. More details in [Building the Input Filters](/docs/schema-input-gql). + +#### 4. ProcessedTypeNode creation with resolve implementation + +Finally, we have everything we need to construct our final Gatsby Type object (known as `ProcessedTypeNode`). This contains the input filters and gqlType created above, and implements a resolve function for it using sift. More detail in the [Querying with Sift](/docs/schema-sift) section. + +#### 5. 
Create Connections for each type + +We've inferred all GraphQL Types, and the ability to query for a single node. But now we need to be able to query for collections of that type (e.g `allMarkdownRemark`). [Schema Connections](/docs/schema-connections/) takes care of that. + diff --git a/docs/docs/schema-gql-type.md b/docs/docs/schema-gql-type.md new file mode 100644 index 0000000000000..e801fb309b42e --- /dev/null +++ b/docs/docs/schema-gql-type.md @@ -0,0 +1,206 @@ +--- +title: GraphQL Node Types Creation +--- + +Gatsby creates a [GraphQLObjectType](https://graphql.org/graphql-js/type/#graphqlobjecttype) for each distinct `node.internal.type` that is created during the source-nodes phase. Find out below how this is done. + +## GraphQL Types for each type of node + +When running a GraphQL query, there are a variety of fields that you will want to query. Let's take an example, say we have the below query: + +```graphql +{ + file( relativePath: { eq: `blogs/my-blog.md` } ) { + childMarkdownRemark { + frontmatter: { + title + } + } + } +} +``` + +When GraphQL runs, it will query all `file` nodes by their relativePath and return the first node that satisfies that query. Then, it will filter down the fields to return by the inner expression. I.e `{ childMarkdownRemark ... }`. The building of the query arguments is covered by the [Inferring Input Filters](/docs/schema-input-gql) doc. This section instead explains how the inner filter schema is generated (it must be generated before input filters are inferred). + +During the [sourceNodes](/docs/node-apis/#sourceNodes) phase, let's say that [gatsby-source-filesystem](/packages/gatsby-source-filesystem) ran and created a bunch of `File` nodes. Then, different transformers react via [onCreateNode](/docs/node-apis/#onCreateNode), resulting in children of different `node.internal.type`s being created. + +There are 3 categories of node fields that we can query. + +#### Fields on the created node object. E.g + +```graphql +node { + relativePath, + extension, + size, + accessTime +} +``` + +#### Child/Parent. E.g: + +```graphql +node { + childMarkdownRemark, + childrenPostsJson, + children, + parent +} +``` + +#### fields created by setFieldsOnGraphQLNodeType + +```graphql +node { + publicURL +} +``` + +Each of these categories of fields is created in a different way, explained below. + +## gqlType Creation + +The Gatsby term for the GraphQLObjectType for a unique node type, is `gqlType`. GraphQLObjectTypes are simply objects that define the type name and fields. The field definitions are created by the [createNodeFields](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L48) function in `build-node-types.js`. + +An important thing to note is that all gqlTypes are created before their fields are inferred. This allows fields to be of types that haven't yet been created due to their order of compilation. This is accomplished by the use of `fields` [being a function](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L167) (basically lazy functions). + +The first step in inferring GraphQL Fields is to generate an `exampleValue`. It is the result of merging all fields of all nodes of the type in question. This `exampleValue` will therefore contain all potential field names and values, which allows us to infer each field's types. 
The logic to create it is in [getExampleValues](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/data-tree-utils.js#L305).

With the exampleValue in hand, we can use each of its key/values to infer the Type's fields (broken down by the 3 categories above).

### Fields on the created node object

These are fields on the node that were created directly by the source and transform plugins. E.g. for the `File` type, these would be `relativePath`, `size`, `accessTime`, etc.

The creation of these fields is handled by the [inferObjectStructureFromNodes](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-type.js#L317) function in [infer-graphql-type.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-type.js). Given an object, a field could be in one of 3 sub-categories:

1. It involves a mapping in [gatsby-config.js](/docs/gatsby-config/#mapping-node-types)
2. Its value is a foreign key reference to some other node (ends in `___NODE`)
3. It's a plain object or value (e.g. String, number, etc)

#### Mapping field

Mappings are explained in the [gatsby-config.js docs](/docs/gatsby-config/#mapping-node-types). If the object field we're generating a GraphQL type for is configured in the gatsby-config mapping, then we handle it specially.

Imagine the top level type we're currently generating fields for is `MarkdownRemark.frontmatter`, the field we are creating a GraphQL field for is called `author`, and we have the following mapping set up:

```javascript
mapping: {
  "MarkdownRemark.frontmatter.author": `AuthorYaml.name`,
},
```

The field generation in this case is handled by [inferFromMapping](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-type.js#L129). The first step is to find the type that is mapped to. In this case, `AuthorYaml`. This is known as the `linkedType`. That type will have a field to link by. In this case `name`. If one is not supplied, it defaults to `id`. This field is known as the `linkedField`.

Now we can create a GraphQL Field declaration whose type is `AuthorYaml` (which we look up in the list of other `gqlTypes`). The field resolver will get the value for the node (in this case, the author string), and then search through the redux nodes until it finds one whose type is `AuthorYaml` and whose `name` field matches the author string.

#### Foreign Key reference (`___NODE`)

If not a mapping field, it might instead end in `___NODE`, signifying that its value is an ID that is a foreign key reference to another node in redux. Check out [Create a Source Plugin](/docs/create-source-plugin/#create-source-plugin) for how this works from a user point of view. Behind the scenes, the field inference is handled by [inferFromFieldName](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-type.js#L204).

This is actually quite similar to the mapping case above. We remove the `___NODE` part of the field name, e.g. `author___NODE` becomes `author`. Then, we find our `linkedNode`. I.e. given the example value for `author` (which would be an ID), we find its actual node in redux. Then, we find its type in the processed types by its `internal.type`. Note that, as with mapping fields, we can define the `linkedField` too. This can be specified via `nodeFieldname___NODE___linkedFieldName`, e.g. for `author___NODE___name`, the linkedField would be `name` instead of `id`.
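As an illustration only (this is not the actual `inferFromFieldName` source), the name parsing and node lookup could be sketched like this, where `getNodes` is a hypothetical helper that returns every node stored in redux:

```javascript
// Sketch of resolving a foreign key field such as `author___NODE___name`.
function resolveForeignKey(fieldName, node, getNodes) {
  // `author___NODE___name` -> real field `author`, linked by `name`
  // `author___NODE`        -> real field `author`, linked by `id` (the default)
  const [rawName, linkedField = `id`] = fieldName.split(`___NODE___`)
  const realFieldName = rawName.replace(/___NODE$/, ``)

  // The stored value identifies the linked node (by `id` unless a linkedField was given)
  const linkedValue = node[fieldName]
  const linkedNode = getNodes().find(n => n[linkedField] === linkedValue)

  return { realFieldName, linkedNode }
}
```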
Now we can return a new GraphQL Field object, whose type is the one found above. Its resolver searches through all redux nodes until it finds one with the matching ID. As usual, it also creates a [page dependency](/docs/page-node-dependencies/), from the query context's path to the node ID.

If the foreign key value is an array of IDs, then instead of returning a Field declaration for a single field, we return a `GraphQLUnionType`, which is a union of all the distinct linked types in the array.

#### Plain object or value field

If the field was not handled as a mapping or foreign key reference, then it must be a normal, everyday field, e.g. a scalar, string, or plain object. These cases are handled by [inferGraphQLType](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-type.js#L38).

The core of this step creates a GraphQL Field object, where the type is inferred directly via the result of `typeof`. E.g. `typeof value === 'boolean'` would result in the type `GraphQLBoolean`. Since these are simple values, resolvers are not defined (graphql-js takes care of that for us).

If, however, the value is an object or array, we recurse, using [inferObjectStructureFromNodes](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-type.js#L317) to create the GraphQL fields.

In addition, Gatsby creates custom GraphQL types for `File` ([types/type-file.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/types/type-file.js)) and `Date` ([types/type-date.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/types/type-date.js)). If the value of our field is a string that looks like a filename or a date (handled by the [should-infer](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-type.js#L52) functions), then we return the appropriate custom type.

### Child/Parent fields

#### Child fields creation

Let's continue with the `File` type example. There are many transformer plugins that implement `onCreateNode` for `File` nodes. These produce `File` children that are of their own type. E.g. `markdownRemark`, `postsJson`.

Gatsby stores these children in redux as IDs in the parent's `children` field, and then stores those child nodes as full redux nodes themselves (see [Node Creation](/docs/node-creation/#node-relationship-storage-model) for more). E.g. for a File node with two children, it will be stored in the redux `nodes` namespace as:

```javascript
{
  `id1`: { type: `File`, children: [`id2`, `id3`], ...other_fields },
  `id2`: { type: `markdownRemark`, ...other_fields },
  `id3`: { type: `postsJson`, ...other_fields }
}
```

An important note here is that we do not store a distinct collection of each type of child. Rather, we store a single collection that they're all packed into. The benefit of this is that we can easily create a `File.children` field that returns all children, regardless of type. The downside is that the creation of fields such as `File.childMarkdownRemark` and `File.childrenPostsJson` is more complicated. This is what [createNodeFields](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L48) does.

Another convenience Gatsby provides is the ability to query a node's `child` or `children`, depending on whether a parent node has 1 or more children of that type.
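For intuition only (this is not the literal Gatsby source), deriving those convenience field names could look something like the sketch below, using lodash's `camelCase` and `upperFirst`; the single-vs-multiple rule just follows the description above:

```javascript
const _ = require(`lodash`)

// Sketch: build the convenience field name for a parent's children of one type.
// A single child of a type yields e.g. `childMarkdownRemark`, while multiple
// children of that type yield e.g. `childrenPostsJson`.
function childFieldName(childType, childCount) {
  const typeName = _.upperFirst(_.camelCase(childType))
  return childCount === 1 ? `child${typeName}` : `children${typeName}`
}

childFieldName(`MarkdownRemark`, 1) // "childMarkdownRemark"
childFieldName(`PostsJson`, 3) // "childrenPostsJson"
```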
#### child resolvers

When defining our parent `File` gqlType, [createNodeFields](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L48) will iterate over the distinct types of its children, and create their fields. Let's say one of these child types is `markdownRemark`, and let's assume there is only one `markdownRemark` child per `File`. Therefore, its field name is `childMarkdownRemark`. Now, we must create its GraphQL resolver:

```
resolve(node, args, context, info)
```

The resolve function will be called when we are running queries for our pages. A query might look like:

```graphql
query {
  file(relativePath: { eq: "blog/my-blog.md" }) {
    childMarkdownRemark { html }
  }
}
```

To resolve `file.childMarkdownRemark`, we take the node we're resolving, and filter over all of its `children` until we find one of type `markdownRemark`, which is returned. Remember that `children` is a collection of IDs, so as part of this, we look up the node by ID in redux too.

But before we return from the resolve function, remember that we might be running this query within the context of a page. If that's the case, then whenever the node changes, the page will need to be rerendered. To record that fact, we call [createPageDependency](/docs/page-node-dependencies/) with the node ID and the page, which is a field in the `context` object in the resolve function signature.

#### parent field

When a node is created as a child of some node (the parent), that fact is stored in the child's `parent` field. Its value is the ID of the parent. The [GraphQL resolver](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L57) for this field looks up the parent by that ID in redux and returns it. It also creates a [page dependency](/docs/page-node-dependencies/), to record that the page being queried depends on the parent node.

### Plugin fields

These are fields created by plugins that implement the [setFieldsOnGraphQLNodeType](/docs/node-apis/#setFieldsOnGraphQLNodeType) API. These plugins return full GraphQL Field declarations, complete with type and resolve functions.

### File types

As described in [plain object or value field](#plain-object-or-value-field), if a string field value looks like a file path, then we infer `File` as the field's type. The creation of this type occurs in [type-file.js setFileNodeRootType()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/types/type-file.js#L18). It is called just after we have created the GqlType for `File` (only called once).

It creates a new GraphQL Field Config whose type is the just created `File` GqlType, and whose resolver converts a string into a File object. Here's how it works:

Say we have a `data/posts.json` file that has been sourced (of type `File`), and then the [gatsby-transformer-json](/packages/gatsby-transformer-json) transformer creates a child node (of type `PostsJson`):

```javascript
// data/posts.json
[
  {
    "id": "1685001452849004065",
    "text": "Venice is 👌",
    "image": "images/BdiU-TTFP4h.jpg"
  }
]
```

Notice that the image value looks like a file. Therefore, we'd like to query it as if it were a file, and get its `relativePath`, `accessTime`, etc.
```graphql
{
  postsJson(id: { eq: "1685001452849004065" }) {
    image {
      relativePath
      accessTime
    }
  }
}
```

The [File type resolver](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/types/type-file.js#L135) takes care of this. It gets the value (`images/BdiU-TTFP4h.jpg`). It then looks up this node's root NodeID via [Node Tracking](/docs/node-tracking/), which returns the original `data/posts.json` file. It creates a new filename by concatenating the field value onto the parent node's directory.

I.e. `data` + `images/BdiU-TTFP4h.jpg` = `data/images/BdiU-TTFP4h.jpg`.

And then finally it searches redux for the first `File` node whose path matches this one. This is our proper resolved node. We're done!

diff --git a/docs/docs/schema-input-gql.md b/docs/docs/schema-input-gql.md new file mode 100644 index 0000000000000..7b6780f4d5cab --- /dev/null +++ b/docs/docs/schema-input-gql.md @@ -0,0 +1,138 @@ +--- +title: Inferring Input Filters +---

## Input Filters vs gqlType

In [gqlTypes](/docs/schema-gql-type), we inferred a Gatsby Node's main fields. These allow us to query a node's children, parent and object fields. But these are only useful once a top level GraphQL Query has returned results. In order to query by those fields, we must create GraphQL objects for input filters, e.g. querying for all markdownRemark nodes that have 4 paragraphs:

```graphql
{
  markdownRemark(wordCount: { paragraphs: { eq: 4 } }) {
    html
  }
}
```

The arguments (`wordCount: { paragraphs: { eq: 4 } }`) to the query are known as input filters. In graphql-js, they are the [GraphQLInputObjectType](https://graphql.org/graphql-js/type/#graphqlinputobjecttype). This section covers how these input filters are inferred.

### Inferring input filters from example node values

The first step is to generate an input field for each type of field on the redux nodes. For example, we might want to query markdown nodes by their front matter author:

```graphql
{
  markdownRemark(frontmatter: { author: { eq: "F. Scott Fitzgerald" } }) {
    id
  }
}
```

This step is handled by [inferInputObjectStructureFromNodes](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-input-fields.js#L235). First, we generate an example value (see [gqlTypes](/docs/schema-gql-type#gqltype-creation)). For each field on the example value (e.g. `author`), we create a [GraphQLInputObjectType](https://graphql.org/graphql-js/type/#graphqlinputobjecttype) with an appropriate name. The fields for input objects are predicates that depend on the value's `typeof` result. E.g. for a String, we need to be able to query by `eq`, `regex`, etc. If the value is an object itself, then we recurse, building its fields as above.

If the key is a foreign key reference (ends in `___NODE`), then we find the field's linked Type first, and proceed as above (for more on how foreign keys are implemented, see [gqlType](/docs/schema-gql-type#foreign-key-reference-___node)). After this step, we will end up with an input object type such as:
+ +```javascript +{ + `MarkdownRemarkFrontmatterAuthor`: { + name: `MarkdownRemarkFrontmatterAuthorInputObject`, + fields: { + `MarkdownRemarkFrontmatterAuthorName` : { + name: `MarkdownRemarkFrontmatterAuthorNameQueryString`, + fields: { + eq: { type: GraphQLString }, + ne: { type: GraphQLString }, + regex: { type: GraphQLString }, + glob: { type: GraphQLString }, + in: { type: new GraphQLList(GraphQLString) }, + } + } + } + } +} +``` + +### Inferring input filters from plugin fields + +Plugins themselves have the opportunity to create custom fields that apply to ALL nodes of a particular type, as opposed to having to expicitly add the field on every node creation. An example would be `markdownRemark` which adds a `wordcount` field to each node automatically. This section deals with the generation of input filters so that we can query by these fields as well. E.g: + +```graphql +{ + markdownRemark(wordCount: { paragraphs: { eq: 4 } }) { + html + } +} +``` + +Plugins add custom fields by implementing the [setFieldsOnGraphQLNodeType](/docs/node-apis/#setFieldsOnGraphQLNodeType) API. They must return a full GraphQLObjectType, complete with `resolve` function. Once this API has been run, the fields are passed to [inferInputObjectStructureFromFields](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-input-fields-from-fields.js#L195), which will generate input filters for thew new fields. The result would look something like: + +```javascript +{ //GraphQLInputObjectType + name: `WordCountwordcountInputObject`, + fields: { + `paragraphs`: { + type: { // GraphQLInputObjectType + name: `WordCountParagraphsQueryInt`, + fields: { + eq: { type: GraphQLInt }, + ne: { type: GraphQLInt }, + gt: { type: GraphQLInt }, + gte: { type: GraphQLInt }, + lt: { type: GraphQLInt }, + lte: { type: GraphQLInt }, + in: { type: new GraphQLList(GraphQLInt) }, + } + } + } + } +} +``` + +As usual, the input filter fields (`eq`, `lt`, `gt`, etc) are based on the type of the field (`Int` in this case), which is defined by the plugin. + +### Merged result + +Now that we've generated input fields from the redux nodes and from custom plugin fields, we merge them together. E.g + +```javascript +{ + + // from infer input fields from object + `MarkdownRemarkAuthor`: { + name: `MarkdownRemarkAuthorInputObject`, + fields: { + `MarkdownRemarkAuthorName` : { + name: `MarkdownRemarkAuthorNameQueryString`, + fields: { + eq: { type: GraphQLString }, + ne: { type: GraphQLString }, + regex: { type: GraphQLString }, + glob: { type: GraphQLString }, + in: { type: new GraphQLList(GraphQLString) }, + } + } + } + }, + + // From infer input fields from fields + `wordCount`: { //GraphQLInputObjectType + name: `WordCountwordcountInputObject`, + fields: { + `paragraphs`: { + type: { // GraphQLInputObjectType + name: `WordCountParagraphsQueryInt`, + fields: { + eq: { type: GraphQLInt }, + ne: { type: GraphQLInt }, + gt: { type: GraphQLInt }, + gte: { type: GraphQLInt }, + lt: { type: GraphQLInt }, + lte: { type: GraphQLInt }, + in: { type: new GraphQLList(GraphQLInt) }, + } + } + } + } + } +} +``` diff --git a/docs/docs/schema-sift.md b/docs/docs/schema-sift.md new file mode 100644 index 0000000000000..180512cb89322 --- /dev/null +++ b/docs/docs/schema-sift.md @@ -0,0 +1,121 @@ +--- +title: Querying with Sift +--- + +## Summary + +Gatsby stores all data loaded during the source-nodes phase in redux. And it allows you to write GraphQL queries to query that data. But Redux is a plain javascript object store. 
So how does Gatsby query over those nodes using the GraphQL query language?

The answer is that it uses the [sift.js](https://github.com/crcn/sift.js/tree/master) library. It is a port of the MongoDB query language that works over plain JavaScript objects. It turns out that MongoDB's query language is very compatible with GraphQL.

Most of the logic below is in the [run-sift.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/run-sift.js) file, which is called from the [ProcessedNodeType `resolve()`](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/build-node-types.js#L191) function.

## ProcessedNodeType Resolve Function

Remember, at the point this resolve function is created, we have been iterating over all the distinct `node.internal.type`s in the redux `nodes` namespace. So for instance we might be on the `MarkdownRemark` type. Therefore the `resolve()` function closes over this type name and has access to all the nodes of that type.

The `resolve()` function calls `run-sift.js`, and provides it with the following arguments:

- The GraphQL args (as a JS object), wrapped in a `filter`. E.g. `wordcount: { paragraphs: { eq: 4 } }`
- All nodes in redux of this type, e.g. those where `internal.type === 'MarkdownRemark'`
- Context `path`, if being called as part of a [page query](/docs/query-execution/#query-queue-execution)
- typeName, e.g. `markdownRemark`
- gqlType. See [more on gqlType](/docs/schema-gql-type)

For example:

```javascript
runSift({
  args: {
    filter: { // Exact args from GraphQL Query
      wordcount: {
        paragraphs: {
          eq: 4
        }
      }
    }
  },
  nodes: ${latestNodes},
  path: context.path, // E.g /blogs/my-blog
  typeName: `markdownRemark`,
  type: ${gqlType}
})
```

## Run-sift.js

This file converts GraphQL arguments into sift queries and applies them to the collection of all nodes of this type. The rough steps are:

1. Convert query args to sift args
2. Drop leaves from args
3. Resolve inner query fields on all nodes
4. Track newly realized fields
5. Run sift query on all nodes
6. Create Page dependency if required

### 1. Convert query args to sift args

Sift expects all field names to be prepended by a `$`. The [siftify-args](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/run-sift.js#L58) function takes care of this. It descends the args tree, performing the following transformations for each field key/value scenario:

- field key is `elemMatch`? Change it to `$elemMatch` and recurse on the value object
- field value is a regex? Apply regex cleaning
- field value is a glob? Use the [minimatch](https://www.npmjs.com/package/minimatch) library to convert it to a regex
- normal value? Prepend `$` to the field name

So, the above query would become:

```javascript
{
  `$wordcount`: {
    `$paragraphs`: {
      `$eq`: 4
    }
  }
}
```

### 2. Drop leaves (e.g `{eq: 4}`) from args

To assist in step 3, we create a version of the siftified args called `fieldsToSift` that has all leaves of the args tree replaced with boolean `true`. This is handled by the [extractFieldsToSift](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/run-sift.js#L84) function. `fieldsToSift` would look like this after the function is applied:

```javascript
{
  `wordcount`: {
    `paragraphs`: true
  }
}
```

### 3. Resolve inner query fields on all nodes

Step 4 will perform the actual sift query over all the nodes, returning the first one that matches the query.
But we must remember that the nodes that are in redux only include data that was explicitly created by their source or transform plugins. If, instead of creating a data field, a plugin used `setFieldsOnGraphQLNodeType` to define a custom field, then we have to manually call that field's resolver on each node. The args in step 2 are a great example. The `wordcount` field is defined by the [gatsby-transformer-remark](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-transformer-remark/src/extend-node-type.js#L416) plugin, rather than being created along with the remark node itself.

The [nodesPromise](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/run-sift.js#L168) function iterates over all nodes of this type. Then, for each node, [resolveRecursive](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/run-sift.js#L112) descends the `fieldsToSift` tree, getting each field name, finding its gqlType, and then calling that type's `resolve` function manually. E.g. for the above example, we would find the gqlField for `wordcount` and call its `resolve` function:

```javascript
markdownRemarkGqlType.resolve(node, {}, {}, { fieldName: `wordcount` })
```

Note that the graphql-js library has NOT been invoked yet. We're instead calling the appropriate gqlType resolve function manually.

The resolve method in this case would return a paragraph node, which also needs to be properly resolved. So we descend the `fieldsToSift` arg tree and perform the above operation on the paragraph node (using the found paragraph gqlType).

After `resolveRecursive` has finished, we will have "realized" all the query fields in each node, giving us confidence that we can perform the query with all the data being there.

### 4. Track newly realized fields

Since new fields on the node may have been created in this process, we call `trackInlineObjectsInRootNode()` to track these new objects. See the [Node Tracking](/docs/node-tracking/) docs for more.

### 5. Run sift query on all nodes

Now that we've realized all fields that need to be queried, on all nodes of this type, we are finally ready to apply the sift query. This step is handled by [tempPromise](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/run-sift.js#L214). It simply concatenates all the top level objects in the args tree together with a sift `$and` expression, and then iterates over all nodes, returning the first one that satisfies the sift expression.

In the case that `connection === true` (an argument passed to run-sift), then instead of just choosing the first matching node, we will select ALL nodes that match the sift query. If the GraphQL query specified `sort`, `skip`, or `limit` fields, then we use the [graphql-skip-limit](https://www.npmjs.com/package/graphql-skip-limit) library to filter down to the appropriate results. See [Schema Connections](/docs/schema-connections) for more info.

### 6. Create Page dependency if required

Assuming we find a node (or multiple, if `connection === true`), we finish off by recording that the page that initiated the query (in the `path` field) depends on the found node. More on this in [Page -> Node Dependencies](/docs/page-node-dependencies/).

## Note about plugin resolver side effects

As [mentioned above](#3-resolve-inner-query-fields-on-all-nodes), `run-sift` must "realize" all query fields before querying over them. This involves calling the resolvers of custom plugins on **each node of that type**.
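To make the implication concrete, here is a contrived, hypothetical plugin field (the field name and the fetch helper are made up, not taken from a real plugin):

```javascript
const { GraphQLInt } = require(`graphql`)

// Hypothetical stand-in for an expensive external call a plugin might make
const fetchDownloadCount = async node => 42

// A made-up field added via setFieldsOnGraphQLNodeType. While run-sift
// "realizes" query fields, this resolver runs for EVERY node of the type,
// not just the nodes that end up matching the query.
exports.setFieldsOnGraphQLNodeType = ({ type }) => {
  if (type.name !== `MarkdownRemark`) return {}
  return {
    downloadCount: {
      type: GraphQLInt,
      resolve(node) {
        console.log(`fetching stats for`, node.id) // side effect, once per node
        return fetchDownloadCount(node)
      },
    },
  }
}
```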
Therefore, if a resolver performs side effects, then these will be triggered, regardless of whether the field result actually matches the query.

diff --git a/docs/docs/static-vs-normal-queries.md b/docs/docs/static-vs-normal-queries.md new file mode 100644 index 0000000000000..bfeedce8c6f7f --- /dev/null +++ b/docs/docs/static-vs-normal-queries.md @@ -0,0 +1,41 @@ +--- +title: Static vs Normal Queries +---

## TODO Difference between normal and Static Queries

Static queries don't need to be run for each page, just once.

### staticQueryComponents

Started here because they're referenced in page-query-runner:findIdsWithDataDependencies.

The redux `staticQueryComponents` is a map from component jsonName to StaticQueryObject. E.g

```javascript
{
  `blog-2018-07-17-announcing-gatsby-preview-995` : {
    name: `/path/to/component/file`,
    componentPath: `/path/to/component/file`,
    id: `blog-2018-07-17-announcing-gatsby-preview-995`,
    jsonName: `blog-2018-07-17-announcing-gatsby-preview-995`,
    query: `raw GraphQL Query text including fragments`,
    hash: `hash of graphql text`
  }
}
```

The `staticQueryComponents` redux namespace is owned by the `static-query-components.js` reducer, which reacts to `REPLACE_STATIC_QUERY` actions.

It is created in query-watcher. TODO: Check other usages

TODO: in query-watcher.js/handleQuery, we remove jsonName from dataDependencies. How did it get there? Why is jsonName used here, but for other dependencies, it's a path?

### Usages

- [websocket-manager](TODO). TODO
- [query-watcher](TODO).
  - `getQueriesSnapshot` returns a map with a snapshot of `state.staticQueryComponents`
  - handleComponentsWithRemovedQueries. For each staticQueryComponent, if passed in queries doesn't include `staticQueryComponent.componentPath`. TODO: Where is StaticQueryComponent created? TODO: Where is queries passed into `handleComponentsWithRemovedQueries`?

  TODO: Finish above

diff --git a/packages/gatsby-remark-graphviz/README.md b/packages/gatsby-remark-graphviz/README.md index 6800a796f7f47..81547c7976d76 100644 --- a/packages/gatsby-remark-graphviz/README.md +++ b/packages/gatsby-remark-graphviz/README.md @@ -44,6 +44,10 @@ Which will be rendered using viz.js and the output html will replace the code bl

In your gatsby-config.js, make sure you place this plugin before other remark plugins that modify code blocks (like prism).

+## Caveats
+
+In your gatsby-config.js, make sure you place this plugin before other remark plugins that modify code blocks (like prism).
+
## Alternatives

If you want a broader range of drawing options, checkout [gatsby-remark-draw](https://www.npmjs.com/package/gatsby-remark-draw). 
It provides SvgBobRus, Graphviz, and Mermaid, but note that you must have these already installed on your system diff --git a/www/gatsby-config.js b/www/gatsby-config.js index 6147fb0646a83..1687c882b49cb 100644 --- a/www/gatsby-config.js +++ b/www/gatsby-config.js @@ -55,6 +55,7 @@ module.exports = { resolve: `gatsby-transformer-remark`, options: { plugins: [ + `gatsby-remark-graphviz`, `gatsby-remark-code-titles`, { resolve: `gatsby-remark-images`, diff --git a/www/package.json b/www/package.json index 50bf8975080e6..08b58f453b496 100644 --- a/www/package.json +++ b/www/package.json @@ -33,6 +33,7 @@ "gatsby-remark-code-titles": "^1.0.2", "gatsby-remark-copy-linked-files": "next", "gatsby-remark-images": "next", + "gatsby-remark-graphviz": "next", "gatsby-remark-prismjs": "next", "gatsby-remark-responsive-iframe": "next", "gatsby-remark-smartypants": "next", diff --git a/www/src/data/sidebars/doc-links.yaml b/www/src/data/sidebars/doc-links.yaml index 7e921c300fa8d..60adfe5ebbb77 100644 --- a/www/src/data/sidebars/doc-links.yaml +++ b/www/src/data/sidebars/doc-links.yaml @@ -255,6 +255,47 @@ link: /docs/prpl-pattern/ - title: Querying data with GraphQL link: /docs/querying-with-graphql/ +- title: BEHIND THE SCENES + link: /docs/behind-the-scenes/ + items: + - title: How APIS/Plugins Are Run + link: /docs/how-plugins-apis-are-run/ + - title: Node Creation + link: /docs/node-creation/ + - title: Schema Generation + link: /docs/schema-generation/ + items: + - title: Building the GqlType + link: /docs/schema-gql-type/ + - title: Building the Input Filters + link: /docs/schema-input-gql/ + - title: Querying with Sift + link: /docs/schema-sift/ + - title: Connections + link: /docs/schema-connections/ + - title: Page Creation + link: /docs/page-creation/ + - title: Page -> Node Dependencies + link: /docs/page-node-dependencies/ + - title: Node Tracking + link: /docs/node-tracking/ + - title: Internal Data Bridge + link: /docs/internal-data-bridge/ + - title: Queries + link: /docs/query-behind-the-scenes/ + items: + - title: Query Extraction + link: /docs/query-extraction/ + - title: Query Execution + link: /docs/query-execution/ + - title: Normal vs StaticQueries + link: /docs/static-vs-normal-queries/ + - title: Data Storage (Redux)* + link: /docs/data-storage-redux/ + - title: Build Caching* + link: /docs/build-caching/ + - title: Terminology + link: /docs/behind-the-scenes-terminology/ - title: Advanced Tutorials items: - title: Making a site with user authentication*