behind the scenes docs (#7510)
* add gatsby-remark-graphviz package to gatsby/www
* Add BEHIND THE SCENES pages
  * How APIS/Plugins Are Run
  * Node Creation
  * Schema Generation
    * Building the GqlType
    * Building the Input Filters
    * Querying with Sift
    * Connections
  * Page Creation
  * Page -> Node Dependencies
  * Node Tracking
  * Internal Data Bridge
  * Queries
    * Query Extraction
    * Query Execution
    * Normal vs StaticQueries
  * Data Storage (Redux)*
  * Build Caching*
  * Terminology
Moocar authored Sep 9, 2018
1 parent 95a898c commit ca7d0ed
Showing 24 changed files with 1,611 additions and 0 deletions.
221 changes: 221 additions & 0 deletions docs/docs/behind-the-scenes-terminology.md
@@ -0,0 +1,221 @@
---
title: Terminology
---

Throughout the Gatsby code, you'll see the object fields and variables below mentioned. This page defines each one and explains why it exists.

## Page

### Page Object

Created by calls to [createPage](/docs/actions/#createPage) (see [Page Creation](/docs/page-creation)). It has the following fields:

- [path](#path)
- [matchPath](#matchpath)
- [jsonName](#jsonname)
- [component](#component)
- [componentChunkName](#componentchunkname)
- [internalComponentName](#internalcomponentname) (unused)
- [context](#pagecontext)
- updatedAt

Each of these fields is explained below.

### path

The publicly accessible path in the web URL used to reach the page. E.g.

`/blog/2018-07-17-announcing-gatsby-preview/`.

It is created when the page object is created (see [Page Creation](/docs/page-creation/)).

### Redux `pages` namespace

Contains a map of Page [path](#path) -> [Page object](#page-object).

### matchPath

Think of this as `client matchPath`. It is ignored when creating pages during the build. On the frontend, however, it is used (via [reach router](https://github.com/reach/router/blob/master/src/lib/utils.js)) to find the matching page when resolving the page from the path ([find-path.js]()). Note that the [pages are sorted](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/pages-writer.js#L38) so that those with a `matchPath` come last, which means explicit paths are matched first.

This is also used by [gatsby-plugin-create-client-paths](/packages/gatsby-plugin-create-client-paths/?=client). It duplicates pages whose paths match some client-only prefix (e.g. `/app/`). The duplicated page has a `matchPath` so that it is resolved first on the front end.

It is also used by [gatsby-plugin-netlify](/packages/gatsby-plugin-netlify/?=netlify) when creating `_redirects`.
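
The ordering rule described above can be sketched roughly as follows. This is an illustration only, not Gatsby's code: `sortPages`, `matches` and `findPage` are hypothetical names, and the trailing-`*` matcher is a simplified stand-in for what reach router does.

```javascript
// Hypothetical sketch: pages with a matchPath sort last, so explicit
// paths are tried before pattern-based ones.
const sortPages = pages =>
  [...pages].sort(
    (a, b) => Number(Boolean(a.matchPath)) - Number(Boolean(b.matchPath))
  )

// Simplified stand-in for the router's matcher: only handles a trailing `*`.
const matches = (pattern, pathname) =>
  pattern.endsWith(`*`)
    ? pathname.startsWith(pattern.slice(0, -1))
    : pattern === pathname

// Return the first page (in sorted order) whose pattern matches the pathname.
const findPage = (pages, pathname) =>
  sortPages(pages).find(page => matches(page.matchPath || page.path, pathname))

const pages = [
  { path: `/app/`, matchPath: `/app/*` },
  { path: `/app/login/` },
  { path: `/blog/` },
]
```

With these pages, `findPage(pages, "/app/login/")` returns the explicit page, while `findPage(pages, "/app/profile/")` falls through to the `/app/*` page.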

### jsonName

The logical name for the query result of a page. Created during [createPage](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/actions.js#L229). The name is constructed by applying kebabHash to the page path. E.g. for the path above, it is:

`blog-2018-07-17-announcing-gatsby-preview-995`
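
As a rough sketch of the idea, a kebab hash can be thought of as the kebab-cased input plus a short hash suffix. The helper below is hypothetical: the simplified `kebabCase`, the choice of digest and the 3-character suffix are assumptions for illustration, so it will not necessarily reproduce the `-995` suffix shown above.

```javascript
const crypto = require(`crypto`)

// Simplified kebab-case: collapse every run of non-alphanumerics into `-`
// and trim dashes from both ends.
const kebabCase = str =>
  str
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, `-`)
    .replace(/^-+|-+$/g, ``)

// Hypothetical kebabHash: kebab-cased input plus a short hash suffix.
// The sha256 digest and 3-character suffix are assumptions.
const kebabHash = str => {
  const digest = crypto.createHash(`sha256`).update(str).digest(`hex`)
  return `${kebabCase(str)}-${digest.slice(-3)}`
}

const jsonName = kebabHash(`/blog/2018-07-17-announcing-gatsby-preview/`)
```

The suffix keeps names distinct when two different paths happen to kebab-case to the same string.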

### component

The path on disk to the JavaScript file containing the React component. E.g.

`/src/templates/template-blog-post.js`

Think of this as `componentPath` instead.

### Redux `components` namespace

Mapping from `component` (path on disk) to its [Page object](#page-object). It is created every time a page is created (by listening to `CREATE_PAGE`).

```javascript
{
  // keyed by `component` (path on disk)
  "/src/templates/template-blog-post.js": {
    query: ``,
    path: `/blog/2018-07-17-announcing-gatsby-preview/`,
    jsonName: `blog-2018-07-17-announcing-gatsby-preview-995`,
    componentPath: `/src/templates/template-blog-post.js`,
    ...restOfPage
  }
}
```

The query starts off empty, but is set during the extractQueries phase by [query-watcher/handleQuery](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js#L68), once the query has been compiled by Relay (see [Query Extraction](/docs/query-extraction/)).
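
A minimal sketch of such a reducer is shown below. It is not Gatsby's implementation: `componentsReducer` is a hypothetical name, and the sketch assumes the `CREATE_PAGE` action carries the page object as `action.payload`.

```javascript
// Hypothetical reducer for the `components` namespace. It assumes the
// CREATE_PAGE action carries the page object as `action.payload`.
const componentsReducer = (state = {}, action) => {
  switch (action.type) {
    case `CREATE_PAGE`: {
      const page = action.payload
      return {
        ...state,
        [page.component]: {
          query: ``, // filled in later, during query extraction
          ...page,
          componentPath: page.component,
        },
      }
    }
    default:
      return state
  }
}

const state = componentsReducer(undefined, {
  type: `CREATE_PAGE`,
  payload: {
    path: `/blog/2018-07-17-announcing-gatsby-preview/`,
    component: `/src/templates/template-blog-post.js`,
  },
})
```

Keying by `component` means that pages sharing a template map to a single entry in this namespace.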

### componentChunkName

The [page.component](#component) (path on disk), kebab hashed in the same way as above. E.g. the componentChunkName for component

`/src/templates/template-blog-post.js`

is

`component---src-templates-template-blog-post-js`
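
Judging by the example, the result is a `component---` prefix followed by the kebab-cased path. The sketch below follows that reading; the simplified `kebabCase` helper and the `generateComponentChunkName` name are assumptions, not Gatsby's code.

```javascript
// Simplified kebab-case: collapse every run of non-alphanumerics into `-`
// and trim dashes from both ends.
const kebabCase = str =>
  str
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, `-`)
    .replace(/^-+|-+$/g, ``)

// Hypothetical helper: prefix the kebab-cased component path.
const generateComponentChunkName = componentPath =>
  `component---${kebabCase(componentPath)}`

const chunkName = generateComponentChunkName(
  `/src/templates/template-blog-post.js`
)
```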

TODO: Mention how used by webpack

### internalComponentName

If the path is `/`, internalComponentName = `ComponentIndex`. Otherwise, for a path of `/blog/foo`, it would be `ComponentBlogFoo`.

Created as part of page, but currently unused.
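
The two examples above are consistent with a helper along these lines. The function is a hypothetical reconstruction from those examples, not the actual source.

```javascript
const capitalize = word => word.charAt(0).toUpperCase() + word.slice(1)

// Hypothetical helper reproducing the two examples above: the root path is
// special-cased, every other path is turned into PascalCase segments.
const internalComponentName = pagePath =>
  pagePath === `/`
    ? `ComponentIndex`
    : `Component${pagePath
        .split(/[^a-zA-Z0-9]+/)
        .filter(Boolean)
        .map(capitalize)
        .join(``)}`
```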

### page.context

This is [merged with the page itself](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L153) and then [passed to GraphQL](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js#L40) queries as the `context` parameter.

## Query

### dataPath

Path to the page's query result, relative to `/public/static/d/`; its first segment is the `{modInt}` directory. The name is the kebab hash of `path--${jsonName}`, followed by `-` and the base64-encoded SHA-1 of the query result. E.g.

`621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`

Set after [Query Execution](/docs/query-execution/#save-query-results-to-redux-and-disk) has finished.
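
The `result->sha1->base64` part can be sketched with Node's `crypto` module. The helper names below are hypothetical, and stripping characters that are awkward in file names is an assumption for illustration.

```javascript
const crypto = require(`crypto`)

// sha1 -> base64, then drop characters that are unsafe in file names.
// The character filter is an assumption for illustration.
const hashResult = result =>
  crypto
    .createHash(`sha1`)
    .update(JSON.stringify(result))
    .digest(`base64`)
    .replace(/[^a-zA-Z0-9_-]/g, ``)

// Hypothetical helper: `pathChunkName` stands for the kebab-hashed
// `path--${jsonName}` part, `modInt` for the directory prefix.
const buildDataPath = (modInt, pathChunkName, result) =>
  `${modInt}/${pathChunkName}-${hashResult(result)}`

const dataPath = buildDataPath(
  `621`,
  `path---blog-2018-07-17-announcing-gatsby-preview-995-a74`,
  { data: { markdownRemark: { timeToRead: 2 } } }
)
```

Because the hash is derived from the result itself, the file name changes whenever the query result changes.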

### Redux `jsonDataPaths` namespace (dataPaths)

Map of page [jsonName](#jsonname) to [dataPath](#datapath). Updated after [Query Execution](/docs/query-execution/#save-query-results-to-redux-and-disk). E.g

```
{
  // jsonName -> dataPath
  "blog-2018-07-17-announcing-gatsby-preview-995": "621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc"
}
```

In the code, this map is also referred to via the `dataPaths` variable.

### Query result file

`/public/static/d/${dataPath}.json`

E.g

`/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json`

This is the actual result of the GraphQL query that was run for the page `/blog/2018-07-17-announcing-gatsby-preview/`. The contents would look something like:

```javascript
{
  "data": {
    "markdownRemark": {
      "html": "<p>Today we....",
      "timeToRead": 2,
      "fields": {
        "slug": "/blog/2018-07-17-announcing-gatsby-preview/"
      },
      "frontmatter": {
        "title": "Announcing Gatsby Preview",
        "date": "July 17th 2018",
        ...
      },
      ...
    }
  },
  "pageContext": {
    "slug": "/blog/2018-07-17-announcing-gatsby-preview/",
    "prev": {
      ...
    },
    "next": null
  }
}
```

For a query such as:

```javascript
export const pageQuery = graphql`
  query($slug: String!) {
    markdownRemark(fields: { slug: { eq: $slug } }) {
      html
      timeToRead
      fields {
        slug
      }
      frontmatter {
        title
        date(formatString: "MMMM Do YYYY")
        ...
      }
      ...
    }
  }
`
```

## Webpack stuff

### /public/${componentChunkName}-[chunkhash].js

The final webpack JavaScript bundle for the blog template page.

E.g

`/public/component---src-templates-template-blog-post-js-2df3a086e8d2cdf690aa.js`

### /.cache/async-requires.js

Generated javascript file that exports `components` and `data` fields.

`components` is a mapping from `componentChunkName` to a function that imports the component's original source file path. This is used for code splitting: the import statement is a hint to webpack that the JavaScript file can be loaded later. The `webpackChunkName` comment inside the import also tells webpack to name the resulting chunk after the `componentChunkName`.

`data` is a function that imports `/.cache/data.json`, which is code split in the same way.

E.g

```js
exports.components = {
  "component---src-templates-template-blog-post-js": () =>
    import("/Users/amarcar/dev/gatsbyjs/gatsby/www/src/templates/template-blog-post.js" /* webpackChunkName: "component---src-templates-template-blog-post-js" */),
}

exports.data = () =>
  import("/Users/amarcar/dev/gatsbyjs/gatsby/www/.cache/data.json")
```

### .cache/data.json

During the `pagesWriter` bootstrap phase (last phase), `pages-writer.js` writes this file to disk. It contains `dataPaths` and `pages`.

`dataPaths` is the same as the definition above.

`pages` is a dump of the redux `pages` component state. Each page contains:

- componentChunkName
- jsonName
- path
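
To show how these pieces fit together, here is a hedged sketch that resolves a page's resources from its path. The `data` object is an in-memory stand-in for `data.json` built from the example values on this page; representing `pages` as an array, and the `resolvePageResources` helper itself, are assumptions for illustration.

```javascript
// Hypothetical in-memory stand-in for /.cache/data.json, using the
// example values from this page. `pages` as an array is an assumption.
const data = {
  pages: [
    {
      componentChunkName: `component---src-templates-template-blog-post-js`,
      jsonName: `blog-2018-07-17-announcing-gatsby-preview-995`,
      path: `/blog/2018-07-17-announcing-gatsby-preview/`,
    },
  ],
  dataPaths: {
    "blog-2018-07-17-announcing-gatsby-preview-995": `621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`,
  },
}

// Resolve what is needed to render a page: which component chunk to load
// and which query result file holds its data.
const resolvePageResources = pagePath => {
  const page = data.pages.find(p => p.path === pagePath)
  if (!page) return null
  return {
    componentChunkName: page.componentChunkName,
    queryResultFile: `/public/static/d/${data.dataPaths[page.jsonName]}.json`,
  }
}

const resources = resolvePageResources(
  `/blog/2018-07-17-announcing-gatsby-preview/`
)
```

The `componentChunkName` is then the key to look up in the `components` mapping exported by `async-requires.js`.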

13 changes: 13 additions & 0 deletions docs/docs/behind-the-scenes.md
@@ -0,0 +1,13 @@
---
title: Behind the Scenes
---

Curious how Gatsby works under the hood? The pages in this section describe how a Gatsby build works from an internal code/architecture point of view. They should be useful for anyone who needs to work on the internals of Gatsby, for plugin authors who need to understand how core works to track down a bug, and for those who are simply curious how it all works. Come one, come all!

If you're looking for information on how to _use_ Gatsby to write your own site, or create a plugin, check out the rest of the Gatsby docs. This section is quite low level.

These docs aren't supposed to be definitive, or tell you everything there is to know. But as you're exploring the Gatsby codebase, you might find yourself wondering what a concept means, or which part of the codebase implements a particular idea. These docs aim to answer those kinds of questions.

A few more things. These docs are mostly focused on `gatsby build`. Operations specific to `gatsby develop` are mostly ignored, though this may change in the future. Also, they mostly focus on the happy path, rather than getting bogged down in details of error handling.

Ready? Dive in by exploring how [APIs/Plugins](/docs/how-plugins-apis-are-run/) work.
8 changes: 8 additions & 0 deletions docs/docs/build-caching.md
@@ -0,0 +1,8 @@
---
title: Build Caching
---

This is a stub. Help our community expand it.

Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your
pull request gets accepted.
8 changes: 8 additions & 0 deletions docs/docs/data-storage-redux.md
@@ -0,0 +1,8 @@
---
title: Data Storage (Redux)
---

This is a stub. Help our community expand it.

Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your
pull request gets accepted.
110 changes: 110 additions & 0 deletions docs/docs/how-plugins-apis-are-run.md
@@ -0,0 +1,110 @@
---
title: How APIs/plugins are run
---

For most sites, plugins take up the majority of the build time. So what's really happening when APIs are called?

_Note: this section only explains how `gatsby-node` plugins are run, not browser or SSR plugins._

## Early in the build

Early in the bootstrap phase, we [load all the configured plugins](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/bootstrap/load-plugins/index.js#L40) (and internal plugins) for the site. These are saved into redux under the `flattenedPlugins` namespace. Each plugin in redux contains the following fields:

- **resolve**: absolute path to the plugin's directory
- **id**: String concatenation of 'Plugin ' and the name of the plugin. E.g `Plugin query-runner`
- **name**: The name of the plugin. E.g `query-runner`
- **version**: The version as per the package.json. If it is a site plugin, a version is generated from the file's hash instead
- **pluginOptions**: Plugin options as specified in [gatsby-config.js](/docs/gatsby-config/)
- **nodeAPIs**: A list of node APIs that this plugin implements. E.g `[ 'sourceNodes', ...]`
- **browserAPIs**: List of browser APIs that this plugin implements
- **ssrAPIs**: List of SSR APIs that this plugin implements

In addition, we create a lookup from each API to the plugins that implement it and save this to redux as `api-to-plugins`. This is implemented in [load-plugins/validate.js](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/bootstrap/load-plugins/validate.js#L106).
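
Building such a lookup can be sketched as follows. The plugin entries are trimmed-down, made-up examples, and `buildApiToPlugins` is a hypothetical helper rather than the code in `validate.js`.

```javascript
// Hypothetical, trimmed-down `flattenedPlugins` entries.
const flattenedPlugins = [
  { name: `plugin-a`, nodeAPIs: [`sourceNodes`] },
  { name: `plugin-b`, nodeAPIs: [`onCreateNode`] },
  { name: `plugin-c`, nodeAPIs: [`sourceNodes`, `onCreateNode`] },
]

// Invert the plugin list into a map of API name -> implementing plugins.
const buildApiToPlugins = plugins => {
  const apiToPlugins = {}
  for (const plugin of plugins) {
    for (const api of plugin.nodeAPIs) {
      if (!apiToPlugins[api]) apiToPlugins[api] = []
      apiToPlugins[api].push(plugin.name)
    }
  }
  return apiToPlugins
}

const apiToPlugins = buildApiToPlugins(flattenedPlugins)
```

Here `apiToPlugins.sourceNodes` lists `plugin-a` and `plugin-c`, so running an API only needs to consult the plugins that implement it.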

## apiRunInstance

Some API calls can take a while to finish. So every time an API is run, we create an object called [apiRunInstance](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L179) to track it. It contains the following notable fields:

- **id**: Unique identifier generated based on type of API
- **api**: The API we're running. E.g `onCreateNode`
- **args**: Any arguments passed to `api-runner-node`. E.g a node object
- **pluginSource**: optional name of the plugin that initiated the original call
- **resolve**: promise resolve callback to be called when the API has finished running
- **startTime**: time that the API run was started
- **span**: opentracing span for tracing builds
- **traceId**: optional args.traceId provided if API will result in further API calls ([see below](#using-traceid-to-await-downstream-api-calls))

We immediately place this object into an `apisRunningById` Map, where we track its execution.

## Running each plugin

Next, we filter all `flattenedPlugins` down to those that implement the API we're trying to run. For each plugin, we require its `gatsby-node.js` and call its exported API function. E.g. if the API was `sourceNodes`, it would result in a call to `gatsbyNode['sourceNodes'](...apiCallArgs)`.
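
A simplified, synchronous sketch of this step is shown below. The `gatsbyNodeModules` object stands in for requiring each plugin's `gatsby-node.js`; it, the plugin names and `runApiOnPlugins` are all hypothetical.

```javascript
// Hypothetical stand-ins: in Gatsby each module would come from
// requiring the plugin's gatsby-node.js.
const gatsbyNodeModules = {
  "plugin-a": { sourceNodes: args => `a sourced ${args.what}` },
  "plugin-b": { onCreateNode: () => `b saw a node` },
  "plugin-c": { sourceNodes: args => `c sourced ${args.what}` },
}

const flattenedPlugins = [
  { name: `plugin-a`, nodeAPIs: [`sourceNodes`] },
  { name: `plugin-b`, nodeAPIs: [`onCreateNode`] },
  { name: `plugin-c`, nodeAPIs: [`sourceNodes`] },
]

// Keep only the plugins implementing `api`, then call each implementation.
const runApiOnPlugins = (api, ...apiCallArgs) =>
  flattenedPlugins
    .filter(plugin => plugin.nodeAPIs.includes(api))
    .map(plugin => gatsbyNodeModules[plugin.name][api](...apiCallArgs))

const results = runApiOnPlugins(`sourceNodes`, { what: `files` })
```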

## Injected arguments

API implementations are passed a variety of useful [actions](/docs/actions/) and other interesting functions/objects. These arguments are [created](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L94) each time a plugin is run for an API, which allows us to rebind actions with default information.

All actions take 3 arguments:

1. The core information required by the action. E.g. for [createNode](/docs/actions/#createNode), we must pass a node
2. The plugin that is calling this action. E.g. `createNode` uses this to assign the owner of the new node
3. An object with misc action options:
   - **traceId**: [See below](#using-traceid-to-await-downstream-api-calls)
   - **parentSpan**: opentracing span (see [tracing docs](/docs/performance-tracing/))

Passing the plugin and action options on every single action call would be extremely painful for plugin/site authors. Since we know the plugin, traceId and parentSpan when we're running our API, we can rebind injected actions so these arguments are already provided. This is done in the [doubleBind](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L14) step.
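
The rebinding idea can be illustrated with a small sketch. The `createNode` below is a toy action creator, and `bindActions` is a hypothetical helper; neither is the actual `doubleBind` code.

```javascript
// Toy action creator taking the three arguments described above.
const createNode = (node, plugin, actionOptions) => ({
  type: `CREATE_NODE`,
  payload: node,
  plugin,
  traceId: actionOptions.traceId,
})

// Rebind every action so that callers only pass the first argument;
// the plugin and action options are supplied automatically.
const bindActions = (actions, plugin, actionOptions) => {
  const bound = {}
  for (const [name, action] of Object.entries(actions)) {
    bound[name] = payload => action(payload, plugin, actionOptions)
  }
  return bound
}

const boundActions = bindActions(
  { createNode },
  { name: `plugin-a` },
  { traceId: `initial-sourceNodes` }
)

// A plugin author only passes the node itself.
const action = boundActions.createNode({ id: `node-1` })
```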

## Waiting for all plugins to run

Each plugin is run inside a [map-series](https://www.npmjs.com/package/map-series) promise, which runs them one at a time, in order. Once all plugins have finished running, we remove the `apiRunInstance` from [apisRunningById](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L246) and fire an `API_RUNNING_QUEUE_EMPTY` event. This, in turn, results in any dirty pages being recreated, along with their queries. Finally, the results are returned.
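
A rough sketch of this bookkeeping follows, assuming the plugins are awaited one after another (as the `map-series` name suggests). The local `mapSeries` and `runApi` functions are hypothetical, and emitting the event only when no other API run is in flight is an assumption.

```javascript
const { EventEmitter } = require(`events`)

const emitter = new EventEmitter()
const apisRunningById = new Map()

// Run async functions one at a time, collecting results in order.
const mapSeries = async (items, fn) => {
  const results = []
  for (const item of items) {
    results.push(await fn(item))
  }
  return results
}

// Hypothetical runner: `plugins` is a list of { name, run } objects.
const runApi = async (id, plugins) => {
  apisRunningById.set(id, { id, startTime: Date.now() })
  const results = await mapSeries(plugins, plugin => plugin.run())
  apisRunningById.delete(id)
  // Assumption: the event fires only when no other API run is in flight.
  if (apisRunningById.size === 0) emitter.emit(`API_RUNNING_QUEUE_EMPTY`)
  return results
}
```

A listener registered with `emitter.on("API_RUNNING_QUEUE_EMPTY", ...)` could then trigger follow-up work once all runs have finished.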

## Using traceId to await downstream API calls

The majority of API calls result in one or more implementing plugins being called. We then wait for them all to complete, and return. But some APIs (e.g. [sourceNodes](/docs/node-apis/#sourceNodes)) result in calls to actions that themselves call APIs. We need some way of tracing whether an API call originated from another API call, so that we can wait on all child calls to complete. The mechanism for this is the `traceId`.

```dot
digraph {
node [ shape="box" ];
"initialCall" [ label="apiRunner(`sourceNodes`, {\l traceId: `initial-sourceNodes`,\l waitForCascadingActions: true,\l parentSpan: parentSpan\l})\l " ];
"apiRunner1" [ label="api-runner-node.js" ];
"sourceNodes" [ label="plugin.SourceNodes()" ];
"createNode" [ label="createNode(node)" ];
"apisRunning" [ label="apisRunningByTraceId[traceId]" ];
"createNodeReducer" [ label="CREATE_NODE reducer" ];
"CREATE_NODE" [ label="CREATE_NODE event" ];
"pluginRunner" [ label="plugin-runner.js" ];
"onCreateNode" [ label="plugin.onCreateNode()" ];
"apiRunnerOnCreateNode" [ label="apiRunner(`onCreateNode`, {\l node,\l traceId: action.traceId\l})\l "; ];
"apiRunner2" [ label="api-runner-node.js" ];
"initialCall" -> "apiRunner1";
"apiRunner1" -> "apisRunning" [ label="set to 1" ];
"apiRunner1" -> "sourceNodes" [ label="call" ];
"sourceNodes" -> "createNode" [ label="call (traceID passed via doubleBind)" ];
"createNode" -> "createNodeReducer" [ label="triggers (action has traceId)" ];
"createNodeReducer" -> "CREATE_NODE" [ label="emits (event has traceId)" ];
"CREATE_NODE" -> "pluginRunner" [ label="handled by (event has traceId)" ];
"pluginRunner" -> "apiRunnerOnCreateNode";
"apiRunnerOnCreateNode" -> "apiRunner2";
"apiRunner2" -> "onCreateNode" [ label="call" ];
"apiRunner2" -> "apisRunning" [ label="increment" ];
}
```

1. The `traceId` is passed as an argument to the original API runner. E.g.

   ```javascript
   apiRunner(`sourceNodes`, {
     traceId: `initial-sourceNodes`,
     waitForCascadingActions: true,
     parentSpan: parentSpan,
   })
   ```

1. We keep track of the number of API calls with this traceId in the [apisRunningByTraceId](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L139) Map. On this first invocation, it will be set to `1`.
1. Using the action rebinding mentioned [above](#injected-arguments), the traceId is passed through to all action calls via the `actionOptions` object.
1. After reducing the Action, a global event is [emitted](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/redux/index.js#L93) which includes the action information.
1. For the `CREATE_NODE` and `CREATE_PAGE` events, we need to call the `onCreateNode` and `onCreatePage` APIs respectively. The [plugin-runner](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/redux/plugin-runner.js) takes care of this. It also passes on the traceId from the Action back into the API call.
1. We're back in `api-runner-node.js` and can tie this new API call back to its original. So we increment the value of [apisRunningByTraceId](https://github.com/gatsbyjs/gatsby/blob/8029c6647ab38792bb0a7c135ab4b98ae70a2627/packages/gatsby/src/utils/api-runner-node.js#L218) for this traceId.
1. Now, whenever an API finishes running (when all its implementing plugins have finished), we decrement `apisRunningByTraceId[traceId]`. If the original API call included the `waitForCascadingActions` option, then we wait until `apisRunningByTraceId[traceId]` == 0 before resolving.
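
The counting scheme in the steps above can be sketched as follows. This is a simplified, synchronous illustration: `apiStarted`, `apiFinished` and the `waiting` map are hypothetical, and only the `apisRunningByTraceId` name comes from the text.

```javascript
const apisRunningByTraceId = new Map()
// traceId -> callback to run once the count reaches 0 (hypothetical)
const waiting = new Map()

// Called whenever an API run with this traceId begins.
const apiStarted = traceId =>
  apisRunningByTraceId.set(
    traceId,
    (apisRunningByTraceId.get(traceId) || 0) + 1
  )

// Called whenever an API run with this traceId has finished all its plugins.
const apiFinished = traceId => {
  const remaining = apisRunningByTraceId.get(traceId) - 1
  apisRunningByTraceId.set(traceId, remaining)
  if (remaining === 0 && waiting.has(traceId)) {
    waiting.get(traceId)()
    waiting.delete(traceId)
  }
}

let resolved = false
const traceId = `initial-sourceNodes`
waiting.set(traceId, () => {
  resolved = true
})

apiStarted(traceId) // sourceNodes begins: count is 1
apiStarted(traceId) // createNode triggers onCreateNode: count is 2
apiFinished(traceId) // sourceNodes plugins done: count is 1, keep waiting
const resolvedEarly = resolved
apiFinished(traceId) // onCreateNode done: count is 0, resolve
```

The original call resolves only after the second `apiFinished`, which is the behavior `waitForCascadingActions` asks for.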