behind the scenes docs #7510

Merged
merged 49 commits into from
Sep 9, 2018
Changes from 37 commits
Commits
a0393d3
add gatsby-remark-graphviz package
Moocar Aug 15, 2018
30dc68b
replace README SVG with image to guarantee it will display
Moocar Aug 15, 2018
d9b9caa
Replace README rendered image with rendered svg
Moocar Aug 15, 2018
13b8d03
formatting/linting
Moocar Aug 15, 2018
0c2f3c4
use engine render option instead of lang
Moocar Aug 15, 2018
d0fbe30
fixed bug where top level promise callback was never resolved
Moocar Aug 16, 2018
f9c79ec
Merge branch 'master' into gatsby-remark-graphviz
Moocar Aug 16, 2018
998ff5c
fixed bug where top level promise callback was never resolved
Moocar Aug 16, 2018
6754bbe
add detail about prismjs to remark-graphviz
Moocar Aug 16, 2018
37e77d9
add schema behind the scenes
Moocar Aug 16, 2018
dfad957
add error handling to viz.js execution
Moocar Aug 16, 2018
7da1c04
Merge branch 'gatsby-remark-graphviz' into internal-docs
Moocar Aug 16, 2018
0b714c2
add first graph
Moocar Aug 16, 2018
884a132
moar behind the scenes docs
Moocar Aug 16, 2018
1b7f329
add more behind the scenes sections
Moocar Aug 16, 2018
87567fe
separated plugins/apis docs
Moocar Aug 17, 2018
c7fea22
APIs running diagram
Moocar Aug 17, 2018
5e6d9d1
disable unfinished docs
Moocar Aug 17, 2018
879fe11
more internal docs
Moocar Aug 21, 2018
18455da
Merge branch 'master' into internal-docs
Moocar Aug 21, 2018
27c212e
more docs updates
Moocar Aug 21, 2018
916d64b
add graphviz dependency
Moocar Aug 21, 2018
dd58c75
fixed overview: true
Moocar Aug 21, 2018
b84db49
formatting
Moocar Aug 21, 2018
4572b98
All caps heading
Moocar Aug 21, 2018
5053654
in depth schema docs
Moocar Aug 28, 2018
ee5c233
doc updates
Moocar Aug 28, 2018
5a90ca4
doc update
Moocar Aug 28, 2018
69d128f
Merge branch 'master' into internal-docs
Moocar Aug 28, 2018
79ae696
schema docs update
Moocar Aug 28, 2018
1c4fb58
formatting
Moocar Aug 28, 2018
436166d
finished schema connections doc
Moocar Aug 30, 2018
0e0cf51
more linking
Moocar Aug 30, 2018
8b13cd9
typo
Moocar Aug 30, 2018
541964d
Merge branch 'master' into internal-docs
Moocar Aug 30, 2018
ba666a2
Docs for query behind the scenes
Moocar Aug 31, 2018
f09800a
revert accidental sites changes
Moocar Aug 31, 2018
f2497f9
add graph for how query execution works
Moocar Sep 3, 2018
52e9c4e
Merge branch 'master' into internal-docs
Moocar Sep 4, 2018
27d9b65
added query extraction graphs
Moocar Sep 4, 2018
20f6d5b
Added File Type stuff
Moocar Sep 4, 2018
98b862d
Merge branch 'master' into internal-docs
m-allanson Sep 4, 2018
2c7d512
typos
Moocar Sep 5, 2018
41262cb
Merge branch 'master' into internal-docs
Moocar Sep 5, 2018
2aac2df
Merge branch 'internal-docs' of github.com:Moocar/gatsby into interna…
Moocar Sep 5, 2018
4be9afd
More behind the scenes docs
Moocar Sep 6, 2018
58fd41e
Merge branch 'master' into internal-docs
Moocar Sep 6, 2018
a0aeb71
add internal data bridge docs
Moocar Sep 6, 2018
f5e53ba
filled in page creation docs
Moocar Sep 6, 2018
76 changes: 76 additions & 0 deletions docs/docs/behind-the-scenes-query-execution.md
@@ -0,0 +1,76 @@
---
title: Query Execution
---

### Query Execution

Bootstrap kicks off Query Execution by calling [page-query-runner.js runInitialQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L29). The main files involved in this step are:

- [page-query-runner.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js)
- [query-queue.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js)
- [query-runner.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js)

#### Figuring out which queries need to be executed

The first thing this step does is figure out which queries need to be run. You would think this would simply be a matter of running the queries that were enqueued in [Extract Queries](/docs/behind-the-scenes-query-extraction/), but matters are complicated by support for `gatsby develop`. Below is the logic for figuring out which queries need to be executed (code is in [runQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L36)).

##### Already queued queries

All queries queued after being extracted.

##### Queries without node dependencies

All queries whose component path isn't listed in `componentDataDependencies`. As a recap, in [Schema Generation](/docs/schema-generation-behind-the-scenes/), we showed that all Type resolvers record a dependency between the page whose query we're running and any nodes that were successfully resolved. So, if a component is declared in the `components` redux namespace, but is *not* contained in `componentDataDependencies`, then by definition, the query has not been run. Therefore we need to run it. Check out [Node/Page Dependencies](/docs/behind-the-scenes-dependencies/) for more info. The code for this step is in [findIdsWithoutDataDependencies](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L89).
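
The check can be sketched roughly as follows. This is an illustration only, not the code in `page-query-runner.js`: the state shapes (an array of component paths, and an object keyed by component path) and the helper name `findPathsWithoutDependencies` are assumptions made for the example.

```javascript
// Hypothetical, simplified stand-ins for the redux state described above.
const state = {
  components: [`/src/pages/index.js`, `/src/templates/blog-post.js`],
  // Component paths that have already recorded at least one node dependency.
  componentDataDependencies: {
    "/src/pages/index.js": [`node-1`, `node-2`],
  },
}

// Return every component path with no recorded dependency,
// i.e. whose query has presumably never been run.
function findPathsWithoutDependencies({
  components,
  componentDataDependencies,
}) {
  const seen = new Set(Object.keys(componentDataDependencies))
  return components.filter(componentPath => !seen.has(componentPath))
}

const pathsToRun = findPathsWithoutDependencies(state)
```

Here only `/src/templates/blog-post.js` ends up in `pathsToRun`, because it is the only component that has never recorded a dependency.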

##### Pages that depend on dirty nodes

In `gatsby develop` mode, every time a node is created or updated (e.g. via editing a markdown file), we add that node to the [enqueuedDirtyActions](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L61) collection. When we execute our queries, we can look up all nodes in this collection and map them to pages that depend on them (as described above). These pages' queries must also be executed. In addition, this step also handles dirty `connections` (see [Schema Connections](/docs/schema-connections/)). Connections depend on a node's type. So if a node is dirty, we mark all connection nodes of that type dirty as well. The code for this step is in [findDirtyIds](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L171). _Note: dirty ids is really talking about dirty paths_.
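
A minimal sketch of that mapping, assuming a simplified dependency structure (node id to page paths, and node type to page paths). The data shapes and the name `findDirtyPaths` are hypothetical; the real logic lives in `findDirtyIds`.

```javascript
// Hypothetical dependency data: node id -> page paths, node type -> page paths.
const dependencies = {
  nodes: { "node-1": [`/blog/a/`], "node-2": [`/blog/a/`, `/blog/b/`] },
  connections: { MarkdownRemark: [`/blog/`] },
}

// Dirty nodes collected since the last run (shape assumed for illustration).
const dirtyNodes = [{ id: `node-2`, type: `MarkdownRemark` }]

function findDirtyPaths(dirtyNodes, { nodes, connections }) {
  const paths = new Set()
  for (const node of dirtyNodes) {
    // Pages that resolved this exact node.
    for (const pagePath of nodes[node.id] || []) paths.add(pagePath)
    // Pages whose query has a connection over this node's type.
    for (const pagePath of connections[node.type] || []) paths.add(pagePath)
  }
  return [...paths]
}

const dirtyPaths = findDirtyPaths(dirtyNodes, dependencies)
```

Using a `Set` keeps a page from being queued twice when it depends on the dirty node both directly and through a connection.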

#### Queue Queries for Execution

We now have the list of all pages that need to be executed (linked to their Query information). Let's queue them for execution (for realz this time). A call to [runQueriesForPathnames](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L21) kicks off this step. For each page or static query, we create a Query Job that looks something like:

```javascript
{
  id: // page path, or static query hash
  hash: // only for static queries
  jsonName: // jsonName of static query or page
  query: // raw query text
  componentPath: // path to file where query is declared
  isPage: // true if not static query
  context: {
    path: // if staticQuery, is jsonName of component
    ...page // page object. Not for static queries
    ...page.context // not for static queries
  }
}
```

This Query Job contains everything we need to execute the query (and do things like recording dependencies between pages and nodes). So, we push it onto the queue in [query-queue.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js) and then wait for the queue to empty. Let's see how `query-queue` works.
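
To make the shape concrete, here is a small sketch of how such jobs could be assembled. The helper names `makePageQueryJob` and `makeStaticQueryJob`, and the shapes of the `page`, `component` and `staticQuery` arguments, are assumptions for illustration rather than Gatsby's actual functions.

```javascript
// Build a job for a page query. The `page` and `component` shapes are assumed.
function makePageQueryJob(page, component) {
  return {
    id: page.path,
    jsonName: page.jsonName,
    query: component.query,
    componentPath: component.componentPath,
    isPage: true,
    // The page object and its context are both spread into the query context.
    context: { ...page, ...page.context },
  }
}

// Build a job for a static query; there is no page object to spread in.
function makeStaticQueryJob(staticQuery) {
  return {
    id: staticQuery.hash,
    hash: staticQuery.hash,
    jsonName: staticQuery.jsonName,
    query: staticQuery.query,
    componentPath: staticQuery.componentPath,
    isPage: false,
    context: { path: staticQuery.jsonName },
  }
}

const job = makePageQueryJob(
  {
    path: `/blog/hello/`,
    jsonName: `blog-hello-123`,
    context: { slug: `/blog/hello/` },
  },
  { componentPath: `/src/templates/post.js`, query: `query { site { id } }` }
)
```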

#### Query Queue Execution

[query-queue.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js) creates a [better-queue](https://www.npmjs.com/package/better-queue) queue that offers advanced features such as parallel execution. This is handy because queries do not depend on each other, so they can safely run in parallel. Every time an item is consumed from the queue, we call [query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js) where we finally actually execute the query!

Query execution involves calling the [graphql-js](https://graphql.org/graphql-js/) library with 3 pieces of information:

1. The Gatsby schema that was inferred during [Schema Generation](/docs/schema-generation-behind-the-scenes/).
1. The raw query text, obtained from the Query Job.
1. The Context, also from the Query Job. Has the page's `path` amongst other things.

Graphql-js will parse the query and execute the top level query, e.g. `allMarkdownRemark( limit: 10 )` or `file( relativePath: { eq: "blog/my-blog.md" } )`. These will invoke the resolvers defined in [Schema Connections](/docs/schema-connections/) or [GQL Type](/docs/schema-gql-type/), which both use sift to query over all nodes of the type in redux. The result will be passed through the inner part of the graphql query where each type's resolver will be invoked. The vast majority of these will be `identity` functions that just return the field value. Some, however, could call a [custom plugin field](/docs/schema-gql-type/#plugin-fields) resolver. These in turn might perform side effects such as generating images. This is why the query execution phase of bootstrap often takes the longest.

Finally, a result is returned.

#### Save Query results to redux and disk

As queries are consumed from the queue and executed, their results are saved to redux and disk for consumption later on. This involves converting the result to pure JSON, and then saving it to its [dataPath](/docs/behind-the-scenes-terminology/#datapath), which is relative to `public/static/d`. The data path includes the jsonName and hash. E.g. for the page `/blog/2018-07-17-announcing-gatsby-preview/`, the query's results would be saved to disk as something like:

```
/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json
```

For static queries, instead of using the page's jsonName, we just use a hash of the query.

Now we need to store the association of the page -> the query result in redux so we can recall it later. This is accomplished via the [json-data-paths](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/reducers/json-data-paths.js) reducer which we invoke by creating a `SET_JSON_DATA_PATH` action with the page's jsonName and the saved dataPath.
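
As a rough sketch of what such a reducer does: the action type `SET_JSON_DATA_PATH` comes from the text above, but the payload shape used here (`key` for the jsonName, `value` for the dataPath) is an assumption for illustration and may not match the real reducer.

```javascript
// Assumed action shape: { type, payload: { key: jsonName, value: dataPath } }.
function jsonDataPaths(state = {}, action) {
  switch (action.type) {
    case `SET_JSON_DATA_PATH`:
      // Record (or overwrite) the dataPath for this page's jsonName.
      return { ...state, [action.payload.key]: action.payload.value }
    default:
      return state
  }
}

const nextState = jsonDataPaths(undefined, {
  type: `SET_JSON_DATA_PATH`,
  payload: {
    key: `blog-2018-07-17-announcing-gatsby-preview-995`,
    value: `621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`,
  },
})
```

Because the reducer returns a new object rather than mutating `state`, re-running a query simply replaces the stored dataPath for that jsonName.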

57 changes: 57 additions & 0 deletions docs/docs/behind-the-scenes-query-extraction.md
@@ -0,0 +1,57 @@
---
title: Query Extraction
---

### Extracting Queries from Files

Up until now, we have [sourced all nodes](/docs/node-creation-behind-the-scenes/) into redux, [inferred a schema](/docs/schema-generation-behind-the-scenes/) from them, and [created all pages](/docs/page-creation/). The next step is to extract and compile all graphql queries from our source files. The entrypoint to this phase is [query-watcher extractQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js), which immediately compiles all graphql queries by calling into [query-compiler.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-compiler.js).

#### Query Compilation

The first thing it does is use [babylon-traverse](https://babeljs.io/docs/en/next/babel-traverse.html) to load all javascript files in the site that have graphql queries in them. This produces AST results that are passed to the [relay-compiler](https://facebook.github.io/relay/docs/en/compiler-architecture.html). This accomplishes a couple of things:

1. It informs us of any malformed queries, which are promptly reported back to the user.
1. It builds a tree of queries and the fragments they depend on, and outputs a single optimized query string with the fragments.

After this step, we will have a map of file paths (of site files with queries in them) to Query Objects, which contain the raw optimized query text, as well as other metadata such as the component path and page `jsonName`. The following diagram shows the flow involved during query compilation:

```dot
digraph {
fragments [ label = "fragments. e.g\l.cache/fragments/fragment1.js", shape = cylinder ];
srcFiles [ label = "source files. e.g\lsrc/pages/my-page.js", shape = cylinder ];
components [ label = "redux.state.components\l(via createPage)", shape = cylinder ];
fileQueries [ label = "files with queries", shape = box ];
babylon [ label = "parse files with babylon\lfilter those with queries" ];
queryAst [ label = "QueryASTs", shape = box ];
schema [ label = "Gatsby schema", shape = cylinder ];
relayCompiler [ label = "Relay Compiler" ];
queries [ label = "{ Queries | { filePath | <query> query } }", shape = record ];
query [ label = "{\l name: filePath,\l text: rawQueryText,\l originalText: original text from file,\l path: filePath,\l isStaticQuery: if it is,\l hash: hash of query\l}\l ", shape = box ];


fragments -> fileQueries;
srcFiles -> fileQueries;
components -> fileQueries;
fileQueries -> babylon;
babylon -> queryAst;
queryAst -> relayCompiler;
schema -> relayCompiler;
relayCompiler -> queries;
queries:query -> query;
}
```

#### Store Queries in Redux

We're now in the [handleQuery](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js#L68) function.

If the query is a `StaticQuery`, we call the `replaceStaticQuery` action to save it to the `staticQueryComponents` namespace, which is a mapping from a component's path to an object that contains the raw GraphQL Query amongst other things. More details in [Static Queries](/docs/behind-the-scenes-static-vs-normal-queries/). We also remove the component's `jsonName` from the `components` redux namespace. See [Component/Page dependencies](/docs/behind-the-scenes-dependencies/).

If the query is just a normal everyday query (not a StaticQuery), then we update its component's `query` in the redux `components` namespace via the `replaceComponentQuery` action.
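
The branching can be sketched like this. The action creator names come from the text above, but their bodies, the payload fields, the `REPLACE_COMPONENT_QUERY` type string and the function name `handleQuerySketch` are assumptions for illustration; the `isStaticQuery`, `text` and `hash` fields follow the query object shown in the diagram.

```javascript
// Hypothetical action creators; the real ones live in Gatsby's redux actions.
const replaceStaticQuery = payload => ({ type: `REPLACE_STATIC_QUERY`, payload })
const replaceComponentQuery = payload => ({
  type: `REPLACE_COMPONENT_QUERY`, // assumed type name
  payload,
})

// Store a compiled query in the right redux namespace.
function handleQuerySketch(query, componentPath, dispatch) {
  if (query.isStaticQuery) {
    dispatch(
      replaceStaticQuery({ componentPath, query: query.text, hash: query.hash })
    )
    return `static`
  }
  dispatch(replaceComponentQuery({ componentPath, query: query.text }))
  return `page`
}

// Collect dispatched actions in an array instead of a real redux store.
const dispatched = []
const kind = handleQuerySketch(
  { isStaticQuery: false, text: `query { site { id } }` },
  `/src/pages/index.js`,
  action => dispatched.push(action)
)
```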

#### Queue for execution

Now that we've saved our query, we're ready to queue it for execution. Query execution is mainly handled by [page-query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js), so we accomplish this by passing the component's path to the `queueQueryForPathname` function.

Now let's learn about [Query Execution](/docs/behind-the-scenes-query-execution/).

41 changes: 41 additions & 0 deletions docs/docs/behind-the-scenes-static-vs-normal-queries.md
@@ -0,0 +1,41 @@
---
title: Static vs Normal Queries
---

## TODO Difference between normal and Static Queries

Static Queries don't need to be run for each page, just once.

### staticQueryComponents

Started here because they're referenced in page-query-runner:findIdsWithoutDataDependencies.

The redux `staticQueryComponents` namespace is a map from component jsonName to StaticQueryObject. E.g.

```javascript
{
  "blog-2018-07-17-announcing-gatsby-preview-995": {
    name: `/path/to/component/file`,
    componentPath: `/path/to/component/file`,
    id: `blog-2018-07-17-announcing-gatsby-preview-995`,
    jsonName: `blog-2018-07-17-announcing-gatsby-preview-995`,
    query: `raw GraphQL Query text including fragments`,
    hash: `hash of graphql text`
  }
}
```

The `staticQueryComponents` redux namespace is owned by the `static-query-components.js` reducer, which reacts to `REPLACE_STATIC_QUERY` actions.
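
A minimal sketch of a reducer with that behaviour, assuming the action payload is the StaticQueryObject itself and that entries are keyed by `jsonName` (both are assumptions based on the example map above, not a copy of `static-query-components.js`):

```javascript
// Assumed payload: a StaticQueryObject like the one shown above.
function staticQueryComponents(state = {}, action) {
  switch (action.type) {
    case `REPLACE_STATIC_QUERY`:
      // Replace any previous entry for the same component jsonName.
      return { ...state, [action.payload.jsonName]: action.payload }
    default:
      return state
  }
}

const staticState = staticQueryComponents(undefined, {
  type: `REPLACE_STATIC_QUERY`,
  payload: {
    jsonName: `blog-2018-07-17-announcing-gatsby-preview-995`,
    componentPath: `/path/to/component/file`,
    query: `raw GraphQL Query text including fragments`,
    hash: `hash of graphql text`,
  },
})
```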

It is created in query-watcher. TODO: Check other usages

TODO: in query-watcher.js/handleQuery, we remove jsonName from dataDependencies. How did it get there? Why is jsonName used here, but for other dependencies, it's a path?

### Usages

- [websocket-manager](TODO). TODO
- [query-watcher](TODO).
- `getQueriesSnapshot` returns map with snapshot of `state.staticQueryComponents`
- handleComponentsWithRemovedQueries. For each staticQueryComponent, if passed in queries doesn't include `staticQueryComponent.componentPath`. TODO: Where is StaticQueryComponent created? TODO: Where is queries passed into `handleComponentsWithRemovedQueries`?

TODO: Finish above
83 changes: 83 additions & 0 deletions docs/docs/behind-the-scenes-terminology.md
@@ -0,0 +1,83 @@
## Terminology

Read up on [page creation](/docs/page-creation/) first.

### dataPath

Path to the page's query result. Relative to `/public/static/d/{modInt}`. Name is kebab hash on `path--${jsonName}`-`result->sha1->base64`. E.g

`621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`

### Redux `jsonDataPaths` namespace (dataPaths)

Map of page `jsonName` to `dataPath`. Updated whenever a new query is run (in [query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js)). e.g

```
{
// jsonName -> dataPath
"blog-2018-07-17-announcing-gatsby-preview-995": "621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc"
}
```

In the code, this map is also referred to by the `dataPaths` variable.
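
As an illustration of how the map is used, the sketch below resolves a page's `jsonName` to the location of its query result file under `public/static/d`. The helper name `resultFileFor` and the `publicDir` parameter are hypothetical.

```javascript
const path = require(`path`)

// jsonName -> dataPath, as in the example above.
const dataPaths = {
  "blog-2018-07-17-announcing-gatsby-preview-995": `621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`,
}

// Hypothetical helper: returns the result file for a page, or null if the
// page's query has not been run yet.
function resultFileFor(jsonName, dataPaths, publicDir = `public`) {
  const dataPath = dataPaths[jsonName]
  if (!dataPath) return null
  // Use posix joins so the result looks the same on every platform.
  return path.posix.join(publicDir, `static`, `d`, `${dataPath}.json`)
}

const resultFile = resultFileFor(
  `blog-2018-07-17-announcing-gatsby-preview-995`,
  dataPaths
)
```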

### Query result file

`/public/static/d/621/${dataPath}`

E.g

`/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json`

This is the actual result of the GraphQL query that was run for the page `/blog/2018-07-17-announcing-gatsby-preview/`. The contents would look something like:

```javascript
{
  "data": {
    "markdownRemark": {
      "html": "<p>Today we....",
      "timeToRead": 2,
      "fields": {
        "slug": "/blog/2018-07-17-announcing-gatsby-preview/"
      },
      "frontmatter": {
        "title": "Announcing Gatsby Preview",
        "date": "July 17th 2018",
        ...
      },
      ...
    }
  },
  "pageContext": {
    "slug": "/blog/2018-07-17-announcing-gatsby-preview/",
    "prev": {
      ...
    },
    "next": null
  }
}
```

For a query such as:

```javascript
export const pageQuery = graphql`
  query($slug: String!) {
    markdownRemark(fields: { slug: { eq: $slug } }) {
      html
      timeToRead
      fields {
        slug
      }
      frontmatter {
        title
        date(formatString: "MMMM Do YYYY")
        ...
      }
      ...
    }
  }
`
```

TODO: Consider Creating a standalone terminology page
13 changes: 13 additions & 0 deletions docs/docs/behind-the-scenes.md
@@ -0,0 +1,13 @@
---
title: Behind the Scenes
---

Curious how Gatsby works under the hood? This pages in this section describe how a Gatsby build works from an internal code/architecture point of view. It should be useful for anyone who needs to work on the internals of Gatsby, or for those who are simply curious how it all works, or perhaps you're a plugin author and need to understand how core works to track down a bug? Come one, come all!
Contributor


Typo

- This pages in this section
+ The pages in this section


If you're looking for information on how to _use_ Gatsby to write your own site, or create a plugin, check out the rest of the Gatsby docs. This section is quite low level.

These docs aren't supposed to be definitive, or tell you everything there is to know. But as you're exploring the Gatsby codebase, you might find yourself wondering what a concept means, or which part of the codebase implements a particular idea. These docs aim to answer those kinds of questions.

A few more things. These docs are mostly focused on `gatsby build`. Operations specific to `gatsby develop` are mostly ignored, though this may change in the future. Also, they mostly focus on the happy path, rather than getting bogged down in details of error handling.

Ready? Dive in by exploring how [APIs/Plugins](/docs/how-plugins-apis-are-run/) work.
8 changes: 8 additions & 0 deletions docs/docs/build-caching.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: Build Caching
---

This is a stub. Help our community expand it.

Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your
pull request gets accepted.
8 changes: 8 additions & 0 deletions docs/docs/data-storage-redux.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: Data Storage (Redux)
---

This is a stub. Help our community expand it.

Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your
pull request gets accepted.