gatsbyjs · Moocar · Sep 9, 2018 · Aug 15, 2018
diff --git a/docs/docs/behind-the-scenes-query-execution.md b/docs/docs/behind-the-scenes-query-execution.md
@@ -0,0 +1,76 @@
+---
+title: Query Execution
+---
+
+### Query Execution
+
+Query Execution is kicked off by bootstrap, which calls [page-query-runner.js runInitialQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L29). The main files involved in this step are:
+
+- [page-query-runner.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js)
+- [query-queue.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js)
+- [query-runner.js](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js)
+
+#### Figuring out which queries need to be executed
+
+The first thing this step does is figure out which queries need to be run. You would think this would simply be a matter of running the queries that were enqueued in [Extract Queries](/docs/behind-the-scenes-query-extraction/), but matters are complicated by support for `gatsby develop`. Below is the logic for figuring out which queries need to be executed (the code is in [runQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L36)).
+
+##### Already queued queries
+
+All queries queued after being extracted.
+
+##### Queries without node dependencies
+
+All queries whose component path isn't listed in `componentDataDependencies`. As a recap, in [Schema Generation](/docs/schema-generation-behind-the-scenes/), we showed that all type resolvers record a dependency between the page whose query we're running and any nodes that were successfully resolved. So, if a component is declared in the `components` redux namespace, but is *not* contained in `componentDataDependencies`, then by definition, the query has not been run. Therefore we need to run it. Check out [Node/Page Dependencies](/docs/behind-the-scenes-dependencies/) for more info. The code for this step is in [findIdsWithoutDataDependencies](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L89).
+
+##### Pages that depend on dirty nodes
+
+In `gatsby develop` mode, every time a node is created or updated (e.g. by editing a markdown file), we add that node to the [enqueuedDirtyActions](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L61) collection. When we execute our queries, we can look up all nodes in this collection and map them to the pages that depend on them (as described above). These pages' queries must also be executed. This step also handles dirty `connections` (see [Schema Connections](/docs/schema-connections/)). Connections depend on a node's type, so if a node is dirty, we mark all connection nodes of that type dirty as well. The code for this step is in [findDirtyIds](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L171). _Note: "dirty ids" really means dirty paths_.
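The three sources described in this section can be combined into one de-duplicated list of ids to run. The following is a minimal sketch under stated assumptions, not Gatsby's actual implementation: `findIdsToRun`, its parameters, and the data shapes are hypothetical, and dirty connections are left out for brevity.

```javascript
// Hypothetical sketch: combine the three sources of queries described above.
// None of these names or data shapes are taken from the Gatsby source.
function findIdsToRun({ queuedIds, componentIds, dataDependencies, dirtyNodeIds }) {
  // 1. Ids that were already queued after extraction.
  const ids = new Set(queuedIds)

  // 2. Ids that never recorded a node dependency, so their query has never run.
  const idsWithDependencies = new Set(Object.values(dataDependencies).flat())
  for (const id of componentIds) {
    if (!idsWithDependencies.has(id)) ids.add(id)
  }

  // 3. Ids that depend on a node that was created or updated.
  for (const nodeId of dirtyNodeIds) {
    for (const id of dataDependencies[nodeId] || []) ids.add(id)
  }

  return [...ids]
}

const idsToRun = findIdsToRun({
  queuedIds: [`/about/`],
  componentIds: [`/about/`, `/blog/a/`, `/blog/b/`],
  // node id -> ids whose queries resolved that node (assumed shape)
  dataDependencies: { node1: [`/blog/a/`], node2: [`/about/`] },
  dirtyNodeIds: [`node1`],
})
console.log(idsToRun)
```

Using a `Set` means an id that qualifies for more than one reason is still only run once.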
+
+#### Queue Queries for Execution
+
+We now have the list of all pages that need to be executed (linked to their Query information). Let's queue them for execution (for realz this time). A call to [runQueriesForPathnames](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js#L21) kicks off this step. For each page or static query, we create a Query Job that looks something like:
+
+```javascript
+{
+  id: // page path, or static query hash
+  hash: // only for static queries
+  jsonName: // jsonName of static query or page
+  query: // raw query text
+  componentPath: // path to file where query is declared
+  isPage: // true if not static query
+  context: {
+    path: // if staticQuery, is jsonName of component
+    ...page // page object. Not for static queries
+    ...page.context // not for static queries
+  }
+}
+```
+
+This Query Job contains everything we need to execute the query (and do things like recording dependencies between pages and nodes). So, we push it onto the queue in [query-queue.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js) and then wait for the queue to empty. Let's see how `query-queue` works.
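As an illustration of the Query Job shape shown above, here is a minimal sketch of building one for a page. `createPageQueryJob` and the example `page` and `component` objects are hypothetical stand-ins, not functions or data from the Gatsby source.

```javascript
// Hypothetical helper: build a Query Job for a page, following the shape above.
// The `page` and `component` arguments are illustrative stand-ins.
function createPageQueryJob(page, component) {
  return {
    id: page.path,
    jsonName: page.jsonName,
    query: component.query,
    componentPath: component.componentPath,
    isPage: true,
    // Page fields first, then page.context spread on top of them.
    context: { ...page, ...page.context },
  }
}

const job = createPageQueryJob(
  {
    path: `/blog/hello/`,
    jsonName: `blog-hello-123`,
    context: { slug: `/blog/hello/` },
  },
  {
    componentPath: `/site/src/templates/post.js`,
    query: `query($slug: String!) { markdownRemark(fields: { slug: { eq: $slug } }) { html } }`,
  }
)
console.log(job.id, job.context.slug)
```

Because `page.context` is spread into `context`, values such as `slug` are available under the same name the query's variables use.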
+
+#### Query Queue Execution
+
+[query-queue.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-queue.js) creates a [better-queue](https://www.npmjs.com/package/better-queue) queue that offers advanced features such as parallel execution. This is handy because queries do not depend on each other, so they can safely run in parallel. Every time an item is consumed from the queue, we call [query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js), where we finally execute the query!
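To show why independent jobs parallelize well, here is a minimal sketch of a concurrency-limited queue. It deliberately does not use better-queue or reproduce its API: `runQueue`, the `concurrency` argument, and the fake `runQuery` worker are all hypothetical.

```javascript
// Minimal concurrency-limited queue. This is NOT better-queue; it only
// illustrates running independent jobs in parallel.
async function runQueue(jobs, worker, concurrency) {
  const results = new Array(jobs.length)
  let next = 0

  // Each lane repeatedly claims the next unclaimed job until none are left.
  async function lane() {
    while (next < jobs.length) {
      const index = next++
      results[index] = await worker(jobs[index])
    }
  }

  const laneCount = Math.min(concurrency, jobs.length)
  await Promise.all(Array.from({ length: laneCount }, () => lane()))
  return results
}

// Stand-in for query-runner.js: pretend to execute a Query Job.
const runQuery = async job => ({ id: job.id, data: {} })

const queueDone = runQueue([{ id: `/a/` }, { id: `/b/` }, { id: `/c/` }], runQuery, 2)
queueDone.then(results => console.log(results.map(r => r.id)))
```

No job reads another job's result, so the order in which lanes finish does not matter; results are stored by index to keep them aligned with the input.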
+
+Query execution involves calling the [graphql-js](https://graphql.org/graphql-js/) library with 3 pieces of information:
+
+1. The Gatsby schema that was inferred during [Schema Generation](/docs/schema-generation-behind-the-scenes/).
+1. The raw query text, obtained from the Query Job.
+1. The Context, also from the Query Job. Has the page's `path` amongst other things.
+
+graphql-js will parse the query and execute the top-level query, e.g. `allMarkdownRemark( limit: 10 )` or `file( relativePath: { eq: "blog/my-blog.md" } )`. These will invoke the resolvers defined in [Schema Connections](/docs/schema-connections/) or [GQL Type](/docs/schema-gql-type/), which both use `sift` to query over all nodes of the type in redux. The result will be passed through the inner part of the GraphQL query, where each type's resolver will be invoked. The vast majority of these will be `identity` functions that just return the field value. Some, however, could call a [custom plugin field](/docs/schema-gql-type/#plugin-fields) resolver. These in turn might perform side effects such as generating images. This is why the query execution phase of bootstrap often takes the longest.
+
+Finally, a result is returned.
+
+#### Save Query results to redux and disk
+
+As queries are consumed from the queue and executed, their results are saved to redux and disk for consumption later on. This involves converting the result to pure JSON, and then saving it to its [dataPath](/docs/behind-the-scenes-terminology/#datapath), which is relative to `public/static/d`. The data path includes the jsonName and hash. E.g. for the page `/blog/2018-07-17-announcing-gatsby-preview/`, the query's results would be saved to disk as something like:
+
+```
+/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json
+```
+
+For static queries, instead of using the page's jsonName, we just use a hash of the query.
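A minimal sketch of deriving a result file name from a page's jsonName and a hash of its result follows. This is an illustration only: `resultFileName`, the `path---` prefix handling, and the character filtering are assumptions, and the exact scheme Gatsby uses (including the numeric directory in the example above) may differ.

```javascript
const crypto = require(`crypto`)

// Hypothetical sketch: name a result file after the jsonName plus a hash of
// the serialized result. Not copied from Gatsby's query-runner.js.
function resultFileName(jsonName, result) {
  const resultHash = crypto
    .createHash(`sha1`)
    .update(JSON.stringify(result))
    .digest(`base64`)
    // Drop characters that are awkward in file names (assumed rule).
    .replace(/[^a-zA-Z0-9_-]/g, ``)
  return `path---${jsonName}-${resultHash}.json`
}

console.log(resultFileName(`blog-hello-123`, { data: { title: `Hello` } }))
```

Because the hash is computed from the result itself, a changed result produces a different file name, while an unchanged result maps to the same file.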
+
+Now we need to store the association of the page -> the query result in redux so we can recall it later. This is accomplished via the [json-data-paths](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/redux/reducers/json-data-paths.js) reducer which we invoke by creating a `SET_JSON_DATA_PATH` action with the page's jsonName and the saved dataPath.
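The association described above can be pictured as a small reducer. This is a sketch under assumptions rather than the contents of `json-data-paths.js`: the `payload.key` and `payload.value` field names are hypothetical.

```javascript
// Hypothetical sketch of a reducer that maps jsonName -> dataPath.
// The action's payload shape is an assumption, not copied from Gatsby.
function jsonDataPaths(state = {}, action) {
  switch (action.type) {
    case `SET_JSON_DATA_PATH`:
      return { ...state, [action.payload.key]: action.payload.value }
    default:
      return state
  }
}

const nextState = jsonDataPaths(
  {},
  {
    type: `SET_JSON_DATA_PATH`,
    payload: {
      key: `blog-2018-07-17-announcing-gatsby-preview-995`,
      value: `621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`,
    },
  }
)
console.log(nextState)
```

Any other action type leaves the state untouched, so the map only changes when a query result has been written.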
+
diff --git a/docs/docs/behind-the-scenes-query-extraction.md b/docs/docs/behind-the-scenes-query-extraction.md
@@ -0,0 +1,57 @@
+---
+title: Query Extraction
+---
+
+### Extracting Queries from Files
+
+Up until now, we have [sourced all nodes](/docs/node-creation-behind-the-scenes/) into redux, [inferred a schema](/docs/schema-generation-behind-the-scenes/) from them, and [created all pages](/docs/page-creation/). The next step is to extract and compile all graphql queries from our source files. The entrypoint to this phase is [query-watcher extractQueries()](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js), which immediately compiles all graphql queries by calling into [query-compiler.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-compiler.js). 
+
+#### Query Compilation
+
+The first thing it does is use [babylon-traverse](https://babeljs.io/docs/en/next/babel-traverse.html) to load all javascript files in the site that have graphql queries in them. This produces AST results that are passed to the [relay-compiler](https://facebook.github.io/relay/docs/en/compiler-architecture.html). This accomplishes a couple of things:
+
+1. It informs us of any malformed queries, which are promptly reported back to the user.
+1. It builds a tree of queries and the fragments they depend on, and outputs a single optimized query string with the fragments.
+
+After this step, we will have a map of file paths (of site files with queries in them) to Query Objects, which contain the raw optimized query text, as well as other metadata such as the component path and page `jsonName`. The following diagram shows the flow involved during query compilation:
+
+```dot
+digraph {
+  fragments [ label = "fragments. e.g\l.cache/fragments/fragment1.js", shape = cylinder ];
+  srcFiles [ label = "source files. e.g\lsrc/pages/my-page.js", shape = cylinder ];
+  components [ label = "redux.state.components\l(via createPage)", shape = cylinder ];
+  fileQueries [ label = "files with queries", shape = box ];
+  babylon [ label = "parse files with babylon\lfilter those with queries" ];
+  queryAst [ label = "QueryASTs", shape = box ];
+  schema [ label = "Gatsby schema", shape = cylinder ];
+  relayCompiler [ label = "Relay Compiler" ];
+  queries [ label = "{ Queries | { filePath | <query> query } }", shape = record ];
+  query [ label = "{\l    name: filePath,\l    text: rawQueryText,\l    originalText: original text from file,\l    path: filePath,\l    isStaticQuery: if it is,\l    hash: hash of query\l}\l ", shape = box ];
+
+
+  fragments -> fileQueries;
+  srcFiles -> fileQueries;
+  components -> fileQueries;
+  fileQueries -> babylon;
+  babylon -> queryAst;
+  queryAst -> relayCompiler;
+  schema -> relayCompiler;
+  relayCompiler -> queries;
+  queries:query -> query;
+}
+```
+
+#### Store Queries in Redux
+
+We're now in the [handleQuery](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-watcher.js#L68) function.
+
+If the query is a `StaticQuery`, we call the `replaceStaticQuery` action to save it to the `staticQueryComponents` namespace, which is a mapping from a component's path to an object that contains the raw GraphQL query amongst other things. More details are in [Static Queries](/docs/behind-the-scenes-static-vs-normal-queries/). We also remove the component's `jsonName` from the `components` redux namespace. See [Component/Page dependencies](/docs/behind-the-scenes-dependencies/).
+
+If the query is just a normal, everyday query (not a StaticQuery), then we update its component's `query` in the redux `components` namespace via the `replaceComponentQuery` action.
+
+#### Queue for execution
+
+Now that we've saved our query, we're ready to queue it for execution. Query execution is mainly handled by [page-query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/page-query-runner.js), so we accomplish this by passing the component's path to the `queueQueryForPathname` function.
+
+Now let's learn about [Query Execution](/docs/behind-the-scenes-query-execution/).
+
diff --git a/docs/docs/behind-the-scenes-static-vs-normal-queries.md b/docs/docs/behind-the-scenes-static-vs-normal-queries.md
@@ -0,0 +1,41 @@
+---
+title: Static vs Normal Queries
+---
+
+## TODO Difference between normal and Static Queries
+
+Static Queries don't need to be run for each page, just once.
+
+### staticQueryComponents
+
+Started here because they're referenced in page-query-runner:findIdsWithoutDataDependencies.
+
+The redux `staticQueryComponents` namespace is a map from component jsonName to StaticQueryObject. E.g.
+
+```javascript
+{
+  "blog-2018-07-17-announcing-gatsby-preview-995": {
+    name: `/path/to/component/file`,
+    componentPath: `/path/to/component/file`,
+    id: `blog-2018-07-17-announcing-gatsby-preview-995`,
+    jsonName: `blog-2018-07-17-announcing-gatsby-preview-995`,
+    query: `raw GraphQL Query text including fragments`,
+    hash: `hash of graphql text`
+  }
+}
+```
+
+The `staticQueryComponents` redux namespace is owned by the `static-query-components.js` reducer, which reacts to `REPLACE_STATIC_QUERY` actions.
+
+It is created in query-watcher. TODO: Check other usages
+
+TODO: in query-watcher.js/handleQuery, we remove jsonName from dataDependencies. How did it get there? Why is jsonName used here, but for other dependencies, it's a path?
+
+### Usages
+
+- [websocket-manager](TODO). TODO
+- [query-watcher](TODO). 
+  - `getQueriesSnapshot` returns map with snapshot of `state.staticQueryComponents`
+  - handleComponentsWithRemovedQueries. For each staticQueryComponent, if passed in queries doesn't include `staticQueryComponent.componentPath`. TODO: Where is StaticQueryComponent created? TODO: Where is queries passed into `handleComponentsWithRemovedQueries`?
+
+  TODO: Finish above
diff --git a/docs/docs/behind-the-scenes-terminology.md b/docs/docs/behind-the-scenes-terminology.md
@@ -0,0 +1,83 @@
+## Terminology
+
+Read up on [page creation](/docs/page-creation/) first.
+
+### dataPath
+
+Path to the page's query result. Relative to `/public/static/d/{modInt}`. Name is kebab hash on `path--${jsonName}`-`result->sha1->base64`. E.g
+
+`621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc`
+
+### Redux `jsonDataPaths` namespace (dataPaths)
+
+Map of page `jsonName` to `dataPath`. Updated whenever a new query is run (in [query-runner.js](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/internal-plugins/query-runner/query-runner.js)). e.g
+
+```
+{
+  // jsonName -> dataPath
+  "blog-2018-07-17-announcing-gatsby-preview-995": "621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc"
+}
+```
+
+This is also known via the `dataPaths` variable.
+
+### Query result file
+
+`/public/static/d/${dataPath}.json`
+
+E.g
+
+`/public/static/d/621/path---blog-2018-07-17-announcing-gatsby-preview-995-a74-dwfQIanOJGe2gi27a9CLKHjamc.json`
+
+This is the actual result of the GraphQL query that was run for the page `/blog/2018-07-17-announcing-gatsby-preview/`. The contents would look something like:
+
+```javascript
+{
+  "data": {
+    "markdownRemark": {
+      "html": "<p>Today we....",
+      "timeToRead": 2,
+      "fields": {
+        "slug": "/blog/2018-07-17-announcing-gatsby-preview/"
+      },
+      "frontmatter": {
+        "title": "Announcing Gatsby Preview",
+        "date": "July 17th 2018",
+        ...
+      },
+      ...
+    }
+  },
+  "pageContext": {
+    "slug": "/blog/2018-07-17-announcing-gatsby-preview/",
+    "prev": {
+      ...
+    },
+    "next": null
+  }
+}
+```
+
+For a query such as:
+
+```javascript
+export const pageQuery = graphql`
+  query($slug: String!) {
+    markdownRemark(fields: { slug: { eq: $slug } }) {
+      html
+      timeToRead
+      fields {
+        slug
+      }
+      frontmatter {
+        title
+        date(formatString: "MMMM Do YYYY")
+        ...
+      }
+      ...
+    }
+  }
+`
+```
+
+TODO: Consider Creating a standalone terminology page
diff --git a/docs/docs/behind-the-scenes.md b/docs/docs/behind-the-scenes.md
@@ -0,0 +1,13 @@
+---
+title: Behind the Scenes
+---
+
+Curious how Gatsby works under the hood? The pages in this section describe how a Gatsby build works from an internal code/architecture point of view. They should be useful for anyone who needs to work on the internals of Gatsby, for plugin authors who need to understand how core works to track down a bug, or for those who are simply curious how it all works. Come one, come all!
+
+If you're looking for information on how to _use_ Gatsby to write your own site, or create a plugin, check out the rest of the Gatsby docs. This section is quite low level.
+
+These docs aren't supposed to be definitive, or tell you everything there is to know. But as you're exploring the Gatsby codebase, you might find yourself wondering what a concept means, or which part of the codebase implements a particular idea. These docs aim to answer those kinds of questions.
+
+A few more things. These docs are mostly focused on `gatsby build`. Operations specific to `gatsby develop` are mostly ignored, though this may change in the future. The docs also focus on the happy path, rather than getting bogged down in the details of error handling.
+
+Ready? Dive in by exploring how [APIs/Plugins](/docs/how-plugins-apis-are-run/) work.
diff --git a/docs/docs/build-caching.md b/docs/docs/build-caching.md
@@ -0,0 +1,8 @@
+---
+title: Build Caching
+---
+
+This is a stub. Help our community expand it.
+
+Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your
+pull request gets accepted.
diff --git a/docs/docs/data-storage-redux.md b/docs/docs/data-storage-redux.md
@@ -0,0 +1,8 @@
+---
+title: Data Storage (Redux)
+---
+
+This is a stub. Help our community expand it.
+
+Please use the [Gatsby Style Guide](/docs/gatsby-style-guide/) to ensure your
+pull request gets accepted.