Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove schema updating from bootstrap #11131

Closed

Conversation

stefanprobst
Copy link
Contributor

Currently we infer a GraphQL schema twice during bootstrap. This PR proposes to remove schema updating.

Reasoning

Schema updating was originally intended for debugging (#1268), and later in #2710 to solve a very specific problem (#2685): adding custom fields to SitePage nodes with createNodeField, depending on whether a path condition is met. Things to note:

  • custom fields added unconditionally to SitePage end up in the schema without the need for schema updating
  • adding custom fields wrapped in a conditional like below did not work, because a page with that path does not yet exist in the store when the schema is created, so there is nothing to infer.
exports.onCreateNode = ({ actions, node }) => {
  if (node.internal.type === `SitePage`) {
    if (node.path === `/path-2/`) {
      const { createNodeField } = actions
      createNodeField({
        node,
        name: `foo`,
        value: `bar`,
      })
    }
  }
}

While infering the schema again after all pages have been created does solve this problem, the far less expensive option would have been to always create the field, but fill in the value conditionally. The following works without the need for schema updating:

exports.onCreateNode = ({ actions, node }) => {
  if (node.internal.type === `SitePage`) {
    const { createNodeField } = actions
    createNodeField({
      node,
      name: `foo`,
      value: node.path === `/path-2/` ? `bar` : '',
    })
  }
}

Other than this usecase, is there any other reason for schema updating? Fields on page context would be missing, but they are available as query variables anyway.

Issues

Because schema creation is not run twice, type name postfixes will differ. I tried to adjust the counting in create-type-name, which unfortunately did not work consistently. For a default starter, take a look at the schema diff here.

@stefanprobst stefanprobst requested a review from a team as a code owner January 17, 2019 14:10
Copy link
Contributor

@freiksenet freiksenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable and I want to get rid of extra schema creation step anyway. I do want someone else to have a second look.

@m-allanson m-allanson assigned m-allanson and pieh and unassigned m-allanson Jan 24, 2019
@KyleAMathews
Copy link
Contributor

People use the page schema for creating menus etc. So they'll add data via pageContext and then query the resulting schema. So we could remove support for this but it would be a breaking change.

@freiksenet
Copy link
Contributor

@KyleAMathews any examples in the wild where this feature is used? I want to think about possible workarounds.

@pieh
Copy link
Contributor

pieh commented Jan 24, 2019

Even if there will be workarounds (and I'm sure there will be), we can't make breaking changes like that (at least during v2).

Can you add some context on why you want to remove it?

@KyleAMathews
Copy link
Contributor

I don't know for sure what sites use this just that people talk about it. One example could be you add metadata to each page e.g. it's section and page title and then when generating a navigation, you query for pages within a section and sort by title.

@wardpeet wardpeet added the breaking change If implemented, this proposed work would break functionality for older versions of Gatsby label Jan 25, 2019
@freiksenet
Copy link
Contributor

My point about workaround wasn't that we remove SitePage in general, but that we create in explicitly instead of inferring. The reason we generate schema twice is cause our schema is only generated if there is a node with such type available. If we add ability to create types by providing type def, we could always add SitePage to the schema and skip the extra schema regeneration.

@stefanprobst
Copy link
Contributor Author

SitePage is actually already created in the first schema build step (and also includes all fields added with createNodeField), what is missing is anything that was added to page context in createPages. I was not aware that people used this other than from the query variables in page queries, thanks @KyleAMathews for the clarification!

@freiksenet
Copy link
Contributor

Right, so stuff is passed to page context and that is inferred as GraphQL types.

Is there any reason to actually query pageContext from GraphQL? I know it's a breaking change, but I feel maybe we can change it to be a list of key-value pairs and thus eliminate the need of regenerating the schema.

@freiksenet
Copy link
Contributor

My main concern here is that we are getting a "bag of fields" type for that context, because practically every page (or realistically every page kind), will have their own context, but one would be able to query for all of those fields for any of the pages. It should either be an actual dict type (so list of key/val pairs) or a union of all possible page contexts.

@freiksenet
Copy link
Contributor

Ok, so this would need to wait to Gatsby 3 cause this is a breaking change. I'll summarize motivations here, it can become an RFC.

Why is this important?

The gatsby build structure can be divided into 4 steps.

  1. Collect nodes
  2. Create schema
  3. Generate pages
  4. Render pages

Currently 3) can feed into 2) by defining additional page context. This means that we have a "cycle" in our logic, with pages optionally affecting the schema. Not only it's confusing directly (schema before 3 and after 3 is different), it also causes weird naming issues (all things being someGeneratedType_2 because we generated it twice). In addition this makes doing incremental and/or distributed builds harder, because schema is tied to page generation. Thus ideally we would have a directly data flow, with pages not affecting the schema. (well technically we have pages feed into the nodes when they are created, but we can consider it to be a special case).

How to minimize changes?

Currently you can query page context through all the possible fields every page context has. So if you added fields foo and bar to different pages, your page query would be smth like that:

{
  allSitePage {
    edges {
      node {
        context {
          foo
          bar
        }
      }
    }
  }
}

I propose to instead have page context in GraphQL be a list of key and value fields. The query would look as follows:

{
  allSitePage {
    edges {
      node {
        context {
          key
          value
        }
      }
    }
  }
}

This minimizes the changes that user have to make to adopt their website.

@freiksenet
Copy link
Contributor

Create RFC here gatsbyjs/rfcs#26

@freiksenet
Copy link
Contributor

Closing this because of schema refactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change If implemented, this proposed work would break functionality for older versions of Gatsby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants