This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Gatsby Plugin: Asset Manifest (for Server-Side Authentication in Front of Built Assets without client-side routes) #20745

Closed
karlhorky opened this issue Jan 21, 2020 · 14 comments
Labels
topic: webpack/babel Webpack or babel

Comments

@karlhorky
Contributor

karlhorky commented Jan 21, 2020

Summary

Gatsby static sites are very fast and optimized. But some content may not be suitable to deliver to all audiences. There are often use cases for including an authentication layer.

Currently, Gatsby promotes using client-only routes for authentication, which negate many of the benefits of the static site generation.

It is possible to set up a Node.js server to achieve server-side authentication (example with Auth0 here: https://github.com/karlhorky/auth0-node-heroku). This example can be extended to serve the Gatsby static assets via express.static or similar - if the user is authenticated, they get the static content back; if not, they receive a 403 Forbidden.
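As a minimal sketch of that all-or-nothing flow (not the code from the linked repo), an Express-style middleware could look like the following. `requireAuthentication` and the `isAuthenticated` callback are hypothetical names for illustration; how a session is actually checked depends on the auth provider.

```javascript
// Minimal sketch: gate every request behind one authentication check.
// `isAuthenticated` is a hypothetical callback, e.g. a session lookup.
function requireAuthentication(isAuthenticated) {
  return function authMiddleware(req, res, next) {
    if (isAuthenticated(req)) {
      // Hand over to the next handler, e.g. express.static for the Gatsby build
      return next();
    }
    // Not logged in: refuse to serve any static file
    res.statusCode = 403;
    res.end('Forbidden');
  };
}

// Possible wiring in an Express app (assumed layout, not run here):
//   app.use(requireAuthentication(req => Boolean(req.session && req.session.user)));
//   app.use(express.static('gatsby-website/public'));
```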

This almost achieves what we want! But it is an all-or-nothing solution - there is no way to restrict access to specific Gatsby assets (for example, based on pages), without multiple crazy, error-prone regexes like this:

const allowedUrlsUserBob = /^(\/|\/(webpack-runtime|app|styles|commons|component---src-pages(-|-courses-1(-|-modules-001-|-modules-002-))index-mdx)-[a-z0-9]+\.js|\/page-data(\/|\/index\/|\/courses\/1\/|\/courses\/1\/modules\/(001|002)\/)(page|app)-data\.json|\/courses\/1\/|\/courses\/1\/modules\/(001|002)\/|\/(static|icons)\/.+\.png(\?v=[a-z0-9]+)?)(\?[^/]+)?$/;

Spoiler: I'm using this crazy regular expression option right now 😅

Proposals

I propose offering and documenting one or more tools to support a server-side authentication flow for completely static sites (without using client-only routes), similar to @pieh's comment on #1100 (comment):

If gatsby would generate asset manifest on builds detailing what assets are used for given urls/pathnames - would that help?

1. An Asset Manifest would be a great start!

This would allow for simpler configuration of user-level and page-level access-control:

import assets from 'manifest.json'

// assets['/courses/1/modules/001'] === [
//   '/webpack-runtime-ef3a9f9842cf40a03163.js',
//   '/commons-cdc988b7a0635a52ddb8.js',
//   '/app-cafc7b4b1730f061489d.js',
//   '/styles-6c9411e5cef0c7a2398a.js',
//   '/component---src-pages-courses-1-modules-001-index-mdx-19fb7194fc6837729ecd.js',
//   '/page-data/courses/1/modules/001/page-data.json',
//   '/page-data/app-data.json',
// ]

const accessControl = {
  // Key: User ID
  // Value: Array of unique asset paths they are allowed to request
  2: [...new Set([
    ...assets['/courses/1/modules/001'],
    ...assets['/courses/1/modules/002'],
  ])],
}

Of course, I'm not fixed on the API for the manifest. I'd be open to having helpers to extend this too!
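To make the idea concrete, here is a rough sketch of how a server could check a request path against such a manifest. The manifest shape, the shortened hashes, and the helper names `allowedAssetsForPages` and `canAccess` are all assumptions for illustration; no such manifest exists in Gatsby today.

```javascript
// Hypothetical manifest: page path -> asset paths (hashes are made up).
const assets = {
  '/courses/1/modules/001': [
    '/app-aaa111.js',
    '/page-data/courses/1/modules/001/page-data.json',
  ],
  '/courses/1/modules/002': [
    '/app-aaa111.js',
    '/page-data/courses/1/modules/002/page-data.json',
  ],
  '/courses/2/modules/001': [
    '/app-aaa111.js',
    '/page-data/courses/2/modules/001/page-data.json',
  ],
};

// Collect the unique asset paths for a list of pages into a Set
function allowedAssetsForPages(manifest, pages) {
  return new Set(pages.flatMap(page => manifest[page] || []));
}

// Key: user ID, value: Set of asset paths the user may request
const accessControl = new Map([
  [2, allowedAssetsForPages(assets, [
    '/courses/1/modules/001',
    '/courses/1/modules/002',
  ])],
]);

// Unknown users and unknown paths are both denied
function canAccess(userId, requestPath) {
  const allowed = accessControl.get(userId);
  return allowed !== undefined && allowed.has(requestPath);
}
```

Using a `Set` per user keeps the lookup cheap and removes duplicates such as the shared `app` bundle.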

2. Configurable Pre-fetching

One side effect of these solutions is a lot of failed pre-fetching requests for users without full access:

[Screenshot, 2020-01-21: failed pre-fetching requests for a user without full access]

Maybe there could be a way to configure pre-fetching client-side? So that different resources could be pre-fetched per user?

Basic example

Examples in Proposals section above.

Motivation

Gatsby users will commonly want authentication flows in their apps, and they should also want performant applications, which can be achieved with static site generation.

Alternatives Considered

Existing boilerplates, articles and blog posts, such as those below:

  1. https://github.com/auth0-blog/gatsby-auth0

  2. From @rwieruch in Authentication support #1100 (comment):

I implemented a quick MVP this morning to check out a whole Firebase authentication flow in Gatsby. Turns out it works.

This doesn't really protect the static content (@sarneeh in #1100 (comment)):

@rwieruch Alright, but I think this is not how you should block website content. In your case you just have client-side authentication logic on your page - anyone with a link to your authorised content will still be able to get it (because it's just a static file somewhere on your host). To make it reliable you still need to authorise the user on the server which serves the content.

Ref ("Authentication support"): #1100

cc @simoneb @pieh @samjulien

@karlhorky karlhorky changed the title from "Helpers for Server-Side Authentication of Built Assets (without client-side routes)" to "Helpers for Server-Side Authentication in Front of Built Assets (without client-side routes)" on Jan 21, 2020
@simoneb

simoneb commented Jan 21, 2020

Thanks for taking the time to track this request @karlhorky 👍

@karlhorky
Contributor Author

karlhorky commented Jan 22, 2020

I have created a repo with my setup for the secure Express server-side authentication of Gatsby static files (no client-only routes or open static content!) here:

https://github.com/karlhorky/gatsby-serverside-auth0

This also includes my above-mentioned regular expressions for rudimentary access control on a per-user and per-page basis, which is what this issue hopes to get a better solution for!

@sidharthachatterjee
Contributor

sidharthachatterjee commented Jan 23, 2020

Interesting issue, @karlhorky and thank you for taking the time to write this up.

An Asset Manifest would be a great start

webpack.stats.json which is written to public should contain most of the stuff you'd like from a manifest. It typically looks like:

{
   "errors":[

   ],
   "warnings":[

   ],
   "namedChunkGroups":{
      "app":{  },
      "component---src-pages-404-js":{  },
      "component---src-pages-index-js":{  },
      "component---src-pages-page-2-js":{  }
   },
   "assetsByChunkName":{
      "app":[  ],
      "component---src-pages-404-js":[  ],
      "component---src-pages-index-js":[  ],
      "component---src-pages-page-2-js":[  ]
   }
}

We chunk per page at the moment, so you should be fine mapping pages to these and including `/public/page-data/app-data.json` in the app chunk and `/public/page-data/<page>/page-data.json` for every page.
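A rough sketch of that mapping, assuming a trimmed-down stats object: the file names and hashes below are made up, and `assetsForChunk` and `assetsForPage` are hypothetical helper names. In some webpack versions an `assetsByChunkName` entry is a single string rather than an array, so the sketch normalizes both.

```javascript
// Trimmed-down, hypothetical stats object; real files contain many more keys.
const stats = {
  assetsByChunkName: {
    app: ['app-0123abcd.js', 'app-0123abcd.js.map'],
    'component---src-pages-index-js': 'component---src-pages-index-js-4567ef.js',
  },
};

// Return the assets of one chunk as an array (empty if the chunk is unknown)
function assetsForChunk(statsJson, chunkName) {
  const entry = statsJson.assetsByChunkName[chunkName];
  if (entry === undefined) return [];
  // Normalize: an entry may be a single string or an array of strings
  return Array.isArray(entry) ? entry : [entry];
}

// A page needs the shared app chunk plus its own page chunk
function assetsForPage(statsJson, pageChunkName) {
  return [
    ...new Set([
      ...assetsForChunk(statsJson, 'app'),
      ...assetsForChunk(statsJson, pageChunkName),
    ]),
  ];
}
```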

> Configurable Pre-fetching

We've considered adding an opt out mechanism for prefetching and adding an imperative API for prefetching. We'd love contributions for this in case you're interested. Let's track that in https://github.com/gatsbyjs/gatsby/issues/20568

@karlhorky
Contributor Author

karlhorky commented Jan 23, 2020

Ok great! So I just need to import the webpack.stats.json file at the start of the Express server and use the information within it, I guess.

I'll see if I can make something work in the repo: https://github.com/karlhorky/gatsby-serverside-auth0


Edit: Done. I've updated the proof of concept repo. Here's the difference:

Old Solution

// Regular expression to match allowed assets related
// to src/pages/index.mdx in the Gatsby website.
//
// Trying to navigate to assets related to src/pages/page-2.mdx
// will return an "Access denied."
const allowedGatsbyWebsiteUrls = /^(\/|\/(webpack-runtime|app|styles|commons|component---src-pages-index-mdx)-[a-z0-9]+\.js|\/page-data\/(index\/)?(page|app)-data\.json|\/(static|icons)\/.+\.png(\?v=[a-z0-9]+)?)(\?[^/]+)?$/;

New Solution

// Require the Gatsby asset manifest from the build
// to get paths to all assets that are required by
// each "named chunk group" (each named chunk group
// corresponds to a page).
//
// Ref: https://github.com/gatsbyjs/gatsby/issues/20745#issuecomment-577685950
const {
  namedChunkGroups,
} = require('../gatsby-website/public/webpack.stats.json');

function pageToWebpackFormat(page) {
  // Replace slashes and periods with hyphens
  return page.replace(/(\/|\.)/g, '-');
}

function pageToGatsbyPageDataPath(page) {
  // Strip the /index.mdx at the end of the page
  // If it's the index, just strip the .mdx
  return page.replace(/(\/index)?\.mdx$/, '');
}

function pageToWebPaths(page) {
  // Strip the index.mdx at the end of the page
  let pageWithoutIndex = page.replace(/((\/)?index)?\.mdx$/, '');
  // Add a slash, but only for non-root paths
  if (pageWithoutIndex !== '') pageWithoutIndex += '/';
  return [pageWithoutIndex, pageWithoutIndex + 'index.html'];
}

function getPathsForPages(pages) {
  return (
    pages
      .map(page => {
        return [
          // All asset paths from the webpack manifest
          ...namedChunkGroups[
            `component---src-pages-${pageToWebpackFormat(page)}`
          ].assets,
          // All of the Gatsby page-data.json files
          `page-data/${pageToGatsbyPageDataPath(page)}/page-data.json`,
          ...pageToWebPaths(page),
        ];
      })
      // Flatten out the extra level of array nesting
      .flat()
      .concat(
        // Everything general for the app
        ...namedChunkGroups.app.assets,
        'page-data/app-data.json',
      )

      .filter(
        assetPath =>
          // Root
          assetPath === '' ||
          // Only paths ending with js, json, html and slashes
          assetPath.match(/(\.(html|js|json)|\/)$/),
      )
      // Add a leading slash to make a root-relative path
      // (to match Express' req.url)
      .map(assetPath => '/' + assetPath)
  );
}

const allowedWebpackAssetPaths = getPathsForPages([
  'index.mdx',
]);

function isAllowedPath(path) {
  const pathWithoutQuery = path.replace(/^([^?]+).*$/, '$1');

  // Allow access to the manifest
  if (pathWithoutQuery === '/manifest.webmanifest') return true;

  // Allow access to images within static and icons
  if (pathWithoutQuery.endsWith('.png')) {
    if (
      pathWithoutQuery.startsWith('/static/') ||
      pathWithoutQuery.startsWith('/icons/')
    ) {
      return true;
    }
  }

  return allowedWebpackAssetPaths.includes(pathWithoutQuery);
}

@karlhorky
Contributor Author

karlhorky commented Jan 23, 2020

So the webpack.stats.json file does not include the following (will update as I find more):

Files in public/static (eg. Images)

Candidates for extraction:

  1. List out paths to all files in public/static using a library
  2. The public/index.html file contains these paths
  3. The public/app-xxxxxxxxxxxxxxxxxxxxxxx.js file contains these paths

Files in various public/xxxxxxxxxxxxxxxxxxxxxxx directories (eg. Videos, SVG files)

Candidates for extraction:

  1. List out paths to all video, etc. files in each public/xxxxxxxxxxxxxxxxxxxxxxx directory using a library
  2. The public/index.html file contains these paths
  3. The public/component---src-pages-pagepath-xxxxxxxxxxxxxxxxxxxxxxx.js file contains these paths

Files in public/icons

Candidates for extraction:

  1. List out paths to all files in public/icons using a library
  2. The public/index.html file contains these paths

Files in public/page-data (eg. app-data.json and page-data.json Files)

Candidates for extraction:

  1. List out paths to all files in public/page-data using a library
  2. The public/app-xxxxxxxxxxxxxxxxxxxxxxx.js file contains these paths

public/manifest.webmanifest

Candidates for extraction:

  1. Hardcode it
  2. The public/index.html file contains this path
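For the "list out paths" candidates, a library is not strictly needed: Node's built-in `fs` module is enough. The following is only a sketch; `listPublicFiles` is a hypothetical helper name, and the directory layout in the usage comment is an assumption based on the repo structure mentioned earlier.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively list all files below publicDir/subDir and return them
// as root-relative URL paths (e.g. '/static/abc/image.png').
function listPublicFiles(publicDir, subDir) {
  const results = [];

  function walk(dir) {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const fullPath = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(fullPath);
      } else {
        // Make the path relative to the public root and use forward
        // slashes so it matches Express' req.url on every platform
        const relative = path.relative(publicDir, fullPath);
        results.push('/' + relative.split(path.sep).join('/'));
      }
    }
  }

  walk(path.join(publicDir, subDir));
  return results.sort();
}

// Possible usage (assumed layout, not run here):
//   const staticPaths = listPublicFiles('gatsby-website/public', 'static');
//   const iconPaths = listPublicFiles('gatsby-website/public', 'icons');
```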

@karlhorky
Contributor Author

karlhorky commented Jan 23, 2020

@sidharthachatterjee would the Gatsby team be open to creating a separate Asset Manifest for these files? Maybe in the same format as the webpack stats?

It would allow for my new solution above to be further simplified.

karlhorky added a commit to upleveled/gatsby-serverside-auth0 that referenced this issue Jan 23, 2020
@sidharthachatterjee
Contributor

sidharthachatterjee commented Jan 24, 2020

@karlhorky Yup, absolutely. I think this could be a pretty cool gatsby plugin which could use onCreateWebpackConfig (off the top of my head) to hook into webpack using a custom plugin to get all assets for a page (including more than just js).

Files in public/static (eg. Images)

Hmm, this is interesting. @pieh Do we keep a dependency graph of these per page entry point?

Files in public/icons

Could list these like you said in onPostBuild in a plugin

Files in public/page-data (eg. app-data.json and page-data.json Files)

These names can be hard coded because they will always be called these (by design) but I'd consider them internal implementation details which we might break in a minor version

public/manifest.webmanifest

This should be okay to hardcode
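Pulling these suggestions together, a plugin could write its own manifest after the build. The sketch below is one possible shape, not an existing plugin: the output file name `asset-manifest.json`, the `shared` and `chunks` keys, and the `writeAssetManifest` helper are all hypothetical. It assumes `namedChunkGroups[name].assets` is an array of file names, as in the code earlier in this thread, and it hardcodes the `page-data` path, which is an internal detail that may change.

```javascript
const fs = require('fs');
const path = require('path');

// Read webpack.stats.json from the build output and write a simplified
// manifest next to it. Returns the manifest object for convenience.
function writeAssetManifest(publicDir) {
  const stats = JSON.parse(
    fs.readFileSync(path.join(publicDir, 'webpack.stats.json'), 'utf8'),
  );

  const manifest = {
    // Hardcoded paths needed by every page (internal details, may change)
    shared: ['manifest.webmanifest', 'page-data/app-data.json'],
    // Chunk name -> assets, copied from the webpack stats
    chunks: {},
  };

  for (const [name, group] of Object.entries(stats.namedChunkGroups)) {
    manifest.chunks[name] = group.assets;
  }

  fs.writeFileSync(
    path.join(publicDir, 'asset-manifest.json'),
    JSON.stringify(manifest, null, 2),
  );
  return manifest;
}

// In a plugin's gatsby-node.js this could be hooked up as (not run here):
//   exports.onPostBuild = () => writeAssetManifest('public');
```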

@karlhorky karlhorky changed the title from "Helpers for Server-Side Authentication in Front of Built Assets (without client-side routes)" to "Gatsby Plugin: Asset Manifest (for Server-Side Authentication in Front of Built Assets without client-side routes)" on Jan 25, 2020
@github-actions

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@github-actions github-actions bot added the "stale?" label (Issue that may be closed soon due to the original author not responding any more.) on Feb 15, 2020
@karlhorky karlhorky added the "not stale" label and removed the "stale?" label on Feb 15, 2020
@danabrit danabrit added the "topic: webpack/babel" label (Webpack or babel) on May 30, 2020
@karlhorky
Contributor Author

Update: I've added some features and fixed some things in the proof of concept:

  • Gatsby 404 page displayed when user requests non-existent resource
  • refactored folder structure (most Gatsby-specific code in isAllowedGatsbyPath.js)
  • fixed some weirdness with Auth0 redirects firing multiple times
  • turned on TypeScript checking on the JavaScript files and fixed type errors

@BilaalHussain

BilaalHussain commented Jul 22, 2020

This is really awesome work. I am trying to deal with something similar (CloudFront + Lambda@Edge + S3, blocking unauthenticated requests to /blog/private/* with the Lambda function).

One concern I have about the approach you are taking: in my website, which is generated with gatsby-transformer-remark, the entire site's content is contained in my app-{hash}.js. So even if I filter the appropriate page-data.json files, I still can't secure my site without blocking all JS 😬

Have you run into/investigated this problem @karlhorky? (it could be specific to the plugin I'm using)

(I suspect it might be due to my plugin, the allPages GQL query is what is contained in the main js bundle, which obviously contains all the page data)

@karlhorky
Contributor Author

No, my app-{hash}.js file does not contain page content (try the demo repo: https://github.com/upleveled/gatsby-serverside-auth0).

It only contains a mapping to each of the pages (so if the page titles are secret, that could be an issue).

@karlhorky
Contributor Author

karlhorky commented Jul 25, 2020

Saw that @sidharthachatterjee added some new paths on /static/d/<hash>.json (Gatsby@2.24.7):

#25723

These cause all JavaScript on the page to break if these pre-fetch requests do not succeed, because of how the requests are handled (no catch of errors):

[Screenshot, 2020-07-25: JavaScript errors on the page after failed pre-fetch requests]


So this caused the solution above to break (understandable, when using undocumented internals).

I've published a fix here:

upleveled/gatsby-serverside-auth0@278020c

@karlhorky
Contributor Author

This behavior of causing all JavaScript on the page to break if pre-fetching fails seems like it could be improved though.

Maybe it could be addressed as part of #25330

@karlhorky
Contributor Author

Another change to the static query paths by Sidhartha in #26242 ...

Fixed in upleveled/gatsby-serverside-auth0@85aae95

6 participants