Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat(source-filesystem): No file watchers on build #6207

Closed
wants to merge 12 commits into from

Conversation

Bouncey
Copy link
Contributor

@Bouncey Bouncey commented Jun 28, 2018

When a site has many many files coming through gatsby-source-filesystem, an issue can arise where the system running gatsby runs out of watchers. This gist has a workaround for increasing the number of watchers available.

The above fix does not work reliably in a netlify build container however. So this PR makes changes where, if the NODE_ENV is 'production', we use the built-in fs to traverse the directory tree looking for file paths. This will not attach any listeners.

If we are in 'development', we still use chokidar to monitor for file changes as per the current functionality.

@gatsbybot
Copy link
Collaborator

gatsbybot commented Jun 28, 2018

Deploy preview for using-drupal ready!

Built with commit 3699cfe

https://deploy-preview-6207--using-drupal.netlify.com

@gatsbybot
Copy link
Collaborator

gatsbybot commented Jun 28, 2018

Deploy preview for gatsbygram failed.

Built with commit 3699cfe

https://app.netlify.com/sites/gatsbygram/deploys/5b6b821bfdd72a65db274fdc

})

const fsMachine = createFSMachine()
let currentState = fsMachine.initialState
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above 2 lines ashould be insidefileWatcher function - otherwise there would be single instance of state machine even if multiple instances of plugin are running which would cause problems

@pieh
Copy link
Contributor

pieh commented Jun 29, 2018

Others thing that comes to my mind - ignored files/directories currently isn't exactly 1:1 - it would be good to create more complex fixture and compare if both fileWatcher and fileFinder creates same nodes

@pieh
Copy link
Contributor

pieh commented Jun 29, 2018

Also worth noting - this also results in build speedup - source and transform nodes gatsbyjs.org on netlify changed from ~48s to ~38s

@m-allanson
Copy link
Contributor

I'm a bit wary that having two separate implementations for finding files will be the source of subtle bugs. As an example it looks like the Windows / AppVeyor tests for this are failing at the moment.

I actually don't know how chokidar works behind the scenes, but I wonder if there's a way to use it without watchers, or to tell it to output a list of files that it will watch? A quick look at the docs seems like there's a polling method and a getWatched() method. Maybe we can make use of those instead of rolling our own file finding implementation?

@Bouncey
Copy link
Contributor Author

Bouncey commented Jun 29, 2018

@pieh The idea would be that only one would run, determined by the NODE_ENV, so the 'same node' issue shouldn't happen.

@m-allanson my understanding of chokidar is that it is a wrapper around fs.watch and anything that it finds it will attach a watcher to. My idea was to not use chokidar for the build, as the files ideally will not change during a build.

I thought I had pushed a fix for appveyor, I will try again.

@KyleAMathews
Copy link
Contributor

I echo @m-allanson's concern about having two separate implementations.

But this would solve but the watchers exhaustion problem & speed up things as there's no reason to add watchers in production builds so that's wasting time.

Interesting discussion paulmillr/chokidar#410 You might want to try using one of the libraries mentioned there e.g. https://github.com/kmalakoff/readdirp-walk instead of rolling your own algorithm.

@m-allanson
Copy link
Contributor

@Bouncey it'd be great to get this merged if you have a chance to try out using readdirp-walk or similar?

@thebigredgeek
Copy link
Contributor

Wondering how we can push this across the finish line? This is also a blocker for internal gatsby inc stuff

@Bouncey
Copy link
Contributor Author

Bouncey commented Aug 8, 2018

Hey @m-allanson & @thebigredgeek, I have been slammed at work recently. I plan to get the changes made as soon as possible though.

@Bouncey Bouncey force-pushed the feat/no-watch-on-build branch from fb12f83 to 3699cfe Compare August 8, 2018 23:51
@Bouncey
Copy link
Contributor Author

Bouncey commented Aug 8, 2018

Sorry for the force push, but it was a grizzly rebase. I just had a feeling I was about to add 300 commits to this bad boy.

@KyleAMathews
Copy link
Contributor

Deploy preview for using-postcss-sass failed.

Built with commit 3699cfe

https://app.netlify.com/sites/using-postcss-sass/deploys/5b6b821cfdd72a65db274fe2

@KyleAMathews
Copy link
Contributor

Deploy preview for image-processing failed.

Built with commit 3699cfe

https://app.netlify.com/sites/image-processing/deploys/5b6b821cfdd72a65db274fe6

@KyleAMathews
Copy link
Contributor

Deploy preview for using-remark failed.

Built with commit 3699cfe

https://app.netlify.com/sites/using-remark/deploys/5b6b821efdd72a65db274fee

@thebigredgeek
Copy link
Contributor

So FWIW, you can set the environment variable CHOKIDAR_USEPOLLING to a truthy value and it will bypass the watcher issue entirely, though with slightly more CPU usage.

@pieh
Copy link
Contributor

pieh commented Sep 20, 2018

I've opened issue in chokidar repo asking if disabling file watcher could be implemented directly there: paulmillr/chokidar#770 Let's see what maintainers there think about it.

@pieh
Copy link
Contributor

pieh commented Oct 22, 2018

Got response from chokidar maintainer that this isn't something that they want to support. So my best bet right now is to create abstraction that use either chokidar or readdirp depending on if we want to watch for changes or not. It seems like chokidar is using readdirp internally so hopefully same glob patterns can be used and won't cause different results - will need to investigate more.

@m-allanson
Copy link
Contributor

For reference, this PR covers a similar issue #5807

* master: (1595 commits)
  chore: update link for react-gatsby-firebase-authentication (gatsbyjs#10314)
  fix(www): Awesome Gatsby sidebar link (gatsbyjs#10313)
  Add thijs koerselman to creators list (gatsbyjs#10303)
  chore(release): Publish
  fix(gatsby-plugin-emotion): allow for React.Fragment shorthand syntax (gatsbyjs#10291)
  feat(www): Update starter cards (gatsbyjs#10258)
  Update index.md (gatsbyjs#10307)
  Update index.md (gatsbyjs#10306)
  Add thijs koerselman portfolio to sites list (gatsbyjs#10304)
  chore(release): Publish
  fix(gatsby): don't remount matchPath pages (gatsbyjs#10261)
  chore(release): Publish
  feat(gatsby-source-contentful): enable RichText for all users (gatsbyjs#10301)
  docs(tutorial): update 404 screenshot (gatsbyjs#10295)
  feat(gatsby-plugin-typescript): allow specifying babel preset options (gatsbyjs#10248)
  docs(gatsby-plugin-sass): fix typo in docs (gatsbyjs#10292)
  fix(ci): actually only run tests on non-docs changes (gatsbyjs#10287)
  fix(blog): rename for correct date in link (gatsbyjs#10290)
  Adds a more descriptive link purpose (gatsbyjs#9266)
  fix(www): comment out down showcase site
  ...
Mostly this involved including the changes from gatsbyjs#8859 into file-watcher.js
@m-allanson
Copy link
Contributor

m-allanson commented Dec 7, 2018

I've updated this branch with the latest changes from master. There's a Windows issue that needs clearing up, I'll look into that now.

* master: (35 commits)
  feat(gatsby-source-filesystem): keep original name of remote files (gatsbyjs#9777)
  docs(gatsby-source-contentful): Rewrite of gatsby-source-contentful query section (gatsbyjs#7533)
  Update "Deploying with Now" Guide (gatsbyjs#10390)
  Add Matomo to list of analytics plugins (gatsbyjs#10372)
  Add satispay.com to showcase (gatsbyjs#10380)
  Adds @goblindegook/gatsby-starter-typescript (gatsbyjs#10377)
  Fix typo in gatsby-remark-code-repls sample `gatsby-config.json` in README (gatsbyjs#10361)
  fix(gatsby): fix false type conflict warning on plugin fields (gatsbyjs#10355)
  fix(docs): Just a small typo fix in the docs (gatsbyjs#10359)
  feat(gatsby-image): add onStartLoad prop  (gatsbyjs#6702)
  fix(docs): add Ecosystem to docs sidebar, consistency with tutorial sidebar (gatsbyjs#10350)
  fix(www): Starters.yaml housekeeping (gatsbyjs#10354)
  docs: add ttag starter (gatsbyjs#10352)
  docs: document branching (gatsbyjs#9983)
  plugin checker initial commit (gatsbyjs#7062)
  docs: new starter features is required (gatsbyjs#10353)
  docs: migrated line highlighting to highlight-line, highlight-start, highlight-end (gatsbyjs#10202)
  Add Birra Napoli to site showcase (gatsbyjs#10344)
  Add BetterDocs to site showcase (gatsbyjs#10349)
  chore(release): Publish
  ...
@m-allanson
Copy link
Contributor

I've swapped readdirp-walk out for readdirp, as readdirp is used directly by chokidar and readdirp-walk hasn't been updated in a few years. So readdirp seems like a safer bet in terms of consistency with chokidar.

@gatsbyjs/core I think this is good for another review now.

@pieh
Copy link
Contributor

pieh commented Dec 13, 2018

Idea here:
instead of using 2 different code paths for dev and production in plugin code, what if we create hopefully thin abstraction that would have same API as chokidar, but would accept watch: true/false param (and decide to use chokidar or readdirp based on that).

That way we could do

-const watcher = chokidar.watch(pluginOptions.path, {
+const watcher = findFiles(pluginOptions.path, {
+   watch: process.env.GATSBY_EXECUTING_COMMAND === `develop`,  
    ignored: [
      `**/*.un~`,
      `**/.DS_Store`,
      `**/.gitignore`,
      `**/.npmignore`,
      `**/.babelrc`,
      `**/yarn.lock`,
      `**/bower_components`,
      `**/node_modules`,
      `../**/dist/**`,
      ...(pluginOptions.ignore || []),
    ],
  })

here as first step, and later on we could move it to its own package and do the same in gatsby core without much effort.

What are your thoughts?

* master: (33 commits)
  fix(blog): youfit case study typofix
  Doc improvements to Visual testing with Storybook guide (gatsbyjs#10436)
  fix(gatsby-plugin-offline): prevent incorrect revisioning of static file by workbox (gatsbyjs#10416)
  fix(starters): ttag repo link
  fix typo in pull request template (gatsbyjs#10454)
  fix(www) Fix query for plugin links always ?=undefined (gatsbyjs#10453)
  chore(release): Publish
  fix(gatsby): fix extracting StaticQuery nested in shorthand fragment (gatsbyjs#10443)
  fix(www): avoid querying for no-cache=1 (gatsbyjs#10389)
  fix(gatsby-image): update typescript definitions - properly mark fields as optional (gatsbyjs#10419)
  refactor(gatsby): improve EnsureResources (gatsbyjs#10224)
  Fixed minor Typos and grammatical errors (gatsbyjs#9353)
  docs: add ClinciJS website into showcase (gatsbyjs#10437)
  docs(babel-preset-gatsby): document --save-dev flag in README (gatsbyjs#10434)
  fix(docs): Environment Variables Examples (gatsbyjs#10406)
  chore(release): Publish
  [gatsby-image] re: fade out base64 on full image load (gatsbyjs#7539)
  docs(starter-library): add example to starter library (gatsbyjs#10425)
  docs(gatsby-plugin-offline): specify to not HTTP-cache sw.js (gatsbyjs#10430)
  fix(docs): prompt => confirm (gatsbyjs#10431)
  ...
@m-allanson
Copy link
Contributor

Thanks @pieh, that was a good idea as it turned up some issues that I'd previously overlooked.

It looks like chokidar does some additional processing on paths before deciding which ones to ignore.

This means that for our list of default ignores to be handled correctly, we'd need to emulate chokidar functionality in Gatsby code. This seems like a great way to introduce subtle bugs into Gatsby :)

Example difference

As an example - in the default ignores we have a couple of globs like **/node_modules. The file some/dir/node_modules/packagename/subdir/foo/blah.js will be ignored by chokidar as it breaks the path down into directory fragments and runs the ignore list against all the fragments (at least that’s my understanding so far).

In our new findFiles method we use anymatch directly on the full paths. findFiles will do something like:

anymatch(`**/node_modules`, `some/dir/node_modules/packagename/subdir/foo/blah.js`)

and so won’t ignore the file, whereas chokidar will ignore the file. We could handle this case in our own code, but there may be other differences too.

Conclusion

I'm going to close this PR for now, as:

  • using CHOKIDAR_USE_POLLING or increasing the number of available watchers provides a useable workaround
  • writing a findFiles method that's compatible with Chokidar's ignores doesn't seem like a valuable use of time

Thanks @Bouncey for the original PR!

@m-allanson m-allanson closed this Dec 17, 2018
@m-allanson
Copy link
Contributor

Additional research

I spent a bit of time looking at other projects and how they handle file watching.

This isn't particularly relevant, but I collected the info so might as well post it.

Other watchers

There's watchman from facebook, this requires an external service to be installed.

There's NSFW from Axosoft, this is a native node module and requires node-gyp to be functional before you can install it.

Other projects that watch files

VSCode seems to have a different watcher for each platform (one is chokidar).

Brunch uses chokidar’s persistent flag to switch between build and develop modes

Webpack wraps chokidar in its own watchpack library.

Parcel also wraps chokidar for its watch handling code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants