[gatsby-source-wordpress] Large WordPress site causing extremely slow build time (stuck at 'source and transform nodes') #6654

Closed
dustinhorton opened this issue Jul 22, 2018 · 156 comments

Comments

@dustinhorton
Contributor

dustinhorton commented Jul 22, 2018

Description

gatsby develop hangs on source and transform nodes after querying a large WordPress installation (~9000 posts, ~35 pages).

Are there any guidelines on what's too big for Gatsby to handle in this regard?

Environment

  System:
    OS: macOS High Sierra 10.13.6
    CPU: x64 Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
    Shell: 3.2.57 - /bin/bash
  Binaries:
    Node: 8.10.0 - ~/n/bin/node
    Yarn: 1.5.1 - ~/n/bin/yarn
    npm: 5.6.0 - ~/n/bin/npm
  Browsers:
    Chrome: 67.0.3396.99
    Safari: 11.1.2
  npmPackages:
    gatsby: ^1.9.273 => 1.9.273
    gatsby-image: ^1.0.54 => 1.0.54
    gatsby-link: ^1.6.45 => 1.6.45
    gatsby-plugin-google-analytics: ^1.0.27 => 1.0.31
    gatsby-plugin-postcss-sass: ^1.0.22 => 1.0.22
    gatsby-plugin-react-helmet: ^2.0.10 => 2.0.11
    gatsby-plugin-react-next: ^1.0.11 => 1.0.11
    gatsby-plugin-resolve-src: 1.1.3 => 1.1.3
    gatsby-plugin-sharp: ^1.6.48 => 1.6.48
    gatsby-plugin-svgr: ^1.0.1 => 1.0.1
    gatsby-source-filesystem: ^1.5.39 => 1.5.39
    gatsby-source-wordpress: ^2.0.93 => 2.0.93
    gatsby-transformer-sharp: ^1.6.27 => 1.6.27
  npmGlobalPackages:
    gatsby-cli: 1.1.58

edit: Just want to reiterate: this is not something easily fixable by deleting .cache/, node_modules/, etc. If that resolves your problem, you weren't experiencing this issue.

@pieh
Contributor

pieh commented Jul 23, 2018

Can you prepare a reproduction repo? The number of posts shouldn't be a problem (at least at this step). v1 might run into memory problems, but that would be in a later build step and shouldn't cause it to get stuck.

@pieh pieh added the status: needs more info Needs triaging and reproducible examples or more information to be resolved label Jul 23, 2018
@dustinhorton
Contributor Author

dustinhorton commented Jul 23, 2018

I was curious whether it was an issue with Local by Flywheel, and I was able to build the site when serving WordPress via MAMP Pro.

But, I'm not even building post pages yet (am building the pages), and the execution time for that problematic step is 636.41s (just shy of 11 minutes).

const path = require('path')

exports.createPages = ({ boundActionCreators, graphql }) => {
  const { createPage } = boundActionCreators

  const postTemplate = path.resolve('./src/templates/Post/Post.js')

  // Return the promise so Gatsby waits for the query (and any createPage calls) to finish.
  return graphql(
    `
      {
        allWordpressPost {
          edges {
            node {
              id
              slug
            }
          }
        }
      }
    `
  )
    .then((result) => {
      console.log('posts')
      // const { data, errors } = result

      // if (errors) console.log(errors)

      // if (!data) return

      //data.allWordpressPost.edges.forEach(({ node }) => {
      //  const { id, slug } = node

      //  createPage({
      //    component: postTemplate,
      //    context: {
      //      id,
      //    },
      //    path: slug,
      //  })
      //})
    })
}

edit: just enabled createPage for posts and execution of that item rose to 14 minutes. Brutal, but also interesting that it's only 3 minutes longer for ~9000 more items. It's currently been sitting on ⠁ run graphql queries for a long time.

edit: that ran for 419.470 s, or 7 minutes.

@dustinhorton
Contributor Author

@pieh Whoops, posted that before I saw you'd just replied. I can try to get this site up remotely tomorrow.

@dustinhorton
Contributor Author

And meant to include, this last line is where it hangs via Local, and takes forever via MAMP.

$ gatsby develop
success delete html and css files from previous builds — 0.017 s
success open and validate gatsby-config — 0.226 s
info One or more of your plugins have changed since the last time you ran Gatsby. As
a precaution, we're deleting your site's cache to ensure there's not any stale
data
success copy gatsby files — 0.013 s
success onPreBootstrap — 0.159 s
⠁ source and transform nodes -> wordpress__acf_posts fetched : 100
⠁ source and transform nodes -> wordpress__acf_pages fetched : 34
⠂ source and transform nodes -> wordpress__acf_media fetched : 100
⠈ source and transform nodes -> wordpress__acf_categories fetched : 13
⢀ source and transform nodes -> wordpress__acf_tags fetched : 0
⠄ source and transform nodes -> wordpress__acf_users fetched : 11
⢀ source and transform nodes -> wordpress__POST fetched : 9092
⢀ source and transform nodes -> wordpress__PAGE fetched : 34
⠐ source and transform nodes -> wordpress__wp_media fetched : 7483
⡀ source and transform nodes -> wordpress__wp_types fetched : 1
⠁ source and transform nodes -> wordpress__wp_statuses fetched : 1
⢀ source and transform nodes -> wordpress__wp_taxonomies fetched : 1
⠄ source and transform nodes -> wordpress__CATEGORY fetched : 14
⠈ source and transform nodes -> wordpress__TAG fetched : 19
⠐ source and transform nodes -> wordpress__wp_users fetched : 11
⡀ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "You are not currently logged in."
⠈ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "Sorry, you are not allowed to do that."
⡀ source and transform nodesThe server response was "404 Not Found"
Inner exception message : "No route was found matching the URL and request method"
success source and transform nodes — 636.410 s

@m-allanson m-allanson added the type: question or discussion Issue discussing or asking a question about Gatsby label Jul 24, 2018
@dustinhorton
Contributor Author

@pieh Haven't confirmed this will successfully build (now with the WordPress remote, it's taking hours), but it certainly reveals the issue: https://github.com/dustinhorton/gatsby-issue

Should be able to just clone that and build.

@dustinhorton dustinhorton changed the title [gatsby-source-wordpress] Large WordPress site won't build (stuck at 'source and transform nodes')—post and/or page limit? [gatsby-source-wordpress] Large WordPress site causing extremely slow build time (stuck at 'source and transform nodes') Jul 26, 2018
@dustinhorton
Contributor Author

Just ran twice for over 10 hours without the site finishing building. Please let me know what else I can provide to help with debugging.

@KyleAMathews
Contributor

Could you try upgrading to v2? We've made a ton of speed improvements to different gatsby subsystems which should dramatically speed up large sites like this.

@dustinhorton
Contributor Author

@KyleAMathews I'll give that a shot tonight—thanks.

@dustinhorton
Contributor Author

@KyleAMathews v2 version @ https://github.com/dustinhorton/gatsby-v2-issue. Been building for about 50 minutes at this point.

@dustinhorton
Contributor Author

Killing it now. Site still hasn't built.

@KyleAMathews
Contributor

Another thing you can try is to enable tracing https://next.gatsbyjs.org/docs/performance-tracing/

We haven't added tracing support yet to gatsby-source-wordpress but the tracing reports might help you figure out where it's stalling.

If anyone else is interested in looking into this, a great PR would be to add tracing support to gatsby-source-wordpress. Lemme know if you're interested!

@KyleAMathews KyleAMathews added the help wanted Issue with a clear description that the community can help with. label Jul 27, 2018
@dustinhorton
Contributor Author

Going to need to bail out on this unfortunately, as I need to spend all time I have porting over to a traditional theme—kind of crushed to not be able to use Gatsby. Everything else feels so backwards.

@KyleAMathews
Contributor

Sorry we haven't had a chance to look into this :-( Sprinting right now to get v2 out.

Is there a chance you could leave the WP site running? It definitely seems like there's a bug here that should be fixed.

@KyleAMathews
Contributor

I tweeted out asking for help so hopefully someone will jump on this soon :-)

https://twitter.com/gatsbyjs/status/1027079401287102465

@dustinhorton
Contributor Author

Wow, that's rad—thanks so much. Site isn't going anywhere for the time being (and I'll migrate a copy and update repro repo if it needs to).

@Khristophor
Contributor

Khristophor commented Aug 8, 2018

@dustinhorton for what it's worth I've also noticed issues building a larger (~1,000 post) project on Local by Flywheel compared to our production environment with a CDN in front of it.

REST responses for Gatsby take 10-20x longer from Local than from production, so the site takes forever to build. I haven't spent time debugging the issue in Local yet, but it's on my to-do list :)

@KyleAMathews I could take a look at adding tracing to source-wordpress.

@KyleAMathews
Contributor

@Khristophor that'd be great!

@Khristophor
Contributor

@dustinhorton I'm seeing 404s for the images on your sample site (https://dustinhorton.com/gatsby-wp/wp-content/uploads/2018/07/IMG_9906.jpg, for example) that might be inflating the build time. Any chance you could look into the paths for those?

@dustinhorton
Contributor Author

dustinhorton commented Aug 8, 2018 via email

@Khristophor
Contributor

That's true, but part of the source and transform step is to download all the media items it finds in the REST response:
https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-source-wordpress/src/normalize.js#L434

Getting 404's on 7504 images might be causing some problems ;)

@dustinhorton
Contributor Author

Believe I've cleaned up all the 404s. Will try to build tonight. Thanks all.

@dustinhorton
Contributor Author

Seemingly no change:

~/Sites/gatsby-issue-v2 (master)
→yarn build
yarn run v1.5.1
$ gatsby build
success open and validate gatsby-config — 0.009 s
success load plugins — 0.277 s
success onPreInit — 0.257 s
success delete html and css files from previous builds — 0.008 s
success initialize cache — 0.245 s
success copy gatsby files — 0.079 s
success onPreBootstrap — 0.001 s

=START PLUGIN=====================================

Site URL: http://dustinhorton.com/gatsby-wp
Site hosted on Wordpress.com: false
Using ACF: true
Using Auth: undefined undefined
Verbose output: true

Mama Route URL: http://dustinhorton.com/gatsby-wp/wp-json

⠁ source and transform nodesRoute discovered : /
Invalid route.
Route discovered : /oembed/1.0
Invalid route.
Route discovered : /oembed/1.0/embed
Invalid route.
Route discovered : /oembed/1.0/proxy
Invalid route.
Route discovered : /yoast/v1
Valid route found. Will try to fetch.
Route discovered : /yoast/v1/configurator
Valid route found. Will try to fetch.
Route discovered : /yoast/v1/reindex_posts
Valid route found. Will try to fetch.
Route discovered : /yoast/v1/ryte
Valid route found. Will try to fetch.
Route discovered : /yoast/v1/indexables/(?P<object_type>.*)/(?P<object_id>\d+)
Invalid route.
Route discovered : /yoast/v1/statistics
Valid route found. Will try to fetch.
Route discovered : /acf/v3
Invalid route.
Route discovered : /acf/v3/posts/(?P<id>[\d]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/posts
Valid route found. Will try to fetch.
Route discovered : /acf/v3/pages/(?P<id>[\d]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/pages
Valid route found. Will try to fetch.
Route discovered : /acf/v3/media/(?P<id>[\d]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/media
Valid route found. Will try to fetch.
Route discovered : /acf/v3/categories/(?P<id>[\d]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/categories
Valid route found. Will try to fetch.
Route discovered : /acf/v3/tags/(?P<id>[\d]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/tags
Valid route found. Will try to fetch.
Route discovered : /acf/v3/comments/(?P<id>[\d]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/comments
Valid route found. Will try to fetch.
Route discovered : /acf/v3/options/(?P<id>[\w\-\_]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/users/(?P<id>[\d]+)/?(?P<field>[\w\-\_]+)?
Invalid route.
Route discovered : /acf/v3/users
Valid route found. Will try to fetch.
Route discovered : /wp/v2
Invalid route.
Route discovered : /wp/v2/posts
Valid route found. Will try to fetch.
Route discovered : /wp/v2/posts/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/posts/(?P<parent>[\d]+)/revisions
Invalid route.
Route discovered : /wp/v2/posts/(?P<parent>[\d]+)/revisions/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/pages
Valid route found. Will try to fetch.
Route discovered : /wp/v2/pages/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/pages/(?P<parent>[\d]+)/revisions
Invalid route.
Route discovered : /wp/v2/pages/(?P<parent>[\d]+)/revisions/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/media
Valid route found. Will try to fetch.
Route discovered : /wp/v2/media/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/types
Valid route found. Will try to fetch.
Route discovered : /wp/v2/types/(?P<type>[\w-]+)
Invalid route.
Route discovered : /wp/v2/statuses
Valid route found. Will try to fetch.
Route discovered : /wp/v2/statuses/(?P<status>[\w-]+)
Invalid route.
Route discovered : /wp/v2/taxonomies
Valid route found. Will try to fetch.
Route discovered : /wp/v2/taxonomies/(?P<taxonomy>[\w-]+)
Invalid route.
Route discovered : /wp/v2/categories
Valid route found. Will try to fetch.
Route discovered : /wp/v2/categories/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/tags
Valid route found. Will try to fetch.
Route discovered : /wp/v2/tags/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/users
Valid route found. Will try to fetch.
Route discovered : /wp/v2/users/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/users/me
Valid route found. Will try to fetch.
Route discovered : /wp/v2/comments
Valid route found. Will try to fetch.
Route discovered : /wp/v2/comments/(?P<id>[\d]+)
Invalid route.
Route discovered : /wp/v2/settings
Valid route found. Will try to fetch.
Added ACF Options route.

Fetching the JSON data from 25 valid API Routes...

=== [ Fetching wordpress__yoast_v1 ] === https://dustinhorton.com/gatsby-wp/wp-json/yoast/v1
⠈ source and transform nodes -> wordpress__yoast_v1 fetched : 1
Fetching the wordpress__yoast_v1 took: 936.166ms

=== [ Fetching wordpress__yoast_configurator ] === https://dustinhorton.com/gatsby-wp/wp-json/yoast/v1/configurator
⢀ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "Sorry, you are not allowed to do that."
Fetching the wordpress__yoast_configurator took: 846.014ms

=== [ Fetching wordpress__yoast_reindex_posts ] === https://dustinhorton.com/gatsby-wp/wp-json/yoast/v1/reindex_posts
⢀ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "Sorry, you are not allowed to do that."
Fetching the wordpress__yoast_reindex_posts took: 1010.589ms

=== [ Fetching wordpress__yoast_ryte ] === https://dustinhorton.com/gatsby-wp/wp-json/yoast/v1/ryte
⠠ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "Sorry, you are not allowed to do that."
Fetching the wordpress__yoast_ryte took: 1022.977ms

=== [ Fetching wordpress__yoast_statistics ] === https://dustinhorton.com/gatsby-wp/wp-json/yoast/v1/statistics
⠄ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "Sorry, you are not allowed to do that."
Fetching the wordpress__yoast_statistics took: 820.827ms

=== [ Fetching wordpress__acf_posts ] === https://dustinhorton.com/gatsby-wp/wp-json/acf/v3/posts
⠈ source and transform nodes -> wordpress__acf_posts fetched : 100
Fetching the wordpress__acf_posts took: 6352.670ms

=== [ Fetching wordpress__acf_pages ] === https://dustinhorton.com/gatsby-wp/wp-json/acf/v3/pages
⡀ source and transform nodes -> wordpress__acf_pages fetched : 34
Fetching the wordpress__acf_pages took: 2760.048ms

=== [ Fetching wordpress__acf_media ] === https://dustinhorton.com/gatsby-wp/wp-json/acf/v3/media
⠈ source and transform nodes -> wordpress__acf_media fetched : 100
Fetching the wordpress__acf_media took: 4273.250ms

=== [ Fetching wordpress__acf_categories ] === https://dustinhorton.com/gatsby-wp/wp-json/acf/v3/categories
⠁ source and transform nodes -> wordpress__acf_categories fetched : 13
Fetching the wordpress__acf_categories took: 1029.029ms

=== [ Fetching wordpress__acf_tags ] === https://dustinhorton.com/gatsby-wp/wp-json/acf/v3/tags
⠈ source and transform nodes -> wordpress__acf_tags fetched : 0
Fetching the wordpress__acf_tags took: 941.066ms

=== [ Fetching wordpress__acf_comments ] === https://dustinhorton.com/gatsby-wp/wp-json/acf/v3/comments
⢀ source and transform nodes -> wordpress__acf_comments fetched : 9
Fetching the wordpress__acf_comments took: 2868.036ms

=== [ Fetching wordpress__acf_users ] === https://dustinhorton.com/gatsby-wp/wp-json/acf/v3/users
⠠ source and transform nodes -> wordpress__acf_users fetched : 11
Fetching the wordpress__acf_users took: 2049.181ms

=== [ Fetching wordpress__POST ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/posts
⠁ source and transform nodes
Total entities : 9094
Pages to be requested : 91
⠁ source and transform nodes -> wordpress__POST fetched : 9094
Fetching the wordpress__POST took: 152767.807ms

=== [ Fetching wordpress__PAGE ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/pages
⢀ source and transform nodes -> wordpress__PAGE fetched : 34
Fetching the wordpress__PAGE took: 2194.895ms

=== [ Fetching wordpress__wp_media ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/media
⢀ source and transform nodes
Total entities : 7504
Pages to be requested : 76
⢀ source and transform nodes -> wordpress__wp_media fetched : 7485
Fetching the wordpress__wp_media took: 132029.996ms

=== [ Fetching wordpress__wp_types ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/types
⢀ source and transform nodes -> wordpress__wp_types fetched : 1
Fetching the wordpress__wp_types took: 956.603ms

=== [ Fetching wordpress__wp_statuses ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/statuses
⢀ source and transform nodes -> wordpress__wp_statuses fetched : 1
Fetching the wordpress__wp_statuses took: 1017.845ms

=== [ Fetching wordpress__wp_taxonomies ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/taxonomies
⠠ source and transform nodes -> wordpress__wp_taxonomies fetched : 1
Fetching the wordpress__wp_taxonomies took: 1029.885ms

=== [ Fetching wordpress__CATEGORY ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/categories
⢀ source and transform nodes -> wordpress__CATEGORY fetched : 14
Fetching the wordpress__CATEGORY took: 943.710ms

=== [ Fetching wordpress__TAG ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/tags
⠠ source and transform nodes -> wordpress__TAG fetched : 19
Fetching the wordpress__TAG took: 1104.454ms

=== [ Fetching wordpress__wp_users ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/users
⡀ source and transform nodes -> wordpress__wp_users fetched : 11
Fetching the wordpress__wp_users took: 1325.604ms

=== [ Fetching wordpress__wp_me ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/users/me
⠂ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "You are not currently logged in."
Fetching the wordpress__wp_me took: 926.146ms

=== [ Fetching wordpress__wp_comments ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/comments
⠂ source and transform nodes
Total entities : 9410
Pages to be requested : 95
⡀ source and transform nodes -> wordpress__wp_comments fetched : 9397
Fetching the wordpress__wp_comments took: 85370.673ms

=== [ Fetching wordpress__wp_settings ] === https://dustinhorton.com/gatsby-wp/wp-json/wp/v2/settings
⠁ source and transform nodesThe server response was "401 Unauthorized"
Inner exception message : "Sorry, you are not allowed to do that."
Fetching the wordpress__wp_settings took: 808.396ms

=== [ Fetching wordpress__acf_options ] === http://dustinhorton.com/gatsby-wp/wp-json/acf/v2/options
⠂ source and transform nodesThe server response was "404 Not Found"
Inner exception message : "No route was found matching the URL and request method"
Fetching the wordpress__acf_options took: 1059.276ms

=END PLUGIN=====================================: 412457.896ms
⠁ source and transform nodes

And it's been sitting there for about 8 hours.

@Khristophor
Contributor

@dustinhorton what kind of hosting are you using? I think it's just killing your production box with the number of requests. I believe I got it to finish (after quite some time, not eight hours) by setting concurrent connections to something low, like 1 or 2.
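
For anyone wanting to try the lower-concurrency suggestion, a minimal gatsby-config.js sketch might look like the following. It assumes the installed gatsby-source-wordpress version supports a `concurrentRequests` option (check the plugin README for your version). The site URL comes from the verbose log above; the remaining values are illustrative.

```javascript
// gatsby-config.js (sketch; option availability depends on the plugin version)
const config = {
  plugins: [
    {
      resolve: "gatsby-source-wordpress",
      options: {
        baseUrl: "dustinhorton.com/gatsby-wp", // site URL from the verbose log
        protocol: "http",
        hostingWPCOM: false,
        useACF: true,
        verboseOutput: true,
        // Assumption: this option exists in your plugin version.
        // A low value sends fewer simultaneous requests to the WordPress host.
        concurrentRequests: 2,
      },
    },
  ],
}

module.exports = config
```

A lower value trades total build time for less load on the WordPress server, so it is most likely to help when the host itself is the bottleneck.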

@dustinhorton
Contributor Author

It's a decent VPS on Linode. I can get settings tweaked on it if that'd help. But the issue happens locally too.

@pieh
Contributor

pieh commented Aug 30, 2018

const requestRemoteNode = (url, headers, tmpFilename, filename) =>
  new Promise((resolve, reject) => {
    const responseStream = got.stream(url, {
      ...headers,
      timeout: 30000,
      retries: 5,
    })
    const fsWriteStream = fs.createWriteStream(tmpFilename)
    responseStream.pipe(fsWriteStream)
    responseStream.on(`downloadProgress`, pro => console.log(pro))
    // If there's a 400/500 response or other error.
    responseStream.on(`error`, (error, body, response) => {
      fs.removeSync(tmpFilename)
      reject(error)
    })
    fsWriteStream.on(`error`, error => {
      reject(error)
    })
    responseStream.on(`response`, response => {
      fsWriteStream.on(`finish`, () => {
        resolve(response)
      })
    })
  })
This sometimes doesn't work correctly when we pull a larger number of files: the network request gets resolved, but the file write stream never finishes (or errors out). I think it would be great to add some kind of timeout after responseStream finishes to wait for fsWriteStream to finish. If it doesn't finish, we should destroy all resources and try to write the file again (possibly with a few retries), and actually error out when it still can't do that.
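
A minimal sketch of that idea, using only Node core modules. The helper names (`waitForFinish`, `withRetries`, `downloadTo`) and the `openSource` callback are hypothetical and not part of Gatsby; this only illustrates the timeout-then-retry approach described above, not how it was eventually implemented.

```javascript
const fs = require("fs")

// Hypothetical helper: resolves when `stream` emits "finish"; rejects on
// "error" or when `timeoutMs` elapses first (destroying the stream).
const waitForFinish = (stream, timeoutMs) =>
  new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      stream.destroy()
      reject(new Error(`write stream did not finish within ${timeoutMs}ms`))
    }, timeoutMs)
    stream.once("finish", () => {
      clearTimeout(timer)
      resolve()
    })
    stream.once("error", error => {
      clearTimeout(timer)
      reject(error)
    })
  })

// Hypothetical helper: runs `attempt()` up to `retries + 1` times and
// rethrows the last error if every attempt fails.
const withRetries = async (attempt, retries) => {
  let lastError
  for (let i = 0; i <= retries; i++) {
    try {
      return await attempt(i)
    } catch (error) {
      lastError = error
    }
  }
  throw lastError
}

// `openSource` is a hypothetical function that starts a fresh download and
// returns a readable stream each time it is called.
const downloadTo = (openSource, tmpFilename, { timeoutMs = 30000, retries = 3 } = {}) =>
  withRetries(async () => {
    const source = openSource()
    const fsWriteStream = fs.createWriteStream(tmpFilename)
    source.on("error", error => fsWriteStream.destroy(error))
    source.pipe(fsWriteStream)
    try {
      await waitForFinish(fsWriteStream, timeoutMs)
    } catch (error) {
      source.destroy() // release the network side before retrying
      throw error
    }
  }, retries)
```

Each retry reopens both the source and the temp file, so a stalled write cannot leave a half-written file behind as the final result.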

@TylerBarnes TylerBarnes removed the topic: source-wordpress Related to Gatsby's integration with WordPress label Dec 6, 2019
@TylerBarnes
Contributor

@bradydowling thanks for sharing your repo! To use a version of Gatsby older than your globally installed CLI, you can add npm scripts for develop and build.

{
  "scripts": {
    "develop": "gatsby develop",
    "build": "gatsby build"
  }
} 

then running npm run develop or yarn develop will use the local version in your project.

@TylerBarnes
Contributor

We're investigating this issue but in the meantime, anyone with the problem may be able to get around it by running CI=1 yarn build, as that should use a different reporter library behind the scenes. If you try that and it works please let us know!

@pvdz
Contributor

pvdz commented Dec 9, 2019

@dustinhorton :

v2 version @ https://github.com/dustinhorton/gatsby-v2-issue. Been building for about 50 minutes at this point.

Fwiw. I realize that was posted about a year ago, and Gatsby has changed considerably since then. When running it on my machine (and setting the gatsby version to * in package.json) the build seems to complete in about 2000 seconds (~33 minutes).
Additionally, when upgrading the cli, there's now a progress bar, which makes a huge difference in terms of how long it "feels", since you get a more concrete feedback loop.

The sourcing step takes almost all of this time (1968 / 1975 seconds). The downloading of remote files is the most of that (1845 seconds).

This doesn't surprise me when I look at a single round trip to this server:

# Starting requestInQueue, _concurrentRequests= 10
@ requestInQueue for 75 tasks { concurrent: 10 } { id: 'url' }
@ Fetch http://dustinhorton.com/gatsby-wp/wp-json/wp/v2/media?per_page=100&page=4: 2587.339ms
@ Fetch http://dustinhorton.com/gatsby-wp/wp-json/wp/v2/media?per_page=100&page=10: 2661.584ms
@ Fetch http://dustinhorton.com/gatsby-wp/wp-json/wp/v2/media?per_page=100&page=8: 2695.937ms
@ Fetch http://dustinhorton.com/gatsby-wp/wp-json/wp/v2/media?per_page=100&page=2: 2738.339ms
@ Fetch http://dustinhorton.com/gatsby-wp/wp-json/wp/v2/media?per_page=100&page=6: 2853.199ms

Each request takes roughly 2 to 4 seconds. The 75 pages that are fetched initially while exploring take 18 seconds in total (!). I have a fast connection and I can repro that timing with a plain wget.

So the longest step will try to download about 7500 resources. Considering a single request takes 2 to 4 seconds, I'm not surprised it takes that long.
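
As a back-of-the-envelope check of that reasoning, here is a small sketch. The resource count and the 2 to 4 second range come from the comment above. The concurrency values are assumptions, and the model ignores bandwidth limits and server throttling, so treat the numbers as a ballpark only.

```javascript
// Rough estimate: requests run in batches of `concurrency`, and each batch
// takes about `secondsPerRequest`.
const estimateSeconds = (count, secondsPerRequest, concurrency) =>
  Math.ceil(count / concurrency) * secondsPerRequest

const resources = 7500 // "about 7500 resources"
for (const concurrency of [5, 10]) { // assumed concurrency values
  const low = estimateSeconds(resources, 2, concurrency)
  const high = estimateSeconds(resources, 4, concurrency)
  console.log(`concurrency ${concurrency}: ${low}-${high} s`)
}
// concurrency 5: 3000-6000 s
// concurrency 10: 1500-3000 s
```

Those ranges are in the same order of magnitude as the 1845 seconds measured for the download stretch, which is consistent with round-trip time dominating the step.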

Even so, I do notice some pauses during the main download stretch of 1845 seconds. I'm not sure whether this is just the server throttling the data or not (I did set concurrency to 5).

I did try to wiggle the width of the terminal (I'm on xfce linux, fwiw) and while that occasionally coincided with progress moving forward, I'm right now convinced that's more of a coincidence than causality.

Bottom line: while I can repro the slow download and seemingly "stuck" progress, all signs currently point to that being pretty much caused by waiting on the server response. Additionally, the width of the terminal does not seem to affect this.

That said: there is a possibility that the terminal output gets stuck somehow while updating the progress bar at a very particular width. While this is unlikely, it's not impossible. Hence we really need a repro that we can run ourselves (so no auth). And preferably one that does not depend on a remote server, as I don't want to be hammering the server.

I'm going to update labels on this issue accordingly.

@pvdz pvdz added status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. and removed status: needs more info Needs triaging and reproducible examples or more information to be resolved labels Dec 9, 2019
@pvdz
Contributor

pvdz commented Dec 9, 2019

The repro posted in #6654 (comment) by @njmyers does not exist anymore

The repo posted in #6654 (comment) by @bradydowling requires a bunch of permissions I don't have, and seems to have similar problems with round trip time

@ Fetch http://topazandsapphire.com/wp-json/wp/v2/media?per_page=100&page=7: 25025.257ms
@ Fetch http://topazandsapphire.com/wp-json/wp/v2/media?per_page=100&page=4: 27791.269ms
@ Fetch http://topazandsapphire.com/wp-json/wp/v2/media?per_page=100&page=2: 37817.874ms
@ Fetch http://topazandsapphire.com/wp-json/wp/v2/media?per_page=100&page=5: 38056.989ms
@ Fetch http://topazandsapphire.com/wp-json/wp/v2/media?per_page=100&page=3: 38446.504ms
@ Fetch http://topazandsapphire.com/wp-json/wp/v2/media?per_page=100&page=6: 43799.842ms

This sourcing step is not really showing any progress indicator except for the spinner, and occasionally steps are being logged. It still takes a few minutes, so perhaps we can at least show some kind of progress indicator if that makes sense.

Additionally, perhaps it could help to point out the average time to fetch a resource, as that's an indication of why "Gatsby" is slow, when it's really caused by the round trip.
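
A minimal sketch of how such an average could be tracked. `createFetchTimer` is a hypothetical helper, not an existing Gatsby API; the sample durations are the six media page timings from the log above.

```javascript
// Hypothetical tracker: record each fetch duration and report the mean.
const createFetchTimer = () => {
  const durations = []
  return {
    record: ms => durations.push(ms),
    averageMs: () =>
      durations.length === 0
        ? 0
        : durations.reduce((sum, ms) => sum + ms, 0) / durations.length,
  }
}

const timer = createFetchTimer()
// Media page timings from the log above, in milliseconds.
const timingsMs = [25025.257, 27791.269, 37817.874, 38056.989, 38446.504, 43799.842]
timingsMs.forEach(ms => timer.record(ms))
console.log(`average fetch time: ${(timer.averageMs() / 1000).toFixed(1)} s`)
// average fetch time: 35.2 s
```

Printing a figure like this next to the spinner would make it clearer that the wait is spent on the remote host rather than inside Gatsby.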

In this repo, even downloading 589 remote files took about 5 minutes, with the progress bar often just being stuck for no apparent reason.

After the bootstrap the build fails for me because files are missing.

@bradydowling

@pvdz I'll have to play with this again (I gave up on it for a while) but there are certain files that throw permissions issues even when it builds successfully so I just figured those can be ignored.

But to summarize your post, are you saying that certain (download) steps just take a really long time and we should wait longer for them to complete?

@pvdz
Contributor

pvdz commented Dec 9, 2019

@bradydowling Well, looks like it, yes. :)

FTR: I've tracked the resource gathering a bit, to shed some light on timings:

Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6084.jpg: 15605.630ms
Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6036.jpg
Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6051.jpg: 6447.272ms
Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6034.jpg
Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6045.jpg: 6944.355ms
Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6029.jpg
Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6036.jpg: 6401.541ms
Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/01/IMG_6027.jpg

These are 6 MB files, btw. I'm on a 250 Mbps connection, which can easily handle those faster than 1 MB/s, but it does not surprise me that files this size blow up download times. No amount of cli resizing is going to speed that up ;)
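
To put a number on that, here is a small sketch of the effective throughput for one of the downloads above. It assumes the file is roughly 6 MB, as stated, and uses the 6447.272 ms fetch time from the log; the exact file size is not known.

```javascript
// Effective throughput for one download: size divided by elapsed time.
const throughput = (megabytes, milliseconds) => {
  const mbPerSecond = megabytes / (milliseconds / 1000)
  return { mbPerSecond, mbitPerSecond: mbPerSecond * 8 }
}

// Assumes ~6 MB per file and the 6447.272 ms fetch from the log.
const { mbPerSecond, mbitPerSecond } = throughput(6, 6447.272)
console.log(`${mbPerSecond.toFixed(2)} MB/s (${mbitPerSecond.toFixed(1)} Mbit/s)`)
// 0.93 MB/s (7.4 Mbit/s)
```

That is a small fraction of a 250 Mbit/s line, which suggests the limit is on the server side rather than the client connection.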

@bradydowling

bradydowling commented Dec 9, 2019

Interesting. This is just a standard WordPress personal blog hosted on EC2 so it's not like it's a gigantic install. Perhaps this is because all these requests are overloading the host. Or, I'm no WordPress expert, but perhaps there's some sort of standard WP rate limit on REST API calls that can happen? I'm also going with the assumption that this behavior isn't unique to this site.

@pvdz
Contributor

pvdz commented Dec 9, 2019

Perhaps this is because all these requests are overloading the host.

This is my guess (or something in this ballpark). But I'm exploring a bit of our own architecture to check whether we are losing efficiency through abstractions. But considering I can mimic most of the times reported with plain wgets/curls, I doubt there's much there.

@pvdz
Contributor

pvdz commented Dec 9, 2019

So fwiw I replaced the got.stream() bits with a dumb raw downloader:

    let r = ""
    require("http").get(url, res =>
      res
        .on("data", m => (r += m))
        .on("end", () => {
          console.timeEnd("$$ Fetch time for " + url)
          resolve(r)
        })
    )
$ Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/05/IMG_5260.jpg
$$ Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/09/TRAVEL-LEISURE-2-copy.png: 1003.535ms
$ Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/05/International-Travel-Topaz-Sapphire.png
$$ Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/09/IMG_4606.jpg: 3174.126ms
$ Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/05/Brunch-Topaz-Sapphire-2.png
$$ Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/09/IMG_4647.jpg: 9521.157ms
$ Started actually fetching http://topazandsapphire.com/wp-content/uploads/2016/05/IMG_6978.jpg
$$ Fetch time for http://topazandsapphire.com/wp-content/uploads/2016/05/International-Travel-Topaz-Sapphire.png: 3611.910ms

So yes, I'm pretty sure the long delays (in this case at least) are caused by download. So perhaps our best bet is to improve the feedback while waiting for a download :)

@Vacilando
Contributor

Lots and lots of people say that resizing the terminal window (for whatever weird reason) resolves the develop process getting stuck on 'source and transform nodes'.

Sadly, when using WSL this is not a solution. Stuck with 'source and transform nodes' locally in build as well as in develop. Netlify builds do work but local development has become impossible.

@pvdz
Contributor

pvdz commented Dec 19, 2019

@Vacilando can you pick out some of the links that are being downloaded for your site during sourcing and test manually whether they download fast? Like I mentioned above, one big problem I'm seeing is that certain wp hosts are simply super duper slow.

So if the host is slow and there's a lot of content to download, then yeah, this step will take a lot of time, because that's all it should be doing in this step: discover content and download it :)

If you've confirmed the content itself is downloaded in a fraction of the whole step, please circle back here. In that case a repro would be tremendously helpful :)
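One way to run that manual check outside of Gatsby is sketched below, in the same spirit as the raw downloader earlier in the thread. It is only a sketch: the helper names `timeDownload`, `summarize` and `main` are made up for illustration, the URLs are whatever you pass in, and plain `http`/`https` `get` does not follow redirects, so a redirecting URL will look faster than it really is.

```javascript
// Minimal sketch: time how long each URL takes to download, without Gatsby.
// timeDownload, summarize and main are hypothetical helper names.
const http = require("http")
const https = require("https")

// Resolve with { url, bytes, ms } once the response body has been fully read.
function timeDownload(url) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith("https:") ? https : http
    const start = process.hrtime.bigint()
    client
      .get(url, res => {
        let bytes = 0
        res
          .on("data", chunk => (bytes += chunk.length))
          .on("end", () => {
            const ms = Number(process.hrtime.bigint() - start) / 1e6
            resolve({ url, bytes, ms })
          })
          .on("error", reject)
      })
      .on("error", reject)
  })
}

// Pure helper: total time and the slowest entry from a list of results.
function summarize(results) {
  const totalMs = results.reduce((sum, r) => sum + r.ms, 0)
  const slowest = results.reduce((a, b) => (b.ms > a.ms ? b : a), results[0])
  return { totalMs, slowest }
}

// Download one at a time so parallel requests do not skew the timings.
async function main(urls) {
  const results = []
  for (const url of urls) {
    const result = await timeDownload(url)
    console.log(`${result.ms.toFixed(0)}ms  ${result.bytes} bytes  ${url}`)
    results.push(result)
  }
  return summarize(results)
}

// Usage (replace with URLs from your own site):
// main(["http://example.com/wp-content/uploads/a.jpg"]).then(console.log)
```

If the total reported here is close to the time spent in 'source and transform nodes', the host is the likely bottleneck.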

@bradydowling

bradydowling commented Dec 19, 2019 via email

@TylerBarnes
Contributor

@bradydowling part of that already exists actually. You can set an env variable GATSBY_CONCURRENT_DOWNLOAD to configure the limit for concurrent requests. The next major version of gatsby-source-wordpress #19292 will have more control over how media files are downloaded. As for the caching, downloaded files are currently cached, but when you change a gatsby-*.js file it currently wipes the cache out to prevent a stale cache from causing unexpected bugs. So that's a core issue rather than being gatsby-source-wordpress specific, but work is always being done to improve Gatsby's cache.
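To illustrate what a concurrency limit like this does in general, here is a rough sketch. It is not Gatsby's implementation: `parseLimit` and `runWithLimit` are hypothetical helpers, and the fallback of 20 is an arbitrary value picked for the example. Only the `GATSBY_CONCURRENT_DOWNLOAD` variable name comes from the comment above.

```javascript
// Hypothetical helper: read a positive integer limit, else use the fallback.
function parseLimit(value, fallback) {
  const n = parseInt(value, 10)
  return Number.isInteger(n) && n > 0 ? n : fallback
}

// Hypothetical helper: run async task functions with at most `limit` in flight.
// Results come back in the same order as the tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length)
  let next = 0
  async function worker() {
    // Each worker keeps pulling the next unclaimed task until none are left.
    while (next < tasks.length) {
      const i = next++
      results[i] = await tasks[i]()
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker)
  await Promise.all(workers)
  return results
}

// The fallback (20) is an assumption for this sketch, not Gatsby's default.
const limit = parseLimit(process.env.GATSBY_CONCURRENT_DOWNLOAD, 20)

// Usage: wrap each download in a function so it only starts when a worker picks it up.
// runWithLimit(urls.map(url => () => download(url)), limit)
```

A lower limit means fewer simultaneous requests against the WordPress host, which can help when the host itself is what slows down under load.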

@wardpeet
Contributor

The Jobs API (#19831) should partially fix this caching problem.

@bradydowling

Ya I saw the bit about GATSBY_CONCURRENT_DOWNLOAD closer to the top. From my experience, that didn't help, so I guess my suggestion was toward more fine-grained control, like a limit in MB per second/minute/hour or something like that. Maybe I'm just saying nonsense.

@TylerBarnes
Contributor

@bradydowling I'm looking at adding request retries with exponential backoff as well as adding an optional setting for max requests per second for cases where that doesn't work well enough.
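For readers unfamiliar with the idea, retrying with exponential backoff can be sketched roughly as follows. This is an illustration only, not the planned gatsby-source-wordpress code: `backoffDelay`, `sleep` and `withRetries` are hypothetical names, and the default delays and retry count are arbitrary. A requests-per-second cap is not covered here.

```javascript
// Hypothetical helper: delay doubles with every attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical helper: call fn, retrying failed attempts up to `retries` times
// and waiting longer before each retry.
async function withRetries(fn, { retries = 3, baseMs = 500, maxMs = 30000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= retries) throw err
      await sleep(backoffDelay(attempt, baseMs, maxMs))
    }
  }
}

// Usage: withRetries(() => download(url), { retries: 5 })
```

With the defaults above, the waits between attempts are 500 ms, 1000 ms and 2000 ms, which gives a struggling host some room to recover instead of being hit again immediately.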

@github-actions

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@github-actions github-actions bot added the "stale?" label Jan 10, 2020
@pvdz
Contributor

pvdz commented Jan 13, 2020

I'm going to close this now.

If you think you have a wordpress sourcing problem, please confirm that your delays are not caused by a slow wordpress server first. Then please open a new issue (but feel free to point back to this issue).

The high number of comments makes it very difficult to track the discussion. So opening a new issue is more likely to result in your specific problem getting an answer.

@pvdz pvdz closed this as completed Jan 13, 2020
@pvdz pvdz removed the "help wanted", "stale?", "type: bug" and "status: needs reproduction" labels Jan 13, 2020
@gatsbyjs gatsbyjs locked as resolved and limited conversation to collaborators Jan 13, 2020
@dustinhorton
Contributor Author

dustinhorton commented Jan 13, 2020

I and others confirmed it over the past year and a half. My original issue was on a well-tuned VPS. @njmyers had a likely fix, or at least an improvement, but couldn't get any answers from maintainers about how they'd like it done.

I thought about closing it myself, but I think it needs to be out there as a warning that a moderately large WordPress site is NOT a good fit for Gatsby as of yet.

@pvdz
Contributor

pvdz commented Jan 14, 2020

@dustinhorton I understand that. This issue is over a year and a half old, and things change rapidly. With the issue amassing this many comments, it's difficult to figure out the actual problem anymore.

Fwiw, as noted above, I checked the last reported repros and determined those, at least, were caused by slow remotes. If you have a repro with a current Gatsby release on a fast remote please let me know, even if it's perhaps already posted in this thread. Or maybe open a new issue for it (and tag me) if you want more focus on it, I'll leave that up to you :)

(Just to be clear, we closed this issue because it's gone a bit stale with too many off-topic messages, please do not feel like we're squashing the discussion as that is not the intention and we recognise our work is not finished here!)
