[gatsby-source-contentful] downloadLocal broken by gatsby-source-filesystem #23123

Closed
mjmaurer opened this issue Apr 15, 2020 · 23 comments
Labels
topic: source-contentful Related to Gatsby's integration with Contentful type: bug An issue or pull request relating to a bug in Gatsby

Comments

@mjmaurer
Contributor

mjmaurer commented Apr 15, 2020

Description

#20843 introduced a timeout for createRemoteFileNode. I'm almost certain this breaks localFile for Contentful projects with roughly 15 or more assets.

I pinned the transitive dependency on gatsby-source-filesystem to 2.1.47 (the release right before #20843) and the problem went away.

Steps to reproduce

Attempt using gatsby-source-contentful with downloadLocal enabled. If gatsby develop takes more than 30 seconds, createRemoteFileNode will silently time out. The build will complete, but most localFile fields in GraphiQL will be null.

Expected result

localFile fields are populated.

Actual result

localFile fields are null

Other Notes

I think all of the createRemoteFileNode calls are actually completing, but the timeout has some nasty side effect.

I'd love to see this reverted as I have to resort to the very hacky npm-force-resolutions

Environment

System:
OS: Linux 4.4 Ubuntu 18.04.4 LTS (Bionic Beaver)
CPU: (8) x64 Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Shell: 4.4.20 - /bin/bash
Binaries:
Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
Yarn: 1.22.1 - /usr/bin/yarn
npm: 6.13.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
Python: 2.7.17 - /usr/bin/python
npmPackages:
gatsby: ^2.17.4 => 2.20.20
gatsby-image: ^2.2.30 => 2.3.2
gatsby-plugin-brotli: ^1.3.1 => 1.3.1
gatsby-plugin-emotion: ^4.1.18 => 4.2.1
gatsby-plugin-manifest: ^2.2.41 => 2.3.3
gatsby-plugin-netlify: ^2.1.32 => 2.2.1
gatsby-plugin-postcss: ^2.1.16 => 2.2.1
gatsby-plugin-prefetch-google-fonts: 1.4.3 => 1.4.3
gatsby-plugin-react-helmet: ^3.1.13 => 3.2.2
gatsby-plugin-react-svg: ^3.0.0 => 3.0.0
gatsby-plugin-remove-fingerprints: 0.0.2 => 0.0.2
gatsby-plugin-resolve-src: ^2.0.0 => 2.0.0
gatsby-plugin-sharp: ^2.2.32 => 2.5.4
gatsby-source-contentful: ^2.1.73 => 2.2.7
gatsby-transformer-remote-filesystem: ^0.2.0 => 0.2.0
gatsby-transformer-sharp: ^2.3.0 => 2.4.4
npmGlobalPackages:
gatsby-cli: 2.11.8

@mjmaurer mjmaurer added the type: bug An issue or pull request relating to a bug in Gatsby label Apr 15, 2020
@ascorbic
Contributor

Hi. I'm having trouble reproducing this. I created a test site with 100 (5-10MB) image assets. I added downloadLocal: true and ran gatsby develop. Downloading remote files took 48.9s. After loading, localFile was set on all of the assets. How long does the "Downloading remote files" stage take? Can you post the full build logs? Ideally, are you able to share a repro?

@ascorbic ascorbic added the status: needs more info Needs triaging and reproducible examples or more information to be resolved label Apr 15, 2020
@mjmaurer
Contributor Author

I'm at a loss for words. Removed the force-resolution, and did a clean install of gatsby-source-contentful. A gatsby clean to finish off, and suddenly everything is working as expected.

Thanks for the work of reproing and that PR. Definitely ran multiple clean npm installs yesterday as well as multiple clean gatsby builds. In the end, not sure what was happening.

@lucassilvagc
Contributor

lucassilvagc commented Apr 19, 2020

Hi,

Is there a specific solution to this problem? I have ~274 assets on my space... and a bunch of them are failing.

What I'm noticing is that, whenever I reach ~220-230 files, it times out.

Like this:

Fetch Contentful data: 986.556ms
⠴ source and transform nodes
[======================= ] 13.563 s 231/274 84% Downloading remote files

Then it completes and later it fails.

@kyleconrad

kyleconrad commented Apr 20, 2020

Running into this as well - have 217 assets in my Contentful (including some larger video files - all below 50MB), it claims to have completed the download and all of that. However, when I query, I'm getting this:

image

File exists on Contentful, file does not exist on my local server. There's no notice or warning that stuff is failing, it's just in the background not completing.

Getting this in the terminal: success Downloading remote files - 30.672s - 174/217 7.07/s

Quick update: removed the plugin, installed it again, cleaned, etc, and got this: success Downloading remote files - 30.591s - 164/217 7.09/s - so looks like there's definitely something with the timeout where it gets right over 30.5s and decides to fail.
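
Since nothing warns about the missing files, one way to spot them is to scan a query result for assets whose localFile is null. A minimal sketch, assuming the result comes from a query over allContentfulAsset that selects title and localFile; the helper name and the sample data are made up for illustration:

```javascript
// Sketch: list Contentful assets whose localFile came back null.
// `result` is assumed to be the data returned by a query such as
//   { allContentfulAsset { nodes { title localFile { publicURL } } } }
function findMissingLocalFiles(result) {
  return result.allContentfulAsset.nodes
    .filter((node) => node.localFile === null || node.localFile === undefined)
    .map((node) => node.title);
}

// Hypothetical sample data shaped like the query result above.
const sample = {
  allContentfulAsset: {
    nodes: [
      { title: "hero.jpg", localFile: { publicURL: "/static/hero.jpg" } },
      { title: "intro.mp4", localFile: null },
    ],
  },
};

// Logs the titles of the assets that have no local file.
console.log(findMissingLocalFiles(sample));
```

Running a check like this after sourcing would at least turn the silent failure into a visible list.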

@mjmaurer
Contributor Author

Yeah, I ran into this again as well. I think it was my internet connection plus a large file. But it does seem like a problem that if one file fails, everything fails silently.

@kyleconrad

@mjmaurer I manually bumped up the two TIMEOUT numbers in create-remote-file-node.js (in gatsby-source-filesystem) from 30s to 30 minutes, and it's successfully pulling all the files (including large videos). It's 100% a gross hack and not sustainable, but at least to get over the hump here it works.

@GrtDev

GrtDev commented Apr 25, 2020

Got the same result here as well: success Downloading remote files - 30.130s - 56/93 3.09/s
Missing random localFile data of some images. Download gets cut off right at the 30 seconds mark.

I did some further digging:

It seems the timeout is created in the requestRemoteNode.

const responseStream = got.stream(url, {
  headers,
  timeout: CONNECTION_TIMEOUT,
  retries: CONNECTION_RETRY_LIMIT,
  ...httpOpts,
})

All the request promises are created at the same time, but the actual requests are only loaded in order. This causes all the requests at the bottom of the stack to time out and fail.
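
To make the effect concrete, here is a deliberately simplified model, not the real gatsby-source-filesystem code: every timeout clock starts at the same moment, while the files are served strictly one after another. The function name and the numbers (500 ms per file) are assumptions picked for illustration.

```javascript
// Simplified model: all timeout clocks start at t = 0, but the
// downloads are served one after another. Times are in milliseconds.
function simulateQueue(fileCount, msPerFile, timeoutMs) {
  let clock = 0;
  let completed = 0;
  for (let i = 0; i < fileCount; i++) {
    clock += msPerFile; // this file finishes only after all earlier ones
    if (clock <= timeoutMs) completed++;
  }
  return { completed, timedOut: fileCount - completed };
}

// Hypothetical numbers: 93 files at 500 ms each with a 30 s timeout.
console.log(simulateQueue(93, 500, 30000));
```

In this model the files at the end of the queue never get a chance: they time out while still waiting, which would match downloads being cut off right at the 30 second mark.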

The timeout error is not handled in download-contentful-assets and therefore fails silently.

When actually logging the error you get the following trace:

failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
...
...
failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
success Downloading remote files - 30.618s - 39/93 3.04/s

Because the error is not handled the result is still seen as a success even though the website will not run properly due to missing data. So this error needs to be handled appropriately.
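
One possible shape for handling this, as a rough sketch rather than the plugin's actual code: wait for every download to settle, collect the rejections, and throw if there are any. Here downloadAsset is a hypothetical stand-in for whatever function fetches a single file.

```javascript
// Sketch: collect download failures and fail loudly instead of
// reporting success. `downloadAsset` is a hypothetical function that
// takes a URL and returns a promise for the downloaded file.
async function downloadAll(urls, downloadAsset) {
  const results = await Promise.allSettled(urls.map((url) => downloadAsset(url)));

  const failures = results
    .map((result, i) => ({ result, url: urls[i] }))
    .filter(({ result }) => result.status === "rejected")
    .map(({ result, url }) => `${url}: ${result.reason}`);

  if (failures.length > 0) {
    throw new Error(
      `${failures.length}/${urls.length} downloads failed:\n${failures.join("\n")}`
    );
  }
  return results.map((result) => result.value);
}
```

With something like this, a build with timed out files would stop with a list of the affected URLs instead of printing success.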

As for the requests failing, the issue seems to be that too many of them are fired off at the same time but are only loaded in order.

concurrent: process.env.GATSBY_CONCURRENT_DOWNLOAD || 200,

It seems gatsby-source-filesystem assumes you can download 200 files concurrently, but this might not work with the Contentful API; I don't know what the limit is there. The limit is, however, adjustable via an environment variable.

Setting the following config seems to fix the timeout issue for me:

gatsby-config.js

process.env.GATSBY_CONCURRENT_DOWNLOAD = 1

new output:

success Downloading remote files - 137.409s - 93/93 0.68/s
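
For anyone wondering what a concurrency limit does in general, here is a generic promise pool sketch. It is not how gatsby-source-filesystem implements its queue; runWithLimit and its arguments are names invented for this example.

```javascript
// Sketch of a small promise pool: at most `limit` tasks run at once.
// `tasks` is an array of functions that each return a promise.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  async function worker() {
    while (next < tasks.length) {
      const index = next++; // claim the next task (JS runs this synchronously)
      results[index] = await tasks[index]();
    }
  }

  // Start up to `limit` workers; each pulls tasks until none are left.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A limit of 1 serializes everything, which is slow but means each request only starts its timeout clock when it actually gets to run; a value somewhere between 1 and 200 may be a better trade-off.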

As for the timeout, 30 seconds is a good default, but it might not be enough for larger files (or slow internet). Perhaps this should be adjustable, possibly also through an environment variable?
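
A sketch of what reading such a setting could look like. DOWNLOAD_TIMEOUT_MS is a hypothetical variable name made up for this example, not an existing Gatsby option:

```javascript
// Sketch: read a timeout from an environment object, falling back to 30 s.
// DOWNLOAD_TIMEOUT_MS is a hypothetical name used only for illustration.
function readTimeout(env, fallbackMs = 30000) {
  const parsed = Number.parseInt(env.DOWNLOAD_TIMEOUT_MS, 10);
  // Ignore missing, non-numeric, or non-positive values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

console.log(readTimeout(process.env));
```

Environment variables are always strings, so the value has to be parsed and validated before it is handed to the HTTP client.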

@mjmaurer Maybe reopen this issue as more people seem to encounter this problem?

@mjmaurer mjmaurer reopened this Apr 25, 2020
@wardpeet wardpeet added the topic: source-contentful Related to Gatsby's integration with Contentful label May 13, 2020
@shanekenney
Contributor

I've looked into this a bit, and from what I can see there are a few issues at play here:

  1. The gatsby-source-contentful plugin swallows exceptions from createRemoteFileNode. These will most likely be networking errors, for example when a download times out or something unexpected happens, like a TCP connection being reset. gatsby-source-contentful then assumes the file has been downloaded successfully when it hasn't, and errors crop up later in the build when null references are hit.

  2. The 30s timeout that got is configured with. It's possible to hit this if you're downloading a large asset and don't have the bandwidth to complete the download in 30s. The easiest way to reproduce this is to throttle your network connection and run a build. On macOS, I used the Network Link Conditioner. Note: the asset size limit in Contentful is 1GB.

  3. The default number of concurrent downloads in create-remote-file-node. The default is 200, and this seems to cause all sorts of problems for me when running a local build against a large Contentful space. Since more downloads are happening concurrently, a timeout is more likely for any individual file, and I'm also seeing the occasional connection reset before a timeout happens. This is likely less of an issue if you've got a high-bandwidth connection to Contentful's asset CDN (aka CloudFront), but I wonder if this is a sensible default from a reliability standpoint. Maybe this could be determined more intelligently, e.g. by performing some kind of exponential backoff when network errors are encountered.

I'm going to start working on a PR to fix point 1 immediately. I don't think the Contentful source plugin should ever swallow errors. I would love to get someone's thoughts on points 2 & 3. Happy to work on these as well.
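
To illustrate the exponential backoff idea from point 3, here is a generic sketch. It is not taken from Gatsby; retryWithBackoff, its option names, and its defaults are assumptions made for this example.

```javascript
// Sketch: retry a failing async operation with exponential backoff.
// `operation` is any function returning a promise. Names and defaults
// are illustrative only.
async function retryWithBackoff(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= retries) throw error; // out of retries: surface the error
      const delay = baseDelayMs * 2 ** attempt; // doubles after every failure
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The important part for this issue is the final throw: once the retries are exhausted, the error reaches the caller instead of being swallowed.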

@axe312ger
Collaborator

related:

@ascorbic ascorbic removed the status: needs more info Needs triaging and reproducible examples or more information to be resolved label Jun 4, 2020
@github-actions

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

@github-actions github-actions bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Jun 24, 2020
@axe312ger axe312ger added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Jun 30, 2020
@jayhostan

jayhostan commented Jul 8, 2020

I was about to use downloadLocal today and ran into a different issue, reproduced on a clean install.

Seemingly the plugin options are not passed down correctly. In the source plugin, pluginConfig.get(downloadLocal) always returns false (the same applies to forceFullSync), so the download won't ever start.

I'll dig a bit deeper if I have the time.

System:
OS: macOS 10.15.5
CPU: (16) x64 Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
Shell: 5.7.1 - /bin/zsh
Binaries:
Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
npm: 6.14.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
Python: 2.7.16 - /usr/bin/python
Browsers:
Chrome: 83.0.4103.116
Safari: 13.1.1
npmPackages:
gatsby: ^2.23.12 => 2.23.12
gatsby-image: ^2.4.9 => 2.4.9
gatsby-plugin-manifest: ^2.4.14 => 2.4.14
gatsby-plugin-offline: ^3.2.13 => 3.2.13
gatsby-plugin-react-helmet: ^3.3.6 => 3.3.6
gatsby-plugin-sharp: ^2.6.14 => 2.6.14
gatsby-source-contentful: ^2.3.24 => 2.3.24
gatsby-source-filesystem: ^2.3.14 => 2.3.14
gatsby-transformer-sharp: ^2.5.7 => 2.5.7
npmGlobalPackages:
gatsby-cli: 2.12.52

@axe312ger
Collaborator

@jayhostan could you please check if this is still the case with gatsby-source-contentful@next? Thanks :)

@jayhostan

jayhostan commented Jul 17, 2020

> @jayhostan could you please check if this is still the case with gatsby-source-contentful@next? Thanks :)

hey @axe312ger , yes the issue is present on next as well

System:
OS: macOS 10.15.5
CPU: (16) x64 Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
Shell: 5.7.1 - /bin/zsh
Binaries:
Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
npm: 6.14.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
Python: 2.7.16 - /usr/bin/python
Browsers:
Chrome: 83.0.4103.116
Safari: 13.1.1
npmPackages:
gatsby: ^2.23.12 => 2.24.3
gatsby-image: ^2.4.9 => 2.4.13
gatsby-plugin-cdn-files: 0.0.3 => 0.0.3
gatsby-plugin-manifest: ^2.4.14 => 2.4.18
gatsby-plugin-offline: ^3.2.13 => 3.2.18
gatsby-plugin-react-helmet: ^3.3.6 => 3.3.10
gatsby-plugin-remote-images: ^2.2.0 => 2.2.0
gatsby-plugin-sharp: ^2.6.14 => 2.6.19
gatsby-source-contentful: ^3.0.0-contentful-next.35 => 3.0.0-contentful-next.35
gatsby-source-filesystem: 2.2.0 => 2.2.0
gatsby-source-remote-file: ^0.2.0 => 0.2.0
gatsby-transformer-remote-filesystem: ^1.0.0 => 1.0.0
gatsby-transformer-sharp: ^2.5.7 => 2.5.11
npmGlobalPackages:
gatsby-cli: 2.12.52

@axe312ger
Collaborator

@jayhostan it works fine for me on my local machine (adding downloadLocal: true && it downloads file). Are you sure you pass the options correctly in gatsby-config.js? 🙈

@jayhostan

jayhostan commented Jul 17, 2020

> @jayhostan it works fine for me on my local machine (adding downloadLocal: true && it downloads file). Are you sure you pass the options correctly in gatsby-config.js? 🙈

@axe312ger

Screenshot 2020-07-17 at 14 19 31

gatsby-node in the source plugin:

Screenshot 2020-07-17 at 14 20 10

Screenshot 2020-07-17 at 14 20 41

@jayhostan

oh well my bad sorry.. I've passed it down at the wrong place. I need a rubber duck

@nandorojo
Contributor

nandorojo commented Aug 7, 2020

I'm still getting this issue. When I have bad service, my assets don't download at all. Could we make the timeout an option in the plugin's config? This makes it impossible to build in dev mode.

@axe312ger
Collaborator

I am very open to a PR, but it should contain:

  • a config option to set the download timeout
  • retry logic for when a download fails

@axe312ger
Collaborator

I think most issues in here happen because we swallow errors. Retry logic was introduced with Gatsby v3 (also v2.32).

We should soon merge and release #24288 which should solve a lot of your issues when downloading to local.

Also:

Please do not use downloadLocal if all you do is resize images. Contentful's Image API can do that perfectly, and you will save so much bandwidth, time and disk space!
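
As a rough sketch of that approach: the Images API takes resize parameters such as w, h and fm as query string parameters on the asset URL. The helper name and the asset URL below are placeholders, and the helper assumes an absolute URL.

```javascript
// Sketch: build a resized-image URL for Contentful's Images API
// instead of downloading the original file.
// Assumes `assetUrl` is absolute (starts with https://).
function resizedImageUrl(assetUrl, { width, height, format } = {}) {
  const url = new URL(assetUrl);
  if (width) url.searchParams.set("w", String(width));
  if (height) url.searchParams.set("h", String(height));
  if (format) url.searchParams.set("fm", format);
  return url.toString();
}

// Placeholder asset URL; a real one comes from the asset's file.url field.
console.log(
  resizedImageUrl("https://images.ctfassets.net/xxx.jpeg", { width: 800, format: "webp" })
);
```

The resizing then happens on Contentful's side at request time, so nothing has to be downloaded during the build.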

@axe312ger
Collaborator

axe312ger commented Apr 13, 2021

This should be fixed with improved network error handling in the latest release v5.3

A backport to v4 (gatsby v2) should happen soon. (https://github.com/gatsbyjs/gatsby/projects/25)

@axe312ger
Collaborator

The backport was released to v2 now as well. yarn upgrade-interactive should help :)

@axe312ger
Collaborator

axe312ger commented Apr 27, 2021

I'll let Gatsby Bot close this issue automatically as last user feedback was August 2020.

If you still run into this issue and run latest Gatsby v2 or Gatsby v3, let us know here and we will investigate further.

@tnordberg

Hello,

This thread is dormant, but it is the closest explanation I've found. Please redirect me if necessary.

Was the 30s timeout issue ever made configurable?

On Gatsby Cloud, my team is seeing large video files (>50 MB) coming from Contentful fail to download, yet the site finishes building successfully. This is with downloadLocal set to true. All other (smaller) media assets download and are available as expected. We do not see this issue on our local dev machines, possibly due to faster download speeds.

Is there a way to configure this variable to allow for longer-than-30s downloads?

We are on:

"gatsby": "^4.6.0",
"gatsby-source-contentful": "7.3.2",
"gatsby-source-filesystem": "^4.2.0",

Thanks in advance for any help or insights you can offer.
