Gatsby develop and build hang on source and transform nodes stage on large CSV file with extremely high ram usage #33868

Closed
2 tasks done
DanielPBliss opened this issue Nov 5, 2021 · 17 comments · Fixed by #36610
Labels
help wanted (Issue with a clear description that the community can help with.)
topic: plugins (Related to plugin system, themes & catch-all for plugins that don't have a label)
type: bug (An issue or pull request relating to a bug in Gatsby)

Comments

@DanielPBliss

DanielPBliss commented Nov 5, 2021

Preliminary Checks

Description

When running gatsby build or gatsby develop with the gatsby-transformer-csv plugin and a largish CSV file (~16,000 rows), the build hangs on the 'source and transform nodes' stage, seemingly indefinitely. The Node.js process also uses approximately 50 GB of my 64 GB of memory.
I've tried this with a new Gatsby 4 starter and created a new CSV file of the same length, to make sure it wasn't an issue with the one project or a file that was somehow corrupted/invalid.
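
For readers who want a file of comparable size without cloning the repo, a synthetic CSV can be generated with a few lines of Node.js. This is only a sketch: the function name, column names, and values are made up for illustration, and whether such a file triggers the same behavior has not been verified here.

```javascript
// Sketch: write a synthetic CSV with a given number of data rows.
// Column names and values are hypothetical, not taken from the linked repo.
const fs = require("fs");

function writeSyntheticCsv(filePath, rowCount) {
  const lines = ["id,name,value"]; // header row
  for (let i = 1; i <= rowCount; i += 1) {
    lines.push(`${i},item-${i},${(i * 7) % 100}`);
  }
  fs.writeFileSync(filePath, lines.join("\n") + "\n");
  return lines.length - 1; // number of data rows written
}
```

Calling `writeSyntheticCsv("large.csv", 16000)` would produce a file with roughly the row count mentioned above; the file name is a placeholder.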

I know a similar issue was mentioned here: #11839 (comment), but that one doesn't mention high memory usage, so it may have a different cause.

Reproduction Link

https://github.com/DanielPBliss/GatsbySourceAndTransformIssue

Steps to Reproduce

  1. Clone linked repo
  2. Run 'gatsby build' or 'gatsby develop'

...

Expected Result

The build completes and RAM usage stays within a normal range.

Actual Result

The build hangs on the 'source and transform nodes' stage, seemingly indefinitely, and RAM usage quickly climbs to about 50 GB of my 64 GB.

Environment

System:
    OS: Windows 10 10.0.19042
    CPU: (8) x64 Intel(R) Core(TM) i7-7820HK CPU @ 2.90GHz 
  Binaries:
    Node: 14.15.5 - C:\Program Files\nodejs\node.EXE       
    Yarn: 1.22.5 - C:\Program Files (x86)\Yarn\bin\yarn.CMD
    npm: 6.14.11 - C:\Program Files\nodejs\npm.CMD
  Languages:
    Python: 3.8.3 - C:\Python38\python.EXE
  Browsers:
    Edge: Spartan (44.19041.1266.0), Chromium (95.0.1020.40)
  npmPackages:
    gatsby: ^4.1.0 => 4.1.0
    gatsby-plugin-gatsby-cloud: ^4.1.0 => 4.1.0
    gatsby-plugin-image: ^2.1.0 => 2.1.0
    gatsby-plugin-manifest: ^4.1.0 => 4.1.0
    gatsby-plugin-offline: ^5.1.0 => 5.1.0
    gatsby-plugin-react-helmet: ^5.1.0 => 5.1.0
    gatsby-plugin-sharp: ^4.1.0 => 4.1.0
    gatsby-source-filesystem: ^4.1.0 => 4.1.0
    gatsby-transformer-csv: ^4.1.0 => 4.1.0
    gatsby-transformer-sharp: ^4.1.0 => 4.1.0
  npmGlobalPackages:
    gatsby-cli: 3.14.2

Config Flags

No response

DanielPBliss added the "type: bug" label on Nov 5, 2021
gatsbot (bot) added the "status: triage needed" label on Nov 5, 2021
@nickamckenna

"gatsby build" uses 105 GB in about 30 seconds on my computer! Crazy!

graysonhicks added the "topic: performance" and "topic: plugins" labels and removed the "status: triage needed" label on Nov 7, 2021
@wardpeet
Contributor

wardpeet commented Nov 8, 2021

My guess is that we should stream the CSV instead of loading the whole file. It's not something we will prioritize anytime soon, but feel free to open a PR.
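
To illustrate the streaming idea in general terms, here is a minimal sketch that reads a CSV line by line with Node's built-in `fs` and `readline` modules, so only one row is held in memory at a time. It is not the code of gatsby-transformer-csv or of the eventual fix; the function names are hypothetical, and the naive split assumes a header row with no quoted fields or embedded newlines.

```javascript
// Sketch only: stream a CSV row by row instead of reading the whole file.
// Assumes a header row and no quoted fields or embedded newlines.
const fs = require("fs");
const readline = require("readline");

// Naive split; a real parser must handle quotes and escaped delimiters.
function parseCsvLine(line, delimiter = ",") {
  return line.split(delimiter).map((cell) => cell.trim());
}

// Yields one object per data row, keyed by the header row.
async function* streamCsvRows(filePath) {
  const lines = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: "utf8" }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let header = null;
  for await (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines
    const cells = parseCsvLine(line);
    if (header === null) {
      header = cells; // first non-blank line is the header
      continue;
    }
    yield Object.fromEntries(header.map((name, i) => [name, cells[i] ?? ""]));
  }
}

// Hypothetical consumer: handles rows one at a time so memory stays bounded.
async function countRows(filePath) {
  let count = 0;
  for await (const _row of streamCsvRows(filePath)) {
    count += 1;
  }
  return count;
}
```

A transformer built this way could create a node for each row as it arrives rather than building one large array first. Whether that alone addresses the memory growth reported here is not established in this thread.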

LekoArts added the "help wanted" label and removed the "topic: performance" label on Nov 8, 2021
@pragmaticpat
Contributor

So @wardpeet, this looks like some inefficiency within gatsby-transformer-csv specifically. Is that right?

@joernroeder
Contributor

I'm also experiencing very high RAM usage on my local machine, on my Gatsby 4 test branch, at the same build step.
Gatsby Cloud also (obviously) fails with: "Your Gatsby build's memory consumption exceeded the limits allowed in your plan. For more details, see https://gatsby.dev/memory."

[Image: Screenshot 2021-11-16 at 21 51 51]

I don't use the CSV plugin, but maybe the following output of yarn list | grep gatsby- can help resolve this.

├─ custom-gatsby-plugin@1.0.0
│  ├─ gatsby-source-apiserver@^2.1.8
│  └─ gatsby-source-filesystem@^4.1.3
├─ custom-gatsby-utils@1.0.0
│  └─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-background-image@^1.3.1
│  ├─ gatsby-plugin-feed@^4.1.1
│  ├─ gatsby-plugin-force-trailing-slashes@^1.0.4
│  ├─ gatsby-plugin-gatsby-cloud@^4.1.3
│  ├─ gatsby-plugin-image@^2.1.3
│  ├─ gatsby-plugin-loadable-components-ssr@^4.1.0
│  ├─ gatsby-plugin-netlify-cms@^6.1.0
│  ├─ gatsby-plugin-react-helmet-async@^1.1.1
│  ├─ gatsby-plugin-react-svg@^3.1.0
│  ├─ gatsby-plugin-robots-txt@^1.6.14
│  ├─ gatsby-plugin-sharp@^4.1.4
│  ├─ gatsby-plugin-sitemap@^5.1.1
│  ├─ gatsby-plugin-theme-ui@^0.12.0
│  ├─ gatsby-plugin-twitter@^4.1.1
│  ├─ gatsby-plugin-webpack-bundle-analyser-v2@^1.1.25
│  ├─ gatsby-remark-autolink-headers@^5.1.1
│  ├─ gatsby-remark-embedder@^5.0.0
│  ├─ gatsby-remark-responsive-iframe@^5.1.0
│  ├─ gatsby-source-datocms@^3.0.10
│  ├─ gatsby-source-filesystem@^4.1.3
│  ├─ gatsby-source-wordpress@^6.1.3
│  ├─ gatsby-transformer-cloudinary@^2.2.3
│  ├─ gatsby-transformer-json@^4.1.0
│  ├─ gatsby-transformer-remark@^5.1.4
│  ├─ gatsby-transformer-sharp@^4.1.0
│  ├─ gatsby-transformer-yaml@^4.1.0
│  └─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-core-utils@^3.1.3
│  └─ gatsby-legacy-polyfills@^2.1.0
├─ gatsby-background-image@1.5.3
├─ gatsby-cli@4.1.4
│  ├─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-recipes@^1.1.3
│  ├─ gatsby-telemetry@^3.1.3
├─ gatsby-core-utils@3.1.3
├─ gatsby-graphiql-explorer@2.1.0
├─ gatsby-legacy-polyfills@2.1.0
├─ gatsby-link@4.1.0
├─ gatsby-page-utils@2.1.3
│  ├─ gatsby-core-utils@^3.1.3
├─ gatsby-plugin-catch-links@4.1.0
├─ gatsby-plugin-feed@4.1.1
│  ├─ gatsby-plugin-utils@^2.1.1
├─ gatsby-plugin-force-trailing-slashes@1.0.5
├─ gatsby-plugin-gatsby-cloud@4.1.3
│  ├─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-telemetry@^3.1.3
├─ gatsby-plugin-image@2.1.3
│  ├─ gatsby-core-utils@^3.1.3
├─ gatsby-plugin-loadable-components-ssr@4.1.0
├─ gatsby-plugin-netlify-cms@6.1.0
├─ gatsby-plugin-page-creator@4.1.4
│  ├─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-page-utils@^2.1.3
│  ├─ gatsby-plugin-utils@^2.1.1
│  ├─ gatsby-telemetry@^3.1.3
├─ gatsby-plugin-react-helmet-async@1.2.0
├─ gatsby-plugin-react-svg@3.1.0
├─ gatsby-plugin-robots-txt@1.6.14
├─ gatsby-plugin-sharp@4.1.4
│  ├─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-plugin-utils@^2.1.1
│  ├─ gatsby-telemetry@^3.1.3
├─ gatsby-plugin-sitemap@5.1.1
├─ gatsby-plugin-theme-ui@0.12.0
├─ gatsby-plugin-twitter@4.1.1
├─ gatsby-plugin-typescript@4.1.3
├─ gatsby-plugin-utils@2.1.1
├─ gatsby-plugin-webpack-bundle-analyser-v2@1.1.25
├─ gatsby-react-router-scroll@5.1.0
├─ gatsby-recipes@1.1.3
│  ├─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-telemetry@^3.1.3
├─ gatsby-remark-autolink-headers@5.1.1
├─ gatsby-remark-embedder@5.0.0
├─ gatsby-remark-responsive-iframe@5.1.0
├─ gatsby-source-apiserver@2.1.8
├─ gatsby-source-datocms@3.0.10
│  ├─ gatsby-core-utils@^3.1.0
│  ├─ gatsby-core-utils@3.1.0
│  ├─ gatsby-plugin-utils@^2.0.0
│  ├─ gatsby-plugin-utils@2.0.0
├─ gatsby-source-filesystem@4.1.3
│  ├─ gatsby-core-utils@^3.1.3
├─ gatsby-source-wordpress@6.1.3
│  ├─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-plugin-catch-links@^4.1.0
│  ├─ gatsby-source-filesystem@^4.1.3
├─ gatsby-telemetry@3.1.3
│  ├─ gatsby-core-utils@^3.1.3
├─ gatsby-transformer-cloudinary@2.2.4
├─ gatsby-transformer-json@4.1.0
├─ gatsby-transformer-remark@5.1.4
│  ├─ gatsby-core-utils@^3.1.3
├─ gatsby-transformer-sharp@4.1.0
├─ gatsby-transformer-yaml@4.1.0
├─ gatsby-worker@1.1.0
│  ├─ gatsby-cli@^4.1.4
│  ├─ gatsby-core-utils@^3.1.3
│  ├─ gatsby-graphiql-explorer@^2.1.0
│  ├─ gatsby-legacy-polyfills@^2.1.0
│  ├─ gatsby-link@^4.1.0
│  ├─ gatsby-plugin-page-creator@^4.1.4
│  ├─ gatsby-plugin-typescript@^4.1.3
│  ├─ gatsby-plugin-utils@^2.1.1
│  ├─ gatsby-react-router-scroll@^5.1.0
│  ├─ gatsby-telemetry@^3.1.3
│  ├─ gatsby-worker@^1.1.0

@joernroeder
Contributor

Shot in the dark, but this could be the same issue as #34081.

@github-actions

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

github-actions (bot) added the "stale?" label on Dec 16, 2021
LekoArts removed the "stale?" label on Jan 3, 2022
@witcradg

witcradg commented Jan 4, 2022

I guess I need to fork the entire gatsby project to create the PR for this. @joernroeder

@witcradg

witcradg commented Jan 4, 2022

I've committed the change to my fork. I'm not sure I'm following the right procedure to hand it off but you can find it here:
https://github.com/witcradg/gatsby
@joernroeder
If I haven't done this as a proper PR, let me know and I'll do it again.

github-actions (bot) added the "stale?" label on Jan 25, 2022
LekoArts removed the "stale?" label on Jan 28, 2022
@LekoArts
Contributor

LekoArts commented Feb 1, 2022

@witcradg We'd be happy to receive this as a PR. Feel free to send it in! :)

@smf706fish

Any updates on this? I'm having a similar problem trying to build a Gatsby website with 10,000+ pages from 50 CSV files (one for each state) plus one big CSV file with all the URL slugs in it. The site builds with about 1/6 of the data; with any more, it gets stuck on "source and transform nodes" and then stops with a "Killed: 9" message. I tried --max-old-space-size=8192 as well as the fix @witcradg made to the CSV transformer, with no luck. I also tried breaking the URL slug CSV into smaller CSVs, again with no luck.
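
When trying `--max-old-space-size`, it can help to confirm that the flag actually reached the Node.js process running the build. The sketch below uses Node's built-in `v8.getHeapStatistics()` and `process.memoryUsage()`; the helper names are hypothetical, and how the flag was passed in the comment above is not stated. Note that a "Killed: 9" message generally means the process received SIGKILL from outside, which a larger V8 heap limit would not prevent; that reading of this particular report is an assumption.

```javascript
// Sketch: report the V8 heap limit and current memory use of this process.
const v8 = require("v8");

// heap_size_limit is reported in bytes; convert to MiB for readability.
function heapLimitMiB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

// rss is total resident memory; heapUsed is the live V8 heap.
function memorySnapshotMiB() {
  const { rss, heapUsed } = process.memoryUsage();
  const toMiB = (bytes) => Math.round(bytes / (1024 * 1024));
  return { rssMiB: toMiB(rss), heapUsedMiB: toMiB(heapUsed) };
}

console.log(`heap limit: ${heapLimitMiB()} MiB`, memorySnapshotMiB());
```

If the flag took effect (for example via the `NODE_OPTIONS` environment variable), the reported heap limit should be roughly the requested size rather than Node's default.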

@joernroeder
Contributor

I might be able to create a pull request for this over the weekend. Just need the time to look into it.

@AshuuDixit

I might be able to create a pull request for this over the weekend. Just need the time to look into it.

github-actions (bot) added the "stale?" label on Mar 4, 2022
@jidicula

jidicula commented Mar 6, 2022

I get a trace from dmesg saying

CODE SIGNING: cs_invalid_page(0x11635c000): p=14585[node] final status 0x23010204, denying page sending SIGKILL

when I run gatsby develop or gatsby build. The killed message shows up while plugins are loading.

❯ npx gatsby build --verbose
verbose set gatsby_log_level: "verbose"
verbose set gatsby_executing_command: "build"
verbose loading local command from:
path/to/project/node_modules/gatsby/dist/commands/build.js
verbose running command: build
success compile gatsby files - 0.192s
success load gatsby config - 0.036s
⠋ load plugins
[1]    15593 killed     npx gatsby build --verbose

github-actions (bot) removed the "stale?" label on Mar 7, 2022
github-actions (bot) added the "stale?" label on Mar 27, 2022
@github-actions

github-actions bot commented May 6, 2022

Hey again!

It’s been 60 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it.
Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to comment on this issue or create a new one if you need anything else.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community! 💪💜

github-actions (bot) closed this as completed on May 6, 2022
LekoArts removed the "stale?" label on Nov 16, 2022
LekoArts reopened this on Nov 16, 2022
LekoArts pushed a commit that referenced this issue Nov 16, 2022
Co-authored-by: ascott <ascott@DESKTOP-39AL99T.localdomain>
Co-authored-by: Lennart <lekoarts@gmail.com>
Fixes #33868