Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large memory allocations and very slow execution when type errors occur in complex inputs #1753

Closed
SoyYoRafa opened this issue Feb 22, 2019 · 7 comments · Fixed by #1771
Closed
Labels

Comments

@SoyYoRafa
Copy link

SoyYoRafa commented Feb 22, 2019

Hi,

We have a rather large complex object as an input. A single request with hundreds of type errors in the input took 9 seconds in graphql.execute. The majority of the time spent (>95%) was spent in the inspect() call, specifically its recursive object string construction. There were huge amount of memory allocations. With 20 concurrent requests, we can consistently cause node to run out of memory. Changing inspect() to return an empty string lowered the total execution time of our process to low hundreds of milliseconds and the process was stable. I don't know the exact latency of graphql.execute with inspect effectively disabled. Given this finding, I am proposing three changes.

  1. inspect() should never be in the execution path as currently written. At the very least, it should not be recursive and/or have a maximum string length. In our case, each inspect grew to be over 100KB.
  2. There should be a maximum amount of errors in getVariableValues(). If the number of errors reaches the maximum, no additional varDefNodes are processed. This grew to be ~1500. Cap at 5?
  3. If inspect() must be kept, there should also be a maximum total amount of error lengths between error messages in getVariableValues(). If the limit is reached, no more varDefNodes should be processed. This would prevent large single errors from multiplying by the cap.

Given that we had ~1500 errors, each error was ~100KB and there were 20 concurrent requests, that is 3GB of memory. That is more than the default max-old-space-size.

Item 2 is an easy fix but 1 and 3 will make the execution path much more robust. We are in an air-gapped environment so I cannot easily provide a reproducible case.

        const properties = Object.keys(value)
          .map(k => `${k}: ${inspect(value[k])}`)
          .join(', ');

graphql 14.1.1
node 8.11.3

@daniele-orlando
Copy link

daniele-orlando commented Mar 1, 2019

Same here.

graphql/utilities/coerceValue:coerceValue() is exhausting the entire server memory on large inputs, leading to a Node server crash.

This is really embarrassing, there should be a reasonable limit to the errors that are collected by this recursive function. Otherwise on a production server it would be trivial for an attacker to crash a GraphQL server with any large input.

Please try to address this major flaw as soon as possible because there is a huge hole here.

@SoyYoRafa
Copy link
Author

I am strive to be very sympathetic and patient with open source projects and maintainers. But, I have to agree with @daniele-orlando characterization of the severity. I am highly surprised no emergency fix has been released in 8 days. There must be a lot of vulnerable servers and the code as currently written is simply unacceptable in any serious production environment - how is Facebook running this? I am happy to send a pull request to disable inspect from being recursive.

@IvanGoncharov
Copy link
Member

how is Facebook running this?

FB has its own GraphQL implementation in HACK (a dialect of PHP) and graphql-js used only for Relay and other client-side tools.
Disclaimer: I'm not FB employee, so I may be missing something.

This is really embarrassing,

I am strive to be very sympathetic and patient with open source projects and maintainers.

I'm helping to maintain this project for the last few years mostly in my own free time.
Matt from FB helps a lot, but he is also doing it in his own free time.

Shaming maintainers for not dedicating their own free time to your issue is not very productive.
As a contractor, I have ongoing projects and my responsibilities to customers so I can't immediately stop whatever I'm doing and switch to fixing this issue.


I implement partial fix in #1771 and I will look into getVariableValues in the next few days.

@daniele-orlando
Copy link

daniele-orlando commented Mar 5, 2019

@IvanGoncharov Thanks for your efforts and for your time, and of all maintainers of the graphql-js project.

@mjmahone
Copy link
Contributor

For the JS server implementation, we deeply rely on the support of the community to identify and fix bugs like this.

While we use graphql-js internally at Facebook, we typically do not use it during production requests or in the browser, because as @IvanGoncharov mentioned, we have our own Hack server implementation, and our clients don't generally need graphql-js to handle requests or responses at runtime. We use it extensively at build time (when validating query text and such), but certain classes of errors we will never hit, even if they're really important to solve for the community.

If you run into an important-to-solve issue, you probably have as much ability to create test cases and put up at least an RFC fix as I do. The current core maintainers would greatly appreciate any help on this front!

@yaacovCR
Copy link
Contributor

@IvanGoncharov i guess this can be closed?

@yaacovCR
Copy link
Contributor

yaacovCR commented Oct 7, 2024

Closing this, feel free to reopen as necessary.

@yaacovCR yaacovCR closed this as completed Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants