-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AS4 contains code that makes triggering a crashing bug in graphql@16.6 (fixed in 16.7) more likely #7746
Comments
Hi. You're right that the fix that landed in v3.8.2 doesn't appear to be in AS4. I'm not sure if we intentionally didn't port this to AS4 (which was nearing its first alpha release on a different branch at the time) or if it was an oversight. My memory (and understanding of my comment) is that the choice made by Apollo Server only actually caused a problem due to a bigger bug in graphql-js: graphql/graphql-js#3528 Basically, even though the "couldn't happen" error could in fact happen, the only case where it happens is a case where a thrown error should be swallowed and ignored due to the behavior of Promise.all anyway. However, the graphql-js bug meant that sometimes the error was not swallowed. It looks like the graphql-js bug was fixed in v16.7.0 (https://github.com/graphql/graphql-js/releases/tag/v16.7.0) and AS4 requires graphql@16.6.0, which in fact is what you are using. So you should be able to immediately fix your problem by upgrading graphql-js a tiny bit. I'm not sure what the best approach should be for us here:
|
Hello, Thank you for the prompt reply. We integrated this plugin based on the guidance in the documentation. We followed the recommended implementation outlined on that page: import { ApolloServer } from '@apollo/server';
import { ApolloServerPluginUsageReporting } from '@apollo/server/plugin/usageReporting';
const server = new ApolloServer({
typeDefs,
resolvers,
plugins: [
ApolloServerPluginUsageReporting({
fieldLevelInstrumentation: 0.5,
}),
],
}); If there's a potential for this plugin to throw exceptions that can bring down your server, it would be helpful for such information to be mentioned in the documentation. This would have allowed us to take precautions, such as wrapping the plugin with a try/catch block. Ideally, I think you should add an option that prevents the plugin from throwing exceptions, and I think you should set it to Thank you for considering our feedback |
As I said, I do think the biggest bug here is the graphql-js bug that was fixed in v16.7. My belief is that upgrading your graphql-js to 16.7 will fix your problem immediately. It is true that AS4's dependencies allow you to use it with v16.6 (which is literally the only published version of graphql-js that is supported by AS4 but has this bug in it), and we should probably deal with this fact by reintroducing the AS3 patch — but if you want to fix your problem today, the real problem is that graphql-js's error handling is broken in v16.6. We are throwing exceptions in a place that is supposed to be safe to throw exceptions, except that in precisely one supported version of graphql-js, it accidentally isn't. The reason your server is crashing is that graphql-js is improperly hooking up its Promises, in such a way that Promises that should count as "handled" because they are passed to a Promise.all are not handled. This actually has nothing to do with our plugin at all: it can happen any time that an error is thrown from a "synchronous" resolver when there's also an asynchronous resolver for another field at the same level already running that throws an error. See the test from the PR that fixed the bug: if you run the query The line in our plugin is merely something that makes it more likely that you'll hit this case, because it makes every resolver after a certain point throw. Since the bug is triggered by "at least two fields in parallel throw", it makes it more likely. Again, we probably should apply the AS3 change to AS4. But even once we do that, you should still upgrade your As an analogy: graphql/graphql-js#3528 is a bomb sitting in the middle of your office. The |
…ld` is called "late" (#7747) Porting #6398 from AS3. This fix was unintentionally left out of AS4. Fixes #7746 The minimum version of `graphql` officially supported by Apollo Server 4 as a peer dependency, v16.6.0, contains a [serious bug that can crash your Node server](graphql/graphql-js#3528). This bug is fixed in the immediate next version, `graphql@16.7.0`, and we **strongly encourage you to upgrade your installation of `graphql` to at least v16.7.0** to avoid this bug. (For backwards compatibility reasons, we cannot change Apollo Server 4's minimum peer dependency, but will change it when we release Apollo Server 5.) Apollo Server 4 contained a particular line of code that makes triggering this crashing bug much more likely. This line was already removed in Apollo Server v3.8.2 (see #6398) but the fix was accidentally not included in Apollo Server 4. We are now including this change in Apollo Server 4, which will **reduce** the likelihood of hitting this crashing bug for users of `graphql` v16.6.0. That said, taking this `@apollo/server` upgrade **does not prevent** this bug from being triggered in other ways, and the real fix to this crashing bug is to upgrade `graphql`. --------- Co-authored-by: David Glasser <glasser@apollographql.com>
Fixed via #7747 |
(but just to be explicit (as you will see from the CHANGELOG in v4.9.4), the actual fix is still to upgrade |
I appreciate the thorough explanation. As I mentioned earlier, I understand the underlying issue and how to fix it. I was merely offering my perspective in response to your, perhaps rhetorical, question: "I'm not sure what the best approach should be for us here". While the occurrence of this "impossible" state might not be attributable to you, I think many of your customers, including us, would prefer the usage reporting to fail silently without bringing down the server. I mean, there is a possibility that the Thank you for considering our feedback |
Appreciated — I'm glad we got this change in, and I'm also happy that we're now strongly recommending graphql v16.7 in our "migrating to AS4" guide and we'll make sure it's required by AS5. Thanks for bringing this up! |
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Issue Description
Hi,
We recently encountered significant production disruptions at my company where containers were crashing. By examination of the logs, we were able to attribute the disruptions to ApolloServerPluginUsageReporting. We swiftly deployed a new build without the plugin, which fixed the issue.
Today, I extracted logs from our ECS cluster to try and find root cause of the disruptions. Below is a snapshot of the acquired logs:
I proceeded by cloning the source code of both
graphql-js
andapollographql
in an effort to understand this issue. It appears to me that the problem arose due to promises resolving subsequent to the invocation of thetreeTraceBuilder
stop function. The graphql-js library does not seem to abort promises that are being executed with Promise.all.While preparing to open an issue in the repository, I stumbled upon this existing issue. It seems that the problem was already identified and a solution was deployed. Nonetheless, upon reviewing the code for treeTraceBuilder in main, I’m unable to locate this fix. Furthermore, git blame implies that there hasn’t been any alteration to this line of code for the past three years.
Have there been alterations in version 4.0 that rendered this fix unnecessary, or is there another element I am overlooking?
Really appreciate any help, thanks!
Link to Reproduction
N/A
Reproduction Steps
N/A
The text was updated successfully, but these errors were encountered: