Slow response times with large documents #723
Similarly, if you are able to generate the full response higher up the graph, it is not currently possible to prevent `resolve` being called on all the child objects. Let's say we have a schema such as this:
And the following query:
Having the ability for a resolver to do a 'full resolution' could be a huge performance gain in situations such as this, since the time to recursively iterate over a large data set can be significant.
Any sense of what part of the course of execution is spending the most time for your use case? For example, are you validating the query first and spending time there? Is it all in query execution? Are there particular fields that take longer than others? It would be interesting to see if you could add timing statements to gain a better understanding of where the slowdown is coming from.
@leebyron Sorry I didn't see this until now. The delay in response time (2–4s, as seen in the 1st GIF) points entirely to the pruning/reformatting of the results to match the requested property structure. When I bypass that part, I can establish a reliable baseline for the query/validation/etc., which is < 100ms. What I was wondering is whether, for large arrays, there is duplicated validation or formatting performed on each item that could be done once. (Say there was a performance opportunity where GraphQL checks each item in the array against the requested property
@leebyron I am experiencing similar issues when I have a list of items with their own resolvers. My resolvers make heavy use of DataLoaders, and I initially thought that something was slow in the load → batch → return-results path inside the DataLoaders, but then I realized that the individual calls to each field's resolver function take too long for some reason. Example:
I have logged the `process.hrtime` results for each resolver call. What I observe is that each resolver call easily takes 1–2 ms, if not more. If I have 15 child nodes, it easily adds up to 70–80 ms of just function calls. When I ran `node --inspect` and profiled, most of the time seems to be spent in the `validate`/`visitUsingRules` functions. Here is a screenshot of the Chrome DevTools profiler: I hope this helps.
As someone experiencing similar issues, I'd like to provide my profiler results. I have a simple query like so:
query ExampleQuery {
Most of the data fetched is within 5–20 results. However, one of the data points fetched is 1,200 results. I ran a profiler, and a significant portion of the time seems to relate to graphql's `execute.js`. I used ab for load testing and saw somewhere around 1.5–2.5 seconds per request, even though the longest individual fetch takes around 500ms. How can I make this more performant?
Statistical profiling result from isolate-0x102004600-v8.log (12704 ticks, 1351 unaccounted, 0 excluded).
I also wanted to find out which part of the GraphQL engine is responsible for this performance penalty. For this, I first let R2D2 have 100000 friends and then used the
All timings are just rough estimates; I ran the tests only for ~2s each and thus did not have many iterations. But I think the takeaway is that there is no single "bad function" we could fix to magically improve performance by a large factor. The test could be repeated with a more complex object and nested loops to more closely reflect real huge queries, but I don't think this would change much.

My suggestion: if you have a huge subgraph in your GraphQL schema, annotate these huge lists of objects in some way (e.g. by attaching a Symbol) and then check for this symbol in `execute.js`:

```js
export const disableGraphQLPostProcessingSymbol = Symbol(
  'disableGraphQLPostProcessingSymbol',
);

// ...
function completeValue(
  // ...
) {
  // ...
  // If result is an Error, throw a located error.
  if (result instanceof Error) {
    throw result;
  }
  if (result[disableGraphQLPostProcessingSymbol]) {
    return result;
  }
  // ...
}
```

This totally cuts off any processing of the huge subgraph and, in my test case, yields sub-millisecond response times.
@terencechow I edited your comment and hid
@Yogu Since #1251 you can add performance benchmarks, so it would be great to have something like your test inside
If you or someone else wants to help with writing such a test, it would help a lot, both in improving performance and in preventing performance regressions in the future.
Yes, but it doesn't mean we could cut down a lot from it by using:
It's been a while since I looked into it (over a year, it seems!), but IIRC each returned property was checked against the requested schema, so, for example, each of 100 items would be checked individually. At the time, my take was that path-whitelisting could only happen once, vs. for every item. Again, over a year has gone by since I experienced this problem, so... 🤷♂️
I think the TL;DR of this issue is that GraphQL has some overhead, that reducing that overhead is non-trivial, and that removing it completely may not be an option. Ultimately GraphQL.js is still responsible for making API-boundary guarantees about the shape and type of the returned data, and by design it does not trust the underlying systems. In other words, GraphQL.js does runtime type checking and sub-selection, and this has some cost.

I think improving the performance of GraphQL.js execution is still very possible, but achieving the same performance as the lower bound of simply passing through data without sub-selection or checking is probably not possible. The changes Ivan referenced in his comment above will help out a bunch. They may collectively reduce a 1000ms query to a 700ms query if the majority of the time is spent in GraphQL overhead rather than waiting on services. There's also probably a lot more room to improve, but that's something I would look to heavy users of GraphQL.js to help analyze and contribute. PRs that speed up execution are always welcome for review.

Also, I think it's always worth considering the tools you're using for the job. If your responses measure in the hundreds of megabytes, then GraphQL may be the wrong tool for that job. Likewise, if you see sub-selection as a pain point costing time rather than a feature allowing flexibility, then again GraphQL is probably the wrong tool for that job. One option I've seen in the past is to create a custom "scalar" type which simply captures a bag of untyped JSON, so that you can use GraphQL for the portions where that is valuable but fall back on plain JSON when that is more useful.
I think this is a fair explanation. Thank you for that.
I've created a simple example that shows performance degradation in GraphQL caused by returning a promise from a resolver. This issue doesn't seem to be related to type checking, but rather to the actual GraphQL implementation. I understand the point about GraphQL not being the right tool for this type of query, but this still seems like an actual issue. Sample results:
Here is the code snippet:
Follow-up: I replaced the native Node Promise with Bluebird, and the performance results are much better.
I also experienced very slow performance with an array of large documents, but using a GraphQL server cache solved the issue entirely (I'm using Apollo Server with their response cache plugin). Before using the plugin it would take 1 second to get the response; now it takes 88 ms. This workaround may be the easiest in terms of implementation, while still retaining all the useful features of GraphQL like type checking and field selection.
I'm experiencing the same problem with
Sorry for commenting on an old thread, but this is one of the places that rank high in Google Search results when researching
We had a dramatic improvement in performance after ensuring that all the synchronous resolvers remain synchronous throughout the codebase. This is especially relevant if you're using schema transforms, resolver composition, or similar utilities, where you can implicitly make all resolvers in your schema
In general, the issue becomes much more taxing once your payload size creates enough promises to fully saturate the event loop; after that, performance degrades badly. To summarize, avoid
Does anyone have any proof that using
If there is a performance difference, I would like to understand the mechanism that causes it.
As I read it, the point is to avoid returning a promise and/or using an async function if you don't have to, as async in whatever form carries a performance hit. Internally, graphql-js avoids wrapping anything synchronous in a promise specifically to avoid that cost, and the recommendation is that resolver code be similarly careful in that regard.
Originally posted here:
I'm trying to resolve some performance issues with large documents, and the problem (AFAICT) is due to the pruning of the document based on requested fields.
Here's how I discovered it:
2s–4s response time.

Even if I use `formatResponse(response) { return [] }` to pretend nothing came back, it's still a problem somewhere before `formatResponse`.

106ms response time.

In the resolver, if I do:

And then use `formatResponse` to do:

I can see the response starting & streaming much faster.
A co-worker tried `master` to see if #710 resolves it, but it does not appear so.

For reference on why we have a document this large: internally, we leverage GraphQL to fetch a full document that we then reduce into a separate document describing the state of key entities for internal tooling.
(For example, "why does this product not appear for users in Texas?")
Because of the complex (programmatic) rules that run against these documents, we're showing internal users the unfiltered document and filtering using the same logic that happens in user-land.
In the short term, it appears our best option is to find a means of returning an unfiltered document (for performance) for internal uses.