router hangs under load #550
Comments
One thing to ask is whether query variables are being inlined.
It's worth noting that this would be problematic if the operations generated by autocannon were truly dynamic and there was a high cardinality of operations being received. If the same set of inlined variables is sent over and over again, I wouldn't expect a performance hit, as the cache would still suffice. (We also have an article about best practices that's somewhat relevant.)
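To illustrate the cardinality point, here is a minimal sketch. It assumes a cache keyed by the raw operation text; `PlanCache`, `inlined_query`, and `count_misses` are hypothetical names for illustration, not the router's actual implementation.

```python
from typing import Dict, Iterable


class PlanCache:
    """Toy stand-in for a cache keyed by the raw operation text (assumption)."""

    def __init__(self) -> None:
        self.plans: Dict[str, str] = {}
        self.misses = 0

    def plan(self, query: str) -> str:
        # A miss means the operation text has not been seen before.
        if query not in self.plans:
            self.misses += 1
            self.plans[query] = f"plan#{len(self.plans)}"
        return self.plans[query]


def inlined_query(user_id: int) -> str:
    """Build an operation with the variable value inlined into the text."""
    return f'{{ user(id: "{user_id}") {{ name }} }}'


# The same operation with the value passed separately as a variable.
PARAMETERIZED = "query($id: ID!) { user(id: $id) { name } }"


def count_misses(queries: Iterable[str]) -> int:
    """Run every query through a fresh cache and report the miss count."""
    cache = PlanCache()
    for q in queries:
        cache.plan(q)
    return cache.misses


if __name__ == "__main__":
    # Distinct inlined values: every request is a new cache key.
    print(count_misses(inlined_query(i) for i in range(1000)))
    # The same inlined value repeated: one key, so the cache suffices.
    print(count_misses(inlined_query(7) for _ in range(1000)))
    # Parameterized operation: one key regardless of the variable values.
    print(count_misses(PARAMETERIZED for _ in range(1000)))
```

With a single static query, as in the benchmark described in this thread, only the second or third case applies, so cache pressure of this kind would not be expected.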
The queries were all static, and there was only one query. This was a really simple microbenchmark. The effect was not so much a performance hit as a total inability to run the benchmark after a while, yet curl on the same query would still work reliably, even curl in a shell loop, without missing a beat. So to go further I need to verify whether it is the pod that is messed up, and whether this scenario occurs with other kinds of benchmarks, such as artillery or some native one.
@tomrj did you look into the pod's behaviour? Could you give me some of the parameters of the bench, like number of concurrent connections, size of query, etc? When you say "total inability to run the benchmark", do you mean that the benchmark never finishes? Or that a benchmark runs entirely but then successive benchmarks fail?
@tomrj I can reproduce the issue. I'll let you know when I have a fix for it.
Here is what I know so far: in a series of subgraph requests happening on a connection, at some point one of them has an issue and no more requests are sent to that subgraph. The router still receives new requests but does not answer any request that is waiting for a response from that subgraph. The timeline, pieced together from logs and Wireshark:
I debugged hyper's execution a bit, and right now I do not believe the problem comes from hyper. It might be in the reqwest library that we use as our HTTP client.
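The symptom described in these comments, a caller waiting indefinitely on a subgraph response that never arrives, can be sketched as follows. This is only an illustration under stated assumptions: `fake_subgraph` and `fetch` are hypothetical names, the subgraph is simulated with a thread and a queue, and the timeout shown is one generic way to bound such a wait, not the fix applied in the router (the thread does not say what that fix is).

```python
import queue
import threading
from typing import Optional


def fake_subgraph(responses: "queue.Queue[str]", stall: bool) -> None:
    """Hypothetical subgraph: answers once unless told to stall."""
    if not stall:
        responses.put("data")
    # When stalling, nothing is ever put on the queue, so a caller
    # without a timeout would block forever.


def fetch(stall: bool, timeout_s: float) -> Optional[str]:
    """Wait for the subgraph response, giving up after timeout_s seconds."""
    responses: "queue.Queue[str]" = queue.Queue()
    worker = threading.Thread(
        target=fake_subgraph, args=(responses, stall), daemon=True
    )
    worker.start()
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        # The wait is bounded: the caller can report an error
        # instead of never answering the client.
        return None


if __name__ == "__main__":
    print(fetch(stall=False, timeout_s=1.0))
    print(fetch(stall=True, timeout_s=0.1))
```

A bounded wait like this would not explain why the subgraph connection stalls in the first place; it only turns a silent hang into an observable failure.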
The run seems unable to connect to the router at all; all requests fail.
Describe the bug
report 1 @tomrj
I did not see this mystery behaviour in previous builds, but when running some load on the router (I'm using a tool called autocannon), it gets stuck after a while and does not recover.
However, when curling the server using the same headers and the same query, it works just fine.
report 2
I can replicate this sort of behavior when I blow out concurrency limits on downstream lambdas and they start throttling. If I recall, the router didn't seem to handle the closure of stuck threads all that gracefully and would eventually top out.
To Reproduce
Need more info from users.
Expected behavior
The router shouldn't hang.