Catch exceptions thrown by Executor.execute #636
The most basic handling should be catching the exception and using it to cancel the call. However, this will appear to the application like such calls are "leaked," because the cancellation wouldn't likely be delivered to the application (and even the But we need to support things like On the server, if the But for the other cases, we may want a second, "fallback" executor. It could be a single thread, or it could be the shared executor we typically use (probably in concert with Unfortunately, if it is the We need to also make sure that we close any message
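The description above mentions a second, "fallback" executor. As a rough, hypothetical sketch of that idea (not gRPC's actual implementation; class and field names are made up), a wrapper could catch the rejection and hand the task to an executor that is known to accept it:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

/**
 * Hypothetical sketch: run tasks on the application-provided executor, but fall
 * back to a second executor (a single thread or a shared pool) if the primary
 * throws RejectedExecutionException, so cancellation callbacks still get delivered.
 */
final class FallbackExecutor implements Executor {
  private final Executor primary;
  private final Executor fallback;

  FallbackExecutor(Executor primary, Executor fallback) {
    this.primary = primary;
    this.fallback = fallback;
  }

  @Override
  public void execute(Runnable command) {
    try {
      primary.execute(command);
    } catch (RejectedExecutionException e) {
      // The fallback must itself never reject, e.g. a pool with an unbounded queue.
      fallback.execute(command);
    }
  }
}
```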
Included another suggestion for this here: #7106 (comment)
(Responding here to keep design ideas in one place) @njhill wrote:
I don't feel comfortable with a polling approach. At the very least that would increase load when already overloaded. I guess we could schedule a single Runnable in the executor that could call all the callbacks (so there'd be one 2-second timer, independent of the number of streams), which would reduce the cost if rejection is still occurring. But I don't feel like this would notify the application quickly enough, since it is likely overloaded. Maybe if there is a rejection we could try to run that one notify-of-cancellation Runnable every time we try to schedule onto the executor, until it succeeds. That would work better. There'd be some synchronization required. I'm still sort of bothered by the delay of the cancellation notification, as I can think of some easy pathological cases, but it's probably "good enough" as a default. (Assuming someone asks for it,) I guess we could use a fallback executor just if one is provided, and otherwise use this scheme.
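The "run the one notify-of-cancellation Runnable every time we try to schedule onto the executor" idea could be sketched roughly as below. This is a speculative illustration with made-up names, not gRPC code; the real design would need to hook into the transport's cancellation path.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: queue cancellation notifications and retry one shared drain task. */
final class RejectionNotifyingExecutor implements Executor {
  private final Executor delegate;
  private final Queue<Runnable> pendingCancellations = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean drainScheduled = new AtomicBoolean();
  private final Runnable drainTask = () -> {
    drainScheduled.set(false);
    Runnable r;
    while ((r = pendingCancellations.poll()) != null) {
      r.run(); // notify the application that its call was cancelled
    }
  };

  RejectionNotifyingExecutor(Executor delegate) {
    this.delegate = delegate;
  }

  /** Called (hypothetically) when a rejected call is cancelled by the library. */
  void enqueueCancellationNotice(Runnable notifyCancelled) {
    pendingCancellations.add(notifyCancelled);
    tryScheduleDrain();
  }

  @Override
  public void execute(Runnable task) {
    // Piggy-back: every scheduling attempt also retries the pending drain task.
    tryScheduleDrain();
    delegate.execute(task); // may still throw RejectedExecutionException to the caller
  }

  private void tryScheduleDrain() {
    if (!pendingCancellations.isEmpty() && drainScheduled.compareAndSet(false, true)) {
      try {
        delegate.execute(drainTask);
      } catch (RejectedExecutionException e) {
        drainScheduled.set(false); // still overloaded; retry on the next attempt
      }
    }
  }
}
```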
Hey, thanks for logging/tracking this issue! We discovered this is happening for our own service today and it's causing us to leak some resources in streaming calls, which causes our servers to run themselves over in a sort of feedback loop (we are under load => we get rejected close calls => we fail to clean up resources that keep competing for the thread pool). We're wondering if it's workable to use ThreadPoolExecutor#CallerRunsPolicy to ensure that we're propagating onClose()/onComplete() calls to our observer so we can clean up resources. It's not perfect because we're using the same executor for our coroutine calls too when they're done executing, but we think it's best that if this error starts happening, we're at least able to see that it's happening and try cleaning up from there. Would that be an okay workaround?
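For reference, the workaround being asked about would look roughly like the snippet below (port, pool sizes, and the service are placeholders). One caveat worth noting: with CallerRunsPolicy the task runs on whatever thread called execute(), which in gRPC can be a transport/network thread, so long-running application code could stall I/O.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import io.grpc.Server;
import io.grpc.ServerBuilder;

public final class CallerRunsServer {
  public static void main(String[] args) throws Exception {
    // Bounded pool and queue; on saturation the submitting thread runs the task
    // itself instead of throwing RejectedExecutionException.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        8, 8, 60, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(1000),
        new ThreadPoolExecutor.CallerRunsPolicy());

    Server server = ServerBuilder.forPort(50051)
        .executor(executor)                 // gRPC schedules listener callbacks here
        // .addService(new MyServiceImpl()) // hypothetical service
        .build()
        .start();
    server.awaitTermination();
  }
}
```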
@ejona86 Any update on the prioritization of this issue? It was opened more than 7 years ago and it is categorized as a bug. Because of this issue, executions rejected by the executor with a RejectedExecutionException can't be detected through the current gRPC event handler methods, so we can't perform the needed cleanup and metrics recording. Basically, in that case we cannot do a thing. The CALLER_RUNS_POLICY workaround does not work for us since (in this case) it's a futile attempt to pass the responsibility to another layer, where we don't have the information needed to accomplish the tasks we have to complete. I think that leaving this issue unsolved creates a resilience problem in gRPC Java around properly handling tasks we are not able to run when executor resources are exhausted.

@garrison-stauffer Using ThreadPoolExecutor#CallerRunsPolicy only transfers your problem to another layer (the caller). If in that layer you are able to do your required actions (cleanup, observability measures, etc.), I think it's the only workaround you have as of today.
When I created the issue I thought this would have been a bigger deal. But I think it turned out not to be for a couple reasons.
Instead of rejecting Runnables, I think the normal solution is to use an interceptor that fails RPCs. (E.g., it can count the number of concurrent RPCs to a service, and when it exceeds a limit it fails the RPC and can even report info to the client on when to retry.) Some later comments sketch solutions for point 3. But points 1 and 2 still come into play, and you can tell that very few people have complained about this. (I've discussed it in person with a reasonable number of people, so they certainly notice and are disappointed, but point 3 morphs the conversation.) Point 2 was eroded by #8266. And I have some hopes to erode it further by letting you configure Executors on a per-service basis. But that's also because I want to get to a point where it is more useful to fix this.
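A minimal sketch of that interceptor approach, assuming a fixed limit and RESOURCE_EXHAUSTED as the failure status (class name and message are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

import io.grpc.ForwardingServerCallListener.SimpleForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;

/** Counts in-flight RPCs and fails new ones once a limit is exceeded. */
public final class ConcurrencyLimitInterceptor implements ServerInterceptor {
  private final AtomicInteger inFlight = new AtomicInteger();
  private final int limit;

  public ConcurrencyLimitInterceptor(int limit) {
    this.limit = limit;
  }

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers, ServerCallHandler<ReqT, RespT> next) {
    if (inFlight.incrementAndGet() > limit) {
      inFlight.decrementAndGet();
      call.close(Status.RESOURCE_EXHAUSTED.withDescription("too many in-flight RPCs"),
          new Metadata());
      return new ServerCall.Listener<ReqT>() {}; // no-op listener for the failed call
    }
    ServerCall.Listener<ReqT> delegate = next.startCall(call, headers);
    return new SimpleForwardingServerCallListener<ReqT>(delegate) {
      @Override
      public void onComplete() {
        inFlight.decrementAndGet();
        super.onComplete();
      }

      @Override
      public void onCancel() {
        inFlight.decrementAndGet();
        super.onCancel();
      }
    };
  }
}
```

Note that the decrement still depends on onComplete()/onCancel() actually being delivered, which is exactly what a rejected Runnable can prevent; that gap is the concern raised in the next comment.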
Hello @ejona86, thanks for your answer. Some clarifications about your points (and about our need):

On point 1: an executor can reject, and that's normal behavior. If grpc-java is using an executor, it should support the normal behavior of an executor, and rejecting is part of that normal behavior (normal == the API defines it). So handling that situation is needed, and the lack of such handling should be considered a flaw IMHO. This comment applies both to the direct executor and to any other configured executor.

We do use an interceptor that fails RPCs. But for task-limiting purposes we need a counter of in-flight tasks: we increase it when a task starts and decrease it when the task ends. The only case where the counter never gets decreased is when there is a rejection in the executor, because we cannot find a place in the grpc-java interfaces to put our decrement code. In such cases the counter remains un-decremented, which makes it useless: it keeps a wrong count for the rest of the JVM's life once a single rejection occurs. For us it is crucial to have a point in grpc-java where we can place our decrement logic in every case, including rejections from executors. Not having it invalidates the whole feature (the counter and therefore the task-limiting feature), since, as we all agree, CALLER_RUNS_POLICY is not a solution here.

You could tell me: always enable your task limiter and you will never have a rejection in the executor. It's a good point, but our task limiter can start disabled and then be enabled at runtime, when you see some anomalous metric; maybe rejections because your sizing was low for the peak load you are receiving. If we had this feature, our counter would count the in-flight tasks correctly and the rate limiter would be able to drop RPCs and then recover. But without it, the counter will already have missed many decrement operations, which is unrecoverable: its value stays wrongly high due to the missed decrements.

I hope I've explained our need and why we feel that a situation not covered by any existing hook leaves loose ends, and those loose ends in our case invalidate a whole feature.
There's literally no good way to handle it in the general case. From what I've seen, most Java code will leak like a sieve if it gets a rejection. With async code, there simply isn't a way to handle it safely. For example, consider ListenableFuture, which just logs, or ForkJoinPool, which doesn't let you specify a limit. (ForkJoinPool will throw RejectedExecutionException, but only if it is shut down, hits an OutOfMemoryError, or in some concurrency corner cases that prove the point but are too complex to discuss here.) The only way to handle this is variations on the idea that somewhere there's an executor that will actually accept the Runnable. We might allow one executor to reject, but have a "backup" executor that won't reject.
Exactly. You are telling me that some executor must accept the Runnable. The Runnable must be run. Period. But that's the opposite of rejection, and it demonstrates why rejection, while a nice idea, is pretty hard to make use of in practice in async environments. Rejection makes sense for simpler scenarios where unit-of-work == Runnable; in that case, rejecting the Runnable rejects the entire unit of work. Think servlets, where requests are tied to a single thread and you use blocking I/O. But for async it falls apart. A hard-line "the API says" won't get you anywhere. I am sympathetic to your cause, but that approach is a dead end. I can simply say "rejected exceptions are unsupported" or that gRPC "handles" the rejection via crashing and leaking memory. Certainly, there could be no guarantee that your callback is executed. This is more of a give-and-take and not really a place for an API lawyer, as there is no constraint that says combining various APIs is guaranteed to produce a useful result. API lawyering can just help you predict the result. (FWIW, RejectedExecutionException is a RuntimeException, which many people consider "a programmer error", a.k.a. a bug, when thrown. This is a correct interpretation if you get a rejection because the executor was shut down, for example. I do understand things aren't that black-and-white, but it is at least interesting that it isn't a checked exception.)
"Do whatever you do for the Executor that is rejecting." If you start disabled, that's fine. Don't have the executor throw then. And then if the limit is changed you start failing new RPCs until enough old RPCs complete that you can accept new ones. |
Thanks for your answer.
Totally agree on this.
Yes, I think the same. There is no way to make sure that the "closing" tasks are executed independently of the "user" tasks, right? (In the current state of the code base.) What we would need is for the closing tasks to be guaranteed to execute. We saw a "cancellationExecutor" in the "ServerImpl" code but, aside from the fact that it is the same as the "non-cancellation" executor (just not wrapped), I think it does not run all the "closing" tasks, only certain types of them. This is only to confirm with you, because you know the code deeply (and I don't 😢). Once this is confirmed, I think we'll adopt the approach you advised: "never reject" a task that is sent to the gRPC executor, to ensure all the tasks ("closing" and "non-closing" related) are run. Thanks in advance for this final clarification.
Correct. Because all the tasks that matter here are user tasks, which call the application-provided listener.
Correct. And that's really the core of this issue. To be honest, one of the things holding this up is whatever solution we decide here, we're probably stuck with it forever. Adding an executor tends to be disruptive.
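As a small sketch, the "never reject" approach the thread converges on can be as simple as handing the server a fixed-size pool with an unbounded queue, which only throws RejectedExecutionException after shutdown (port and pool size are placeholders):

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.grpc.Server;
import io.grpc.ServerBuilder;

final class NeverRejectingServer {
  // A fixed-size pool with an unbounded work queue accepts every task until
  // shutdown, so "closing" and "non-closing" tasks alike always run
  // (at the cost of unbounded queue growth when overloaded).
  static Server start(int port) throws IOException {
    ExecutorService pool = Executors.newFixedThreadPool(16);
    return ServerBuilder.forPort(port)
        .executor(pool)
        // .addService(...) // application services omitted
        .build()
        .start();
  }
}
```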
This issue seems to have been open for 9 years now. Are there any new developments or plans?
The direct executor lets RuntimeExceptions pass up the call stack, so we need to defend against that anywhere we permit a direct executor. Even without a direct executor, execute() can throw RejectedExecutionException, so this is really a case we should handle.
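A small standalone illustration of both failure modes mentioned above (the sleep, pool sizes, and messages are arbitrary):

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class ExecuteCanThrow {
  public static void main(String[] args) {
    // A direct executor runs the task on the calling thread, so a RuntimeException
    // thrown by the task propagates out of execute() into the caller's stack.
    Executor direct = Runnable::run;
    try {
      direct.execute(() -> { throw new IllegalStateException("listener blew up"); });
    } catch (RuntimeException e) {
      System.out.println("direct executor: " + e);
    }

    // Even with a real pool, execute() itself can throw RejectedExecutionException
    // when the pool is saturated (here: one worker thread, no queue slack).
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0, TimeUnit.SECONDS, new SynchronousQueue<>());
    pool.execute(() -> sleep(1000)); // occupies the only worker thread
    try {
      pool.execute(() -> {});
    } catch (RejectedExecutionException e) {
      System.out.println("pool: " + e);
    } finally {
      pool.shutdownNow();
    }
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```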