Excess contention in ExecutorService #2118
Comments
@ejona86 suggested using ForkJoinPool. While this is not possible in general, since we are limited to Java 6 APIs, running a local server/client with this does in fact reduce the contention.
And it's unknown how much better ForkJoinPool does when receiving Runnables from threads outside of the pool, but it seems worth a check.
A spot check shows the contention gone, but QPS plummets to half (86kqps -> 42kqps). Run with:
Hmmm, running on a 32-thread machine it does speed up a lot. Maybe there is a threshold.
On how many cores and for how long did this client run? Also, I believe those numbers might be cumulative over all threads. So if, say, 32 threads are trying to add to the queue concurrently and one gets the lock for 100 micros, then that means 3.1 millis of contention.
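The cumulative accounting described above can be made concrete. The assumed model (which is what the comment implies, not something stated by the profiler's docs) is that every waiting thread is charged the full hold time:

```java
// Back-of-the-envelope for cumulative lock contention: one thread holds the
// lock for the full duration, and each of the remaining threads is charged
// roughly that entire hold time while it waits.
public class ContentionMath {
  public static void main(String[] args) {
    int threads = 32;
    double holdMicros = 100;
    // 31 waiters, each blocked for ~100 micros.
    double cumulativeMicros = (threads - 1) * holdMicros;
    System.out.println(cumulativeMicros / 1000 + " ms"); // prints "3.1 ms"
  }
}
```

This is why per-thread contention that looks tiny in isolation can sum to minutes of reported wait time under high thread counts.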
@buchgr The numbers are cumulative. It is a 32-core client talking to a 32-core server. I can't recall if I looked at the client or the server, but since they both use the executor in the same way it doesn't matter which. The contention profiler records how long a thread waits for a lock to become available before acquiring it (so the thread that already holds the lock and releases it will not be recorded). Running last night with FJP showed a 3x perf jump (~460kqps), so this contention matters a lot in high-QPS cases.
It might be worth mentioning that Netty backported the FJP so that it can be used with Java 1.6: https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/chmv8/ForkJoinPool.java We could check if Netty's FJP is on the classpath and, if so, use it?
@buchgr FJP depends heavily on the number of cores actually available for use. For example, running on a 32-core machine that is under 50% load from other processes, FJP does worse at parallelism level 32 than at 16. Picking the number too high or too low causes painful performance swings, so it would be hard to set as a default. Also, blocking calls are going to make it act poorly. Only Future/async clients (and servers) really benefit from it. It's a good optimization, but only after recognizing it as applicable to the use case.
When profiling a client with 200K active RPCs, there is a point of contention on the Executor. Each RPC gets its own SerializingExecutor, which executes work on an underlying executor. Currently, that executor is a ThreadPoolExecutor in almost all cases, which itself has a BlockingQueue. That queue is heavily contended, showing up as minutes of wasted time:
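For context, the serializing-wrapper pattern described above can be sketched roughly as follows. This is not grpc-java's actual SerializingExecutor, just a minimal illustration of the shape: every RPC's wrapper funnels its drain passes into the one shared delegate, which is why a single contended BlockingQueue sits behind 200K RPCs.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a per-RPC serializing wrapper: tasks run one at a time, in
// submission order, on a shared delegate executor.
final class SerializingExecutorSketch implements Executor, Runnable {
  private final Executor delegate;
  private final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean running = new AtomicBoolean();

  SerializingExecutorSketch(Executor delegate) {
    this.delegate = delegate;
  }

  @Override public void execute(Runnable task) {
    queue.add(task);
    // Schedule a drain pass on the delegate unless one is already in flight.
    // This delegate.execute() call is where every RPC's wrapper hits the
    // shared executor's queue.
    if (running.compareAndSet(false, true)) {
      delegate.execute(this);
    }
  }

  @Override public void run() {
    try {
      Runnable task;
      while ((task = queue.poll()) != null) {
        task.run();
      }
    } finally {
      running.set(false);
      // Re-check: a task may have been enqueued after poll() returned null
      // but before the running flag was cleared.
      if (!queue.isEmpty() && running.compareAndSet(false, true)) {
        delegate.execute(this);
      }
    }
  }
}
```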
An idea to fix this is to use some sort of striping executor to prevent this contention from happening.
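One possible shape for that striping idea (names and stripe-selection policy are hypothetical, not a proposed API): partition work across N independently queued executors so that threads contend only on their stripe's queue instead of one shared BlockingQueue.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical striped executor: each key (e.g. an RPC) hashes to one of
// N single-threaded stripes. Tasks for the same key stay serialized on the
// same stripe; contention is spread across N queues instead of one.
final class StripedExecutorSketch {
  private final ExecutorService[] stripes;

  StripedExecutorSketch(int nStripes) {
    stripes = new ExecutorService[nStripes];
    for (int i = 0; i < nStripes; i++) {
      stripes[i] = Executors.newSingleThreadExecutor();
    }
  }

  // Stable key -> stripe mapping; floorMod handles negative hash codes.
  Executor stripeFor(Object key) {
    return stripes[Math.floorMod(key.hashCode(), stripes.length)];
  }

  void shutdown() {
    for (ExecutorService s : stripes) {
      s.shutdown();
    }
  }
}
```

The trade-off is that a hot key can saturate its single stripe while others sit idle, so stripe count and key distribution would both matter in practice.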