Few random threads reporting high CPU usage #4453

Closed
dhruvil-93 opened this issue Dec 31, 2019 · 8 comments
Labels
Stale (auto-closed stale issues and pull requests)

Comments

@dhruvil-93

Jetty version
9.4.22.v20191022 to 9.4.25.v20191220

Java version
Oracle Java 8u65, Oracle Java 8u231, Amazon Corretto Java 8u232

OS type/version
Host: Amazon Linux 2 (Linux ip-xxx-xxx-xxx-xxx 4.14.154-99.181.amzn1.x86_64 #1 SMP Sat Nov 16 01:38:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux)
Docker: Debian:8.11

Description
I have a web service packaged as a WAR and deployed on the Jetty server.
Recently I migrated from 9.4.13.v20181111 to 9.4.22.v20191022 and observed that CPU usage increases steadily and never drops back down.

Sample top -H output:

top - 08:53:19 up 19:48,  0 users,  load average: 6.56, 6.93, 7.01
Tasks: 6141 total,   7 running, 6024 sleeping,   0 stopped,   0 zombie
Cpu(s): 23.4%us, 10.6%sy,  0.0%ni, 64.3%id,  0.0%wa,  0.0%hi,  1.7%si,  0.0%st
Mem:  31892176k total, 29186560k used,  2705616k free,   291108k buffers
Swap:        0k total,        0k used,        0k free,  1993760k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20860 app        20   0 29.5g  20g  30m R 36.2 68.1 105:51.14 pool-5-thread-3
19176 app        20   0 29.5g  20g  30m S 34.2 68.1 185:50.41 pool-5-thread-2
21735 app        20   0 29.5g  20g  30m S 14.9 68.1   0:52.42 qtp2088051243-7
31489 app        20   0 29.5g  20g  30m R 14.2 68.1   0:46.75 qtp2088051243-8
15891 app        20   0 29.5g  20g  30m S 13.9 68.1   1:00.19 qtp2088051243-7

I use javax.servlet.AsyncContext to process requests asynchronously: each request is submitted as a Runnable to a thread pool.
The work is very simple: the application receives data over HTTP and puts it on an MQ for further processing.
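
For reference, a minimal sketch of that pattern, assuming a hypothetical servlet and message-queue client; the class name, URL, pool size, and enqueue call are illustrative, not the actual application code.

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/ingest", asyncSupported = true)
public class IngestServlet extends HttpServlet {

    // A default ThreadFactory names its threads "pool-N-thread-M",
    // matching the thread names seen in the top output below.
    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response) {
        AsyncContext async = request.startAsync();
        async.setTimeout(10_000); // 10-second timeout, as mentioned later in the thread

        workers.submit(() -> {
            HttpServletResponse res = (HttpServletResponse) async.getResponse();
            try {
                // Hypothetical: read the payload and forward it to the MQ, e.g.
                // mqClient.publish(readBody(async.getRequest()));
                res.setStatus(HttpServletResponse.SC_ACCEPTED);
                res.getWriter().write("queued");
            } catch (IOException e) {
                res.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
            } finally {
                async.complete();
            }
        });
    }
}
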
But over time, a few threads start reporting high CPU usage. Analyzing the application with VisualVM shows that the threads reporting high CPU in top are continuously in the WAITING (parking) state. The thread dumps are pasted below for reference.

pool-5-thread-3
priority:5 - threadId:0x00007f13b0033000 - nativeId:0x517c - nativeId (decimal):20860 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000037ca83da0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

pool-5-thread-2
priority:5 - threadId:0x00007f13a0020800 - nativeId:0x4ae8 - nativeId (decimal):19176 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000037ca83da0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Tracing the system calls with the strace utility shows that these threads are continuously performing the following syscalls:

strace -p 20860 -C -T -e trace=read,write,open,close,fork,connect,accept,stat,ioctl,mmap,brk
Process 20860 attached
write(1784, "\1", 1)                    = 1 <0.001470>
write(1784, "\1", 1)                    = 1 <0.001551>
write(1784, "\1", 1)                    = 1 <0.001331>
write(1784, "\1", 1)                    = 1 <0.001371>
write(1784, "\1", 1)                    = 1 <0.001313>
write(1784, "\1", 1)                    = 1 <0.001269>
write(1784, "\1", 1)                    = 1 <0.001437>
write(1784, "\1", 1)                    = 1 <0.001619>
write(1784, "\1", 1)                    = 1 <0.001433>
write(1784, "\1", 1)                    = 1 <0.001377>
write(1784, "\1", 1)                    = 1 <0.001373>
write(1784, "\1", 1)                    = 1 <0.001335>
write(1784, "\1", 1)                    = 1 <0.001316>
write(1784, "\1", 1)                    = 1 <0.001415>
write(1784, "\1", 1)                    = 1 <0.001371>
write(1784, "\1", 1)                    = 1 <0.001655>
write(1784, "\1", 1)                    = 1 <0.001352>
write(1784, "\1", 1)                    = 1 <0.001230>
write(1784, "\1", 1)                    = 1 <0.001277>
write(1784, "\1", 1)                    = 1 <0.001479>
write(1784, "\1", 1)                    = 1 <0.001993>
write(1784, "\1", 1)                    = 1 <0.001272>
write(1784, "\1", 1)                    = 1 <0.001291>
write(1784, "\1", 1)                    = 1 <0.001189>
write(1784, "\1", 1)                    = 1 <0.001172>
write(1784, "\1", 1)                    = 1 <0.001285>
write(1784, "\1", 1)                    = 1 <0.001296>
write(1784, "\1", 1)                    = 1 <0.001243>
write(1784, "\1", 1)                    = 1 <0.001180>
^CProcess 20860 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.028411         980        29           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.028411                    29           total
strace -p 19176 -C -T -e trace=read,write,open,close,fork,connect,accept,stat,ioctl,mmap,brk
Process 19176 attached
write(1784, "\1", 1)                    = 1 <0.001438>
write(1784, "\1", 1)                    = 1 <0.001421>
write(1784, "\1", 1)                    = 1 <0.001221>
write(1784, "\1", 1)                    = 1 <0.001381>
write(1784, "\1", 1)                    = 1 <0.001225>
write(1784, "\1", 1)                    = 1 <0.001489>
write(1784, "\1", 1)                    = 1 <0.001573>
write(1784, "\1", 1)                    = 1 <0.001427>
write(1784, "\1", 1)                    = 1 <0.001332>
write(1784, "\1", 1)                    = 1 <0.001385>
write(1784, "\1", 1)                    = 1 <0.001441>
write(1784, "\1", 1)                    = 1 <0.001341>
write(1784, "\1", 1)                    = 1 <0.001207>
write(1784, "\1", 1)                    = 1 <0.001243>
write(1784, "\1", 1)                    = 1 <0.001219>
write(1784, "\1", 1)                    = 1 <0.001399>
write(1784, "\1", 1)                    = 1 <0.001313>
^CProcess 19176 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.015642         920        17           write
------ ----------- ----------- --------- --------- ----------------
100.00    0.015642                    17           total

Both these threads are performing write operations on the same file descriptor.

ls -l /proc/20860/fd/1784
l-wx------. 1 app app 64 Dec 31 09:06 /proc/20860/fd/1784 -> pipe:[66942]
ls -l /proc/19176/fd/1784
l-wx------. 1 app app 64 Dec 31 08:25 /proc/19176/fd/1784 -> pipe:[66942]

Examining the pipe with lsof gives the following output:

lsof | grep 66942
java      14880       app  100r     FIFO               0,11       0t0      66942 pipe
java      14880       app 1784w     FIFO               0,11       0t0      66942 pipe

Here 14880 is the PID of the main Java process.

I was previously using 9.4.13.v20181111 without any such issue, and I can reproduce the problem with every release from 9.4.22.v20191022 to 9.4.25.v20191220.

I also tried multiple Java releases and the issue still persists, but it goes away as soon as I switch back to 9.4.13.v20181111.

Please let me know if any additional information is required.

@joakime (Contributor) commented Dec 31, 2019

The threads named pool-#-thread-# are not Jetty threads; we don't name our threads that way.
Find out what library is creating those threads.

However, the ones named qtp########-# are Jetty threads; those are managed by Jetty's Queued Thread Pool.

While it is possible to use standard Java thread executors with various Jetty components, it's not recommended for any system under load.

Also, using Jetty 9.4.13 is not recommended, as it's now subject to multiple overlapping security issues: https://www.eclipse.org/jetty/security-reports.html

You mention pipes; are you perhaps using unixsocket or conscrypt? If so, did you upgrade those components as well when you updated to the newer versions of Jetty?
If you are not using those technologies (unixsocket or conscrypt), then the pipes are further evidence of non-Jetty behavior.

@dhruvil-93 (Author)

While handling requests, we use javax.servlet.AsyncContext to free up the Jetty worker threads, process the request asynchronously, and write the response to javax.servlet.http.HttpServletResponse.

Rest assured, all other components remain unchanged; only the Jetty libraries (client, http, io, server, servlet, util) were upgraded.

I added the details about the threads, the system calls, and the work being done when the issue occurs in case they help.

@sbordet (Contributor) commented Jan 3, 2020

@dhruvil-93 I don't think VisualVM correctly reports which thread is consuming CPU. I recommend using JMC to track CPU usage.

Are you setting a timeout on AsyncContext? If so, is the timeout expiring?
Do you use AsyncListener? If so, do you implement onError()?
Do you use Servlet async I/O?
Is the Jetty connector idle timeout expiring? (See the sketch below for where these two timeouts are configured.)
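
For context on the timeout questions above, the connector idle timeout and the AsyncContext timeout live in different places. A minimal embedded-Jetty sketch, with assumed values that are not the reporter's actual configuration:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class TimeoutsSketch {
    public static void main(String[] args) throws Exception {
        Server server = new Server();

        // Connector idle timeout: how long Jetty keeps an idle connection open before closing it.
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        connector.setIdleTimeout(30_000); // illustrative value
        server.addConnector(connector);

        // The AsyncContext timeout is a separate, per-request setting made in the servlet:
        //   AsyncContext async = request.startAsync();
        //   async.setTimeout(10_000); // AsyncListener.onTimeout() fires when it expires
        server.start();
        server.join();
    }
}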

We need DEBUG logs to understand what's going on, as there is not enough evidence that this is a Jetty issue: the stack traces just show threads sitting idle in an external pool.

@dhruvil-93 (Author) commented Jan 6, 2020

Yes, I've set a 10-second timeout on the AsyncContext and it is working.
Also, I do implement onError() along with onComplete(), onTimeout(), and onStartAsync() (onStartAsync() is an empty method in my case).
I don't use Servlet async I/O; I write to HttpServletResponse.getWriter().
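
For reference, a minimal sketch of the listener wiring being described; the method bodies are hypothetical, since the real implementations are not shown in this thread.

import java.io.IOException;

import javax.servlet.AsyncEvent;
import javax.servlet.AsyncListener;
import javax.servlet.http.HttpServletResponse;

public class IngestAsyncListener implements AsyncListener {

    @Override
    public void onComplete(AsyncEvent event) throws IOException {
        // e.g. release per-request resources, update metrics
    }

    @Override
    public void onTimeout(AsyncEvent event) throws IOException {
        // The 10-second AsyncContext timeout lands here; the async cycle should be
        // finished explicitly, e.g. by sending an error response and completing.
        HttpServletResponse response =
                (HttpServletResponse) event.getAsyncContext().getResponse();
        response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
        event.getAsyncContext().complete();
    }

    @Override
    public void onError(AsyncEvent event) throws IOException {
        // I/O or dispatch failures land here; finish the async cycle as well.
        event.getAsyncContext().complete();
    }

    @Override
    public void onStartAsync(AsyncEvent event) throws IOException {
        // Empty in the case described above.
    }
}

The listener would be registered with asyncContext.addListener(new IngestAsyncListener()) before the work is submitted.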

This is a production system, so DEBUG logs are disabled and I won't be able to share them.

Also, JMC requires an Oracle commercial license for use in a production environment, and that is something I would prefer to avoid.

@sbordet (Contributor) commented Jan 6, 2020

@dhruvil-93 we don't have enough information to help you. It could be that in previous Jetty versions we did not call the AsyncListener methods correctly and now we do, so the behavior changed.
Without further evidence it's difficult to tell.

If you can enable DEBUG logs momentarily (say, for 1 minute), it would be possible to better understand what's going on.

If you implement AsyncListener methods, I recommend that you review their implementations: the Servlet specification requires them to behave in certain ways, and we must follow the specification (between previous and newer Jetty versions we may have fixed bugs in this area to match the specification more closely).

It would be great to know what you do in your AsyncListener methods.

Can you set up a simple, reproducible test project that we can try ourselves?

@lorban (Contributor) commented Jan 13, 2020

@dhruvil-93 just FYI: Azul has backported Flight Recorder to recent builds of the free Zulu JVM: https://www.azul.com/downloads/zulu-community/

With it, you can use Zulu Mission Control (https://www.azul.com/products/zulu-mission-control/) and enjoy JFR without using the Oracle JVM.

stale bot commented Jan 16, 2021

This issue has been automatically marked as stale because it has been a full year without activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the Stale label on Jan 16, 2021

stale bot commented Jun 9, 2021

This issue has been closed due to inactivity.

stale bot closed this as completed on Jun 9, 2021