100% CPU usage in Selector using Jetty on Linux #3022
Comments
If the selector is in a loop, you should have a ton of logs like the first snippet you posted, in particular like:
Can you please configure the Jetty thread pool in this way: QueuedThreadPool threadPool = ...; threadPool.setDetailedDump(true); and take a server dump when you see 100% CPU?
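For illustration, a minimal embedded-Jetty setup along those lines might look like the sketch below. The connector, the port, and the inline dumpStdErr() call are assumptions for the example, not taken from the reporter's configuration; normally you would trigger the dump via JMX when the CPU spikes.

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class DumpableServer {
    public static void main(String[] args) throws Exception {
        // Detailed dumps include per-thread stack traces in the server dump.
        QueuedThreadPool threadPool = new QueuedThreadPool();
        threadPool.setDetailedDump(true);

        Server server = new Server(threadPool);
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080); // illustrative port
        server.addConnector(connector);
        server.start();

        // Shown inline for brevity: prints the full server dump (selectors,
        // thread pool, connections) to stderr. In practice, invoke this when
        // you actually observe the 100% CPU condition.
        server.dumpStdErr();

        server.join();
    }
}
```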
I am still working on getting the server dump, but I did get a thread dump with threadPool.setDetailedDump(true). Yes, when the CPU goes to 100% I get the above log entries thousands of times continuously, like in an infinite loop. The thread with nid=0x3ec5 was consuming 100% of the CPU. Can the continuous log entries themselves be considered a sign of a bug? Or are there occasions where Jetty can output thousands of debug logs like this and still work fine?
The selector should not be woken up if there is nothing to do. After #2755 (fixed in 9.4.12) we are also taking care of interrupted threads that continuously wake up the selector. We need a server dump and possibly the DEBUG logs. Can you reproduce this issue reliably?
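To illustrate the interrupted-thread case mentioned there: if a thread's interrupt status is left set, Selector.select() returns immediately with zero selected keys, so a selector thread with a leaked interrupt status spins exactly like this. A tiny standalone demo (plain NIO, not Jetty code; the class name is made up):

```java
import java.nio.channels.Selector;

public class InterruptedSelectorSpin {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        // Simulate a task that returned to the selector thread with the
        // interrupt status still set.
        Thread.currentThread().interrupt();

        long start = System.nanoTime();
        long wakeups = 0;
        while (System.nanoTime() - start < 1_000_000_000L) {
            selector.select(); // returns immediately, 0 keys selected
            wakeups++;
        }
        System.out.println("zero-key wakeups in 1s: " + wakeups);
        selector.close();
    }
}
```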
Yes, I can reproduce the issue, but only in a Tinycore virtual appliance with 2 cores. I couldn't reproduce it with more than two cores or on my Mac. I have attached the DEBUG logs and thread dump. I wonder whether this could be related to the following bug report.
Can you test with a recent (not-EOL) Linux 4.x kernel?
I tried with Linux Kernel 4.19 in CentOS 7. Got the same issue. I attached the full log file. Below is an extract.
@prasadlvi, the log only shows the loop, not what happened before the loop started, which is the critical information.
Yes, I understand. I'll attach the necessary information as soon as possible.
I attached the server dump and the full log file.
@prasadlvi it seems you can reproduce this easily, so can you try a recent Linux kernel and JDK 11 (since in JDK 11 the NIO code was basically rewritten)?
The logs show this:
I cannot see anything wrong in the way Jetty handles this particular case. @prasadlvi it's now even more important that you test with JDK 11.
@sbordet Co-worker of @prasadlvi here; we have tried on several 3.x and 4.x kernels.
And all of them show the spin loop? Did you manage to try JDK 11?
I tried with JDK 11 and Linux 4.19.0 (jetty-9.4.12.v20180830) and was not able to reproduce the issue.
@gregw so this seems a JDK issue rather than an OS issue. We would need to put a counter in the selector loop, detect when it spin-loops (with zero keys?), and destroy and rebuild the selector, but we would also need to exclude that selector from being chosen when a new connection is accepted - we would need a lock 😞
Or we just System.exit(1) and let the server be restarted externally?
@gregw we can't
I think that random JVM failures are like out-of-memory exceptions: who knows what else is broken in the JVM, so it's best to exit and try again. Perhaps if a selector wakes up for no reason (0 keys and no tasks), then we can sleep for a while (say 50ms). That would stop the busy loop and we'd recover within 50ms if the selector comes good. It would be interesting to learn whether the selector ever does come good.
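As a rough illustration of that idea (this is not Jetty's ManagedSelector code; the class, the spin threshold, and the warning message are made up for the sketch, and the 50ms pause is just the value suggested above), a guarded NIO select loop might look like this:

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class GuardedSelectLoop implements Runnable {
    private static final int SPIN_THRESHOLD = 256; // illustrative value

    private final Selector selector;

    public GuardedSelectLoop(Selector selector) {
        this.selector = selector;
    }

    @Override
    public void run() {
        int spins = 0;
        while (selector.isOpen()) {
            try {
                int selected = selector.select();
                if (selected == 0 && selector.selectedKeys().isEmpty()) {
                    // Woken up with nothing to do: a possible JVM/epoll spin.
                    if (++spins >= SPIN_THRESHOLD) {
                        System.err.println("Selector spinning with 0 keys; backing off 50ms");
                        Thread.sleep(50); // stop the busy loop, recover if it comes good
                        spins = 0;        // an alternative is rebuilding the Selector here
                    }
                    continue;
                }
                spins = 0;
                // ... dispatch selector.selectedKeys() to connection handlers ...
                selector.selectedKeys().clear();
            } catch (IOException | InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```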
Do not ever, under any circumstance, call System.exit(). You have no idea what the impact of that could be. Airplanes could crash, missiles could launch, respirators could stop. Sleeping 50ms seems perfectly reasonable. Logging something ominous every minute that the condition persists is also a good idea, so as to not sweep the problem under the rug.
@brettwooldridge I think the primary action should be to triage the JVM bug and make every effort to get it fixed. A secondary action would be to either find a change of behaviour in Jetty that avoids triggering the bug, or perhaps insert a pause to mitigate the impact of trying to continue on in the face of the bug.
I'm -1 on using System.exit().
Sorry about my "suggestion" of System.exit(1). Let's triage the issue some more and see if we can quantify exactly when it happens and whether it can be raised with OpenJDK. Only then can we work out whether it can be detected in the code and a non-obnoxious workaround put in place... but if the problem is not happening on current JVMs, then perhaps there is nothing we need to do, as the problem is already fixed?
Last update pointed to a JVM and/or kernel issue. |
So, how do we solve this?
@XuHe1 please open a new issue and be sure to add a server dump and a proof that you have a spin loop in the selector, along with exact Jetty version, OS, JDK version, etc. |
@prasadlvi how did you guys manage this behavior? |
@bhuvangu It was fixed after updating to Java 11. |
I am running embedded Jetty 9.4.7.v20170914 and observed 100% CPU usage when there is no load on the server. After noticing #2205 I upgraded to the latest Jetty 9.4.12.v20180830, but the observed result was the same.
Environment
java version "1.8.0_181"
Linux 3.16.38-tinycore64 #1 SMP Thu Nov 17 15:37:06 UTC 2016 x86_64 GNU/Linux
I found the CPU-consuming threads using:
top -H -o %CPU -p <process id>
When looking at the thread dump, it seems the same two threads in the above log consume most of the CPU. Does this look like a bug in Jetty? Or could it be a problem with my application?
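For reference, the same per-thread CPU information can also be collected from inside the JVM with ThreadMXBean, which makes it easier to match hot threads to the names in the thread dump (a generic sketch, not part of the original report; the class name is made up). The nid value shown in a thread dump is the native thread id in hex, i.e. the PID column that top -H reports, so the two views can be correlated either by id or by thread name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpuReport {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if unsupported or disabled
            if (info != null && cpuNanos > 0) {
                // Thread names here match the names in the jstack/thread dump,
                // e.g. the Jetty selector threads.
                System.out.printf("%-60s cpu=%d ms%n",
                        info.getThreadName(), cpuNanos / 1_000_000);
            }
        }
    }
}
```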