WebSocket hangs in blockingWrite #2061

Closed
dreis2211 opened this issue Dec 14, 2017 · 107 comments · Fixed by #3830

@dreis2211
Contributor

dreis2211 commented Dec 14, 2017

Hi,

It was suggested to me by @joakime to open a new issue for #272, as it still occurs on 9.4.8 - with a slight tendency to occur more often now (which might just be bad luck on our end).

Unfortunately, I can't say anything new about the issue. It still appears to be random (regardless of load for example) that threads end up in WAITING state and only a server restart helps to solve the issue.

"clientOutboundChannel-23" #144 prio=5 os_prio=0 tid=0x00007fceb0001000 nid=0x28f7 waiting on condition [0x00007fce59acc000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006ca773c88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:210)
        at org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:90)
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(WebSocketRemoteEndpoint.java:107)
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(WebSocketRemoteEndpoint.java:393)
        at org.springframework.web.socket.adapter.jetty.JettyWebSocketSession.sendTextMessage(JettyWebSocketSession.java:273)
        at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:101)
        at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.tryFlushMessageBuffer(ConcurrentWebSocketSessionDecorator.java:132)
        at org.springframework.web.socket.handler.ConcurrentWebSocketSessionDecorator.sendMessage(ConcurrentWebSocketSessionDecorator.java:104)
        at org.springframework.web.socket.messaging.StompSubProtocolHandler.sendToClient(StompSubProtocolHandler.java:470)
        at org.springframework.web.socket.messaging.StompSubProtocolHandler.handleMessageToClient(StompSubProtocolHandler.java:457)
        at org.springframework.web.socket.messaging.SubProtocolWebSocketHandler.handleMessage(SubProtocolWebSocketHandler.java:338)
        at org.springframework.messaging.support.ExecutorSubscribableChannel$SendTask.run(ExecutorSubscribableChannel.java:135)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

As this is affecting our production servers, I'd appreciate it if this could be investigated again. I'd also appreciate any workaround that doesn't involve a server restart.

Cheers,
Christoph

@joakime joakime added the Bug For general bugs on Jetty side label Dec 14, 2017
@joakime joakime self-assigned this Dec 14, 2017
@joakime
Contributor

joakime commented Dec 14, 2017

What are the various idle timeouts you have configured?
Can you tell us what JVM version and what OS you are experiencing these problems on?

@joakime
Contributor

joakime commented Dec 14, 2017

OP is using stomp + spring. What version of stomp/spring are you using?

No reports of similar problems from the CometD community or the Google App Engine community (yet?).

@dreis2211
Contributor Author

dreis2211 commented Dec 14, 2017

I'm running JDK 8u121 on Debian Jessie with Spring Boot 1.5.9 (Spring Framework 4.3.13). As far as the idle timeout is concerned (if I understood SharedBlockingCallback correctly), it blocks indefinitely due to -1 being the timeout for WebSockets, hence the WAITING instead of TIMED_WAITING state of the threads. The idle timeout for the QueuedThreadPool is 5000 ms, though.

@joakime
Contributor

joakime commented Jan 8, 2018

The QueuedThreadPool idleTimeout does not apply to a thread that is actively in use.
It applies to threads that have been returned to the pool and have not been allocated/requested for other work. Those QTP threads are removed from the pool entirely when their idle timeout expires.

The -1 idle timeout on websocket means an indefinite idle timeout and is operating properly.
Perhaps you really want an idle timeout on your websockets; if so, set the idle timeout on the websocket session (if you want it to apply to a specific connection) or on the websocket container (if you want it to apply to all new connections).
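
For reference, a minimal sketch of both options using the Jetty 9.4 native WebSocket API (the servlet and socket class names are illustrative, not from this thread); the JSR-356 equivalents are Session.setMaxIdleTimeout() and WebSocketContainer.setDefaultMaxSessionIdleTimeout():

import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.annotations.OnWebSocketConnect;
import org.eclipse.jetty.websocket.api.annotations.WebSocket;
import org.eclipse.jetty.websocket.servlet.WebSocketServlet;
import org.eclipse.jetty.websocket.servlet.WebSocketServletFactory;

// Per-container: apply an idle timeout to all new connections created by this servlet.
public class MyTimeoutServlet extends WebSocketServlet
{
    @Override
    public void configure(WebSocketServletFactory factory)
    {
        factory.getPolicy().setIdleTimeout(30_000); // default for every new connection
        factory.register(MyTimeoutSocket.class);
    }

    // Per-connection: apply (or override) the idle timeout on one specific session.
    @WebSocket
    public static class MyTimeoutSocket
    {
        @OnWebSocketConnect
        public void onConnect(Session session)
        {
            session.setIdleTimeout(60_000); // this connection only
        }
    }
}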

@dreis2211
Contributor Author

Hi @joakime. I was just answering your question about the various idle timeouts...

As the previous thread/issue #272 already mentions, many argue that the -1 idle timeout in SharedBlockingCallback (or BlockingWriteCallback, to be more specific) is somewhat dangerous, as it leads to indefinitely blocked threads that one cannot recover from. Hence the many workarounds that recompile Jetty with an overridden getIdleTimeout() on BlockingWriteCallback.

Correct me if I'm wrong, but your suggestion would not solve this problem. And even if it does, it would just be a different workaround instead of actually solving the root cause (which I feel we're unfortunately not getting closer to finding).

Speaking of the root-cause: Though we have no clear evidence, the behavior described in this issue and #272 seems to be caused by slow clients for us (we have many mobile users that might have a slow connection). Maybe this helps you to come up with a test scenario for the root-cause.

Cheers

@joakime
Contributor

joakime commented Jan 8, 2018

I'm curious what the OS thinks of the connection.
Is it still live? (tcpdump information should tell us)

Having an idle timeout of -1 isn't normal. In the distribution it defaults to 60,000 ms, and embedded usage seems to default to 30,000 ms.
In fact, the only way I've been able to have a -1 idle timeout is to manually set it to -1.

The process to determine idle timeout seems to be ...

  • If the @WebSocket annotation has no idle timeout specified, then the WebSocket session's idle timeout is used.
  • If the WebSocket session has no idle timeout specified, then the WebSocket container's value is used.
  • If the WebSocket container has no idle timeout specified, then the Connector's idle timeout is used.
  • The standard Connector has an idle timeout of 30,000 ms

But even at -1 idle timeout, the active websocket connection with an active message write sits in SharedBlockingCallback$Blocker.block() if attempting to write to a congested connection (one where the remote side isn't reading). It will sit like this indefinitely until the connection is closed or the congestion clears up. This is a sane setup for many websocket users, especially connections used between servers in a data center.

If I set a websocket container max idle timeout then even that scenario idle times out.

I can sort-of reproduce the SharedBlockingCallback concern, but it's only under a specific (working as intended) scenario ...

  • websocket idle timeout is -1 (unlimited)
  • the connection is active
  • the remote is not reading
  • the local is attempting write
  • the connection is congested

This is normal behavior for idle timeouts of -1.

@elekktrisch

elekktrisch commented Jan 15, 2018

Here is a way to reproduce it consistently (even though I am not sure it actually is the exact same issue):

  • Register a few Transparent Proxy Beans in the Spring-Boot Configuration (the more of them, the more likely the issue occurs)
  @Bean
  public ServletRegistrationBean transparentProxyServlet() throws MalformedURLException {
    ServletRegistrationBean servletRegistrationBean = new ServletRegistrationBean(new ProxyServlet.Transparent(), contextPath);
    servletRegistrationBean.setName("testProxy");
    servletRegistrationBean.addInitParameter("proxyTo", new URL("http", serverHost, serverPort, "").toString()); // proxy target, e.g. http://serverHost:serverPort

    servletRegistrationBean.addInitParameter("prefix", "/foo");

    return servletRegistrationBean;
  }
  • Start on a server with a lot of hardware resources (I encountered the issue after I added 5 transparent proxies, but only on the server with 80 CPU cores; it did not occur on the server with only 24 CPU cores running the exact same source code)

My workaround for the time being was to allow for more Threads for Jetty:

 @Bean
  public JettyEmbeddedServletContainerFactory jettyEmbeddedServletContainerFactory(
      @Value("${server.port}") final String port,
      @Value("${jetty.threadPool.maxThreads:1000}") final String maxThreads,
      @Value("${jetty.threadPool.minThreads:8}") final String minThreads,
      @Value("${jetty.threadPool.idleTimeout:60000}") final String idleTimeout
  ) {
    final JettyEmbeddedServletContainerFactory factory = new JettyEmbeddedServletContainerFactory(Integer.valueOf(port));
    factory.addServerCustomizers((JettyServerCustomizer) server -> {
      final QueuedThreadPool threadPool = server.getBean(QueuedThreadPool.class);
      threadPool.setMaxThreads(Integer.valueOf(maxThreads));
      threadPool.setMinThreads(Integer.valueOf(minThreads));
      threadPool.setIdleTimeout(Integer.valueOf(idleTimeout));
    });

    return factory;
  }

At runtime, Jetty actually allocated over 300 threads even before any traffic hit the server. Since 200 is the default max, Jetty probably had to wait for more threads to become available before serving any responses.

Might have to do with this: spring-projects/spring-boot#8917

@joakime
Contributor

joakime commented Jan 15, 2018

Ah! A proxy.
How do we know whether the problem lies on the websocket side or the proxy side?
A standard HTTP proxy is a common fault with websocket.
Only WebSocket/Upgrade aware proxies are supported (they recognize the upgrade and no longer cache/buffer the resulting upgraded connection).
Prior to WebSocket there were not many HTTP upgraded connection types to worry about.
If you are using the Spring proxy, then that is known to have issues with WebSocket (Netflix even wrote their own spring-proxy for project zuul to solve this).
spring-cloud and spring-cloud-gateway projects also report many issues with WebSocket using spring's proxy.

Regarding threading configuration.
Keep in mind that allocated threads vs. used threads are different things.
The threadpool will pre-allocate a number of threads for its pool, as the allocation step is a costly one.
But once allocated, the threads exist, are idle/waiting, and are being tracked by the pool.

@joakime
Contributor

joakime commented Jan 15, 2018

@elekktrisch spring-projects/spring-boot#8917 is about excessively low/insufficient thread pool configurations.
The configurations you are talking about in your recent comment are healthy and not in the realm of that specific spring-boot issue.

If you are using Jetty 9.4.8.v20171121 then you will see warning/error logging events telling you of insufficient thread configurations (which were recently added in response to spring-projects/spring-boot#8917).

See: Issue #1851 and https://github.com/eclipse/jetty.project/blob/jetty-9.4.8.v20171121/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ThreadPoolBudget.java#L137-L157

@dreis2211
Contributor Author

We don't use Spring proxy facilities and don't have an invalid thread-configuration (at least we're not affected by the mentioned Spring-Boot issue).

What we do use, though, is nginx as a reverse proxy, which to my knowledge is WebSocket/Upgrade aware.

@joakime
Contributor

joakime commented Jan 15, 2018

nginx added websocket support in 2013 with, I believe, nginx version 1.3.

Just make sure you turn off proxy buffering (a known problem with nginx's implementation)

proxy_buffering off;

Also make sure you set a valid proxy_read_timeout for your nginx configuration (do not use infinite values here).

@joakime
Contributor

joakime commented Jan 15, 2018

Looking at the various reports of websocket support in nginx it seems that version 1.3.13 is the first version that the userbase reports as stable for websocket.

@dreis2211
Contributor Author

Thanks. We already use 1.12.2 with proxy_buffering off;.

@joakime
Contributor

joakime commented Jan 15, 2018

@dreis2211 what are your nginx proxy settings for ...

proxy_read_timeout 120s;
proxy_write_timeout 120s;

@joakime
Contributor

joakime commented Jan 15, 2018

Also, depending on your version of nginx it might be proxy_send_timeout instead of proxy_write_timeout

@dreis2211
Contributor Author

@joakime Neither is specified in our config, so they should use the default value of 60 seconds.

@TheWizz

TheWizz commented Jan 18, 2018

We've had this problem for some time. I've found a 100% sure way to reproduce it, which may be of interest. First some background. We're currently running 9.3.19 under Spring Boot 1.4.7. Our application often has hundreds of mobile clients over websockets, based on Stomp and the simple broker built into Spring. We had occasional problems under heavy load on our servers where the websocket traffic just stopped (while other "regular" requests continued to work, and the server stayed up). When this "zombie mode" occurred, the only remedy was to restart the server.

We've scrutinized our code and found nothing there causing it, so I began suspecting something else in the stack. Therefore I was very happy when I found this thread and the previous one, #272, which seem to describe exactly the problem we're seeing. We had an older version of Jetty before (9.2.16), and there were some hints along the way that this may have been "fixed" in 9.2.x, so we started by upgrading Spring Boot along with Jetty to the versions mentioned above. That did NOT fix the problem. I then proceeded to apply the work-around described in #272 ("overriding" the socket write timeout), and that seems to have fixed it for us. The sockets still hang on write, and if a lot of websockets "die" on us at the same time, this may still starve the thread pool for some time. But once those writes time out, the server comes back to normal again, instead of requiring a restart. This was a huge improvement for us.

Now to the way to reproduce it. Unfortunately, all of this is in company code, so I can't share the code. But I can share the general idea.

We have tests written to simulate large numbers of clients, doing both regular requests as well as the websocket communication. A single nodejs instance can easily simulate several hundred users (with the exception of cellular network peculiarities). Since this is nodejs based, it's also easy to fire up a couple of Amazon EC2 instances to run really massive tests for an hour or so, and then shut them down.

Now, as long as everyone plays nice and closes their websocket connections in an orderly manner, everything works and is happy. It appears that the lock-up happens ONLY when sockets aren't closed but merely cease to communicate, as would happen if the phone on the other end suddenly goes off grid, is turned off, its battery dies, or similar.

The way I found I could simulate this situation is to run a test with, say, 100 simulated users on a separate machine on our network. I have the system in a state where it keeps sending data over the websocket to those 100 clients. I then just yank the Ethernet wire out of the machine running the test clients. This quickly causes any send buffers to fill up, and the sending threads to (presumably) block on some low-level write call, very soon causing all threads in the pool to block. Our app uses the standard thread pool provided by Spring/Jetty here, which has twice as many threads as the processor has cores, so we typically see 8 or 16 threads in the pool. That means that if more than this number of clients all "die" at about the same time, while data is being sent to those websockets, the server will almost instantly go into "zombie mode". Having a reproducible test like this quickly allowed us to try out various scenarios, and let us come to the conclusion that the timeout is a viable work-around for now.

Hopefully this will get fixed in Jetty in some more permanent and elegant way, and perhaps my ramblings here can allow you to at least get a reproducible test case to track it down. I don't feel sufficiently familiar with Jetty's innards to venture a fix myself.

-JM

@joakime
Contributor

joakime commented Jan 18, 2018

We've tested Jetty 9.3.x and 9.4.x at 10k+ active websocket connections, with some connections going away (no traffic, just dead). If we have a reasonable Idle Timeout things work as expected.

Even in the infinite timeout scenario (Idle Timeout == -1) the dead network connections will eventually be cleaned up by the OS, dependent on network configuration, and the connection to Jetty will die and be cleaned up as well.

The only scenario we've been able to use to reproduce this reported issue is having a simple proxy with its own infinite timeout, and jetty configured for infinite timeout. Then the connection remains "live" between Jetty and the proxy, with the proxy itself not cleaning up the connection when it goes away. This behavior seems to be exacerbated with various proxy connection buffering configurations as well.

In summary so far (Jan 2018):

  • So far, only some stomp/spring users are reporting this behavior with mobile clients. (Many other heavy stomp + mobile users have no such issue, and other libraries that use Jetty WebSockets have not reported these behaviors on mobile clients either.)
  • The reported reproduction cases have not been provided to the Jetty developers yet.
  • If you have unreliable connections, use a sane idle timeout, everywhere. (your websockets, your connections, your OS networking configuration, your proxies, etc..)
  • If you have a high load scenario, the typical general advice is:
    • don't use an executor (threadpool) implementation/configuration with a fixed upper limit (use one that dynamically grows, applies low-resource backpressure, and falls back to a more normal steady state when load subsides, such as Jetty's QueuedThreadPool)
    • don't use blocking sends/writes (these hold a thread)
    • use async sends/writes (a thread is only used when a read/write is possible; see the sketch after this list)
    • don't use java.io streaming reads in websocket (each message causes a new thread)
    • don't use java.io streaming writes in websocket (each message causes a new thread)
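
For the async sends/writes point, a minimal sketch using Jetty's native API (the wrapper class and the close-on-failure handling are illustrative assumptions, not something prescribed in this thread). The call returns immediately, and the callback fires when the frame is written or the write fails, so no pool thread is parked waiting on a slow or dead client:

import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.WriteCallback;

public class AsyncSendExample
{
    public static void sendAsync(final Session session, String message)
    {
        // Hands the frame to Jetty and returns immediately.
        session.getRemote().sendString(message, new WriteCallback()
        {
            @Override
            public void writeSuccess()
            {
                // Frame was flushed to the network layer.
            }

            @Override
            public void writeFailed(Throwable cause)
            {
                // e.g. idle timeout or a broken connection: drop the session instead of blocking on it.
                session.close();
            }
        });
    }
}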

If you want to help, fork the example project ... https://github.com/spring-guides/gs-messaging-stomp-websocket
Modify it to cause the issue being reported, and report back here with your forked version for us to test against.

@TheWizz

TheWizz commented Jan 20, 2018

I can try putting together a reproducible case based on the "gs-messaging-stomp-websocket" example you suggest, combined with a nodejs-based multi-client simulator. That's what we use to run our own load testing.

However, before pursuing this, I'd like to make sure we indeed "use a sane idle timeout everywhere", as you suggest. Perhaps you could provide some pointers as to how/where timeouts are specified in Jetty and/or Spring for "your websockets" and "your connections". Any pointers related to timeouts in the "OS networking configuration" (assuming Linux OS) would be most appreciated too. After making sure we have those set to cope with unreliable clients (i.e., lots of mobile phones on cellular networks), I'll re-run our tests to see if I can still repro the lockup. If so, I'll do my best to put together a reproducible test case I can contribute.

-JM

@dreis2211
Contributor Author

@TheWizz any news from your end about this ticket?

@TheWizz

TheWizz commented Apr 23, 2018

No. We're using the work-around suggested by jonhill1977 in #272. This is really a "hack", since it involves replacing the org.eclipse.jetty.websocket.common.BlockingWriteCallback class with our own (as there seems to be no way to override this behavior from the "outside"). Doing so solved the problem for us, and we just moved on. A cleaner solution would of course be preferred.
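
For reference, the shape of that hack (as I understand the #272 workaround) is roughly the following: SharedBlockingCallback exposes a protected getIdleTimeout() that returns -1 (block forever) by default, and the recompiled BlockingWriteCallback overrides it with a finite value, so a stuck write eventually fails with a TimeoutException instead of parking the thread forever. The timeout value below is illustrative, and this is not an official Jetty API, which is why it requires replacing the class on the classpath.

// Replacement for org.eclipse.jetty.websocket.common.BlockingWriteCallback (sketch only).
public class BlockingWriteCallback extends SharedBlockingCallback
{
    @Override
    protected long getIdleTimeout()
    {
        // Finite write timeout instead of the default -1 (wait forever);
        // blocked writes now fail with a TimeoutException and the thread is released.
        return 30_000;
    }

    // ... the rest of the original class is unchanged ...
}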

@dreis2211
Contributor Author

So no other "solution" than ours. But thanks for getting back.

@dreis2211
Contributor Author

dreis2211 commented Apr 27, 2018

@joakime Triggered by #2491, I was wondering whether the clients connecting to our servers use the fragments extension and whether the two bugs might be connected. While they (unfortunately) don't use it, they do make use of the per-message-deflate extension. I don't know if that helps for this bug or if it's the same for the other reporters, but I thought this is some new information I should share for the sake of completeness.

@joakime
Contributor

joakime commented Apr 27, 2018

The Fragment Extension is very rarely used.
It's not even a formal spec extension.
Only the Autobahn lib and Jetty support it.
So you'll never see it from any other client implementation (such as a web browser).

If you suspect permessage-deflate, disable it on the server side (either by removing it from the websocket extension registry, or by not negotiating it during the handshake).

Option 1: If using WebSocketServlet (to unregister the permessage-deflate extension)

public static class MyWebSocketServlet extends WebSocketServlet
{
    @Override
    public void configure(WebSocketServletFactory factory)
    {
        factory.getExtensionFactory().unregister("permessage-deflate");

        // The rest of your processing here ...
    }
}

Option 2: If using WebSocketCreator (to remove just permessage-deflate)

public static class MyWebSocketCreator implements WebSocketCreator
{
    @Override
    public Object createWebSocket(ServletUpgradeRequest servletUpgradeRequest, ServletUpgradeResponse servletUpgradeResponse)
    {
        // Strip permessage-deflate negotiation
        servletUpgradeRequest.setExtensions(
                servletUpgradeRequest.getExtensions()
                        .stream()
                        .filter((extConfig) -> !extConfig.getName().equals("permessage-deflate"))
                        .collect(Collectors.toList())
        );

        // The rest of your processing here ...
    }
}

Option 3: If using JSR's ServerEndpointConfig.Configurator (to remove all extensions offered)

public static class MyServer_NoExtensionsConfigurator extends ServerEndpointConfig.Configurator
{
    @Override
    public List<Extension> getNegotiatedExtensions(List<Extension> installed, List<Extension> requested)
    {
        // Strip all offered extensions
        return Collections.emptyList();
    }
}

Option 4: If using JSR's ServerEndpointConfig.Configurator (to filter the offered permessage-deflate extension)

public static class MyServer_NoPerMessageConfigurator extends ServerEndpointConfig.Configurator
{
    @Override
    public List<Extension> getNegotiatedExtensions(List<Extension> installed, List<Extension> requested)
    {
        List<Extension> negotiated = new ArrayList<>();

        for (Extension offered : requested)
        {
            if (offered.getName().equalsIgnoreCase("permessage-deflate"))
            {
                // skip
                continue;
            }

            if (installed.stream().anyMatch((available) -> available.getName().equals(offered.getName())))
            {
                negotiated.add(offered);
            }
        }

        return negotiated;
    }
}

@TheWizz

TheWizz commented Apr 27, 2018

We're definitely not using any uncommon options for our websockets. A way I found to repro it is to have a significant number (50 or so) of websockets running, and then "unplug" those clients rather than closing them gracefully. I do this by having a computer (running some NodeJS-based test code to exercise the websockets) and then merely pulling the Ethernet cable from the test machine. I then terminate the Node test code (with Ethernet still unplugged). I then plug it back in, and restart the test code. This leaves numerous websocket connections in a "zombie" state. Any attempts at writing to those sockets will eventually block somewhere. It is possible that things will eventually time out. But from the client's point of view, the server is "hung" as it stops responding to calls.

Note that if I close down the test clients in an orderly manner instead of unplugging the Ethernet cable, no hang occurs. So it seems related to this "unexpected" termination of the socket. Surely, this is "wrong" on the websocket client's part. But I suspect that's essentially what happens with real websocket clients, which are typically phones: the phone leaves cellular coverage while connected, it runs out of battery, or perhaps it's just put to sleep and back in the pocket. If a few connections "misbehave" in this way, it seems to cause no harm. But if enough of them do so quickly enough, all websocket threads likely end up blocking somewhere.

Increasing the pool size seems to push the problem out a bit, but given enough misbehaving clients, it will still happen at some point.

It seems that applying some reasonable write timeout, causing the write to be aborted unless completed within a (relatively short, say seconds) timeframe, fixes the problem, as it indicates the websocket has gone bad and will therefore be discarded.

Some of the above is based on empirical tests (with our nodejs-based test code and pulling the Ethernet jack), while the last paragraph is speculation on my part. But in either case, hacking the Jetty code as suggested by jonhill1977 in #272 makes the "hang" go away and keeps clients happy.

-JM

@joakime
Contributor

joakime commented May 2, 2018

@TheWizz doing various "unplug" tests shows that the idle timeouts do kick in, terminate the connection, and free up the resources associated with it.

Using this test project - https://github.com/joakime/jetty-websocket-browser-tool

It will compile down to an uber-jar if you want to move it around to other systems to try it with.

The test setup used ...

Laptop A (Wired Networking only. Wifi turned off) - this is the client machine.
Laptop A is wired connection to switch X
Switch X is wired to Switch Y
Server Foo is wired to Switch Y
Server Foo is a Fedora 25 machine with Java 8u171 and Jetty 9.4.9.
Server Bar is wired to Switch Y
Server Bar is a Windows 10 machine with Java 8u171 and Jetty 9.4.9.

  1. Server is started.
  2. WebSocket client is connected (to one of 4 endpoints: JSR Basic Remote, JSR Async Remote, Jetty Blocking Remote, Jetty Async Remote)
  3. WebSocket client requests a flood of messages (text and/or binary, different sizes, etc)
  4. While WebSocket client is receiving messages, the network cable on Laptop A is pulled.
  5. One configured idle timeout later, the server shows the following behavior ...
java -jar .\target\jetty-websocket-browser-tool-1.0-SNAPSHOT.jar --websocket.timeout.idle=10000
2018-05-02 14:37:00.009:INFO::main: Logging initialized @197ms to org.eclipse.jetty.util.log.StdErrLog
2018-05-02 14:37:00.056:INFO:oejd.WSServer:main: Connector Idle Timeout (ms): 300000
2018-05-02 14:37:00.081:INFO:oejd.WSServer:main: WebSocket Idle Timeout (ms): 10000
2018-05-02 14:37:00.203:INFO:oejs.Server:main: jetty-9.4.z-SNAPSHOT; built: 2018-03-20T07:21:10-05:00; git: 1f8159b1e4a42d3f79997021ea1609f2fbac6de5; jvm 10+46
2018-05-02 14:37:00.264:INFO:oejsh.ContextHandler:main: Started o.e.j.s.ServletContextHandler@ed7f8b4{/,jar:file:/C:/code/jetty/github/jetty-websocket-browser-tool/target/jetty-websocket-browser-tool-1.0-SNAPSHOT.jar!/debug-base/,AVAILABLE}
2018-05-02 14:37:00.376:INFO:oejs.AbstractConnector:main: Started ServerConnector@7c06669{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
2018-05-02 14:37:00.377:INFO:oejs.Server:main: Started @572ms
2018-05-02 14:37:00.382:INFO:oejd.WSServer:main: Server started on http://192.168.0.119:8080/
2018-05-02 14:37:13.938:INFO:oejdj.JsrBasicWebSocket:qtp1401132667-14: onTextMessage(floodtext:20,10000)
2018-05-02 14:37:13.940:INFO:oejd.FloodTextTask:qtp1401132667-14: Creating min:20 max10000
2018-05-02 14:37:13.943:INFO:oejd.FloodTextTask:qtp1401132667-18: active:true isConnected:true
2018-05-02 14:37:36.493:WARN:oejdj.JsrBasicWebSocket:Scheduler-1897115967: onBaseError(java.util.concurrent.TimeoutException)
java.util.concurrent.TimeoutException: Idle timeout expired: 10001/10000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
        at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2018-05-02 14:37:36.497:WARN:oejwcec.CompressExtension:Scheduler-1897115967:
java.util.concurrent.TimeoutException: Idle timeout expired: 10001/10000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
        at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2018-05-02 14:37:36.498:WARN:oejut.QueuedThreadPool:qtp1401132667-18:
org.eclipse.jetty.websocket.api.WebSocketException: Unable to send basic text message: java.util.concurrent.TimeoutException: Idle timeout expired: 10001/10000 ms
        at org.eclipse.jetty.demo.jsr.JsrBasicWebSocket.sendTextMessage(JsrBasicWebSocket.java:46)
        at org.eclipse.jetty.demo.FloodTextTask.run(FloodTextTask.java:61)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:754)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:672)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by:
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 10001/10000 ms
        at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:226)
        at org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:90)
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(WebSocketRemoteEndpoint.java:107)
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(WebSocketRemoteEndpoint.java:394)
        at org.eclipse.jetty.websocket.jsr356.JsrBasicRemote.sendText(JsrBasicRemote.java:108)
        at org.eclipse.jetty.demo.jsr.JsrBasicWebSocket.sendTextMessage(JsrBasicWebSocket.java:42)
        at org.eclipse.jetty.demo.FloodTextTask.run(FloodTextTask.java:61)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:754)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:672)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by:
java.util.concurrent.TimeoutException: Idle timeout expired: 10001/10000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
        at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
2018-05-02 14:37:36.499:INFO:oejdj.JsrBasicWebSocket:Scheduler-1897115967: onBaseClose(1001, Idle Timeout)
2018-05-02 14:37:36.501:WARN:oejut.QueuedThreadPool:qtp1401132667-18: Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@2ab4ca4b in QueuedThreadPool[qtp1401132667]@5383967b{STARTED,8<=12<=200,i=4,q=0}
2018-05-02 14:37:36.502:WARN:oejdj.JsrBasicWebSocket:Scheduler-1897115967: onBaseError(java.util.concurrent.TimeoutException)
java.util.concurrent.TimeoutException: Idle timeout expired: 10001/10000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
        at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)

The behavior and results are the same for blocking writes and async writes (screenshots: Blocking Writes, Async Writes).

A variety of clients have been used (so far) to test this behavior with no change in results (the blocked write is exited when the idle timeout fires)

  • Google Chrome 66 (on desktop)
  • Firefox 59 (on desktop)
  • Jetty WebSocketClient
  • Jetty's JSR356 ClientContainer
  • Tyrus JSR356 Client
  • Autobahn WebSocket Client
  • Google pyWebSocket
  • wscat (found on npm)

I will continue to improve this simple tool to attempt to get to the bottom of this reported issue.

@joakime
Contributor

joakime commented Mar 29, 2019

@gbrehmer thank you for the dump, but we're a bit more confused now.
That dump has no websocket endpoints present. None registered, none added.
It does have websocket support present (as evidenced by the WebSocketUpgradeFilter), but nothing is using it (that filter has 0 mappings present).

@gbrehmer

You are right. I didn't check the log before posting. I created an HTTP endpoint to trigger the dump creation, but probably injected the wrong webserver instance and didn't know that Spring Boot creates multiple instances. I have to dig deeper into the Spring Boot code. It is possible that the missing parts are produced by https://github.com/spring-projects/spring-framework/blob/master/spring-websocket/src/main/java/org/springframework/web/socket/server/jetty/JettyRequestUpgradeStrategy.java

@gbrehmer

It seems that the websocket part is created on the fly, without a direct mapping to the Jetty server instance, like using the Jetty websocket support in an embedded way as part of Spring WebMVC controller logic.

@gbrehmer

gbrehmer commented Apr 4, 2019

@joakime @sbordet do you have any contacts with the Spring developers? Should I raise a bug there, given that using WebSocketHandler manually is not supported by the Jetty team? Last time they closed the bug on the Spring side because this bug exists.

@valliman

valliman commented Jun 3, 2019

Hi guys,

we've just encountered this issue with the latest 9.4.18.v20190429 version.

Our scenario:
We have a server component with a server socket open for UI clients; additionally, the server component has downstream connections to other server components using the Jetty WebSocket client. In the test scenario the network cable is unplugged, and afterwards our system gets blocked.
We do not use any Spring library, just vanilla Jetty WebSocket classes.

Our observations after unplugging the network cable (and plugging it in later again):

At some point in time, an exception is thrown in our sender thread [Worker-0]:
WARN [Worker-0] o.e.j.w.c.e.c.CompressExtension:

java.lang.NullPointerException: Deflater has been closed
	at java.util.zip.Deflater.ensureOpen(Unknown Source) ~[?:1.8.0_131]
	at java.util.zip.Deflater.deflate(Unknown Source) ~[?:1.8.0_131]
	at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.compress(CompressExtension.java:488) ~[websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.deflate(CompressExtension.java:451) ~[websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.process(CompressExtension.java:431) [websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) [jetty-util-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224) [jetty-util-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension.outgoingFrame(CompressExtension.java:218) [websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.extensions.ExtensionStack$Flusher.process(ExtensionStack.java:400) [websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) [jetty-util-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224) [jetty-util-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.outgoingFrame(ExtensionStack.java:277) [websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.uncheckedSendFrame(WebSocketRemoteEndpoint.java:307) [websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(WebSocketRemoteEndpoint.java:106) [websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(WebSocketRemoteEndpoint.java:385) [websocket-common-9.4.18.v20190429.jar:9.4.18.v20190429]
	at company.messaging.ws.WSSendTask.sendJsonViaSession(WSSendTask.java:62) [core-runtime-7.0.3.jar:7.0.3]
	at company.messaging.ws.WSSendTask.sendResponseOrNotification(WSSendTask.java:53) [core-runtime-7.0.3.jar:7.0.3]
	at company.messaging.ws.WSSendTask.run(WSSendTask.java:47) [core-runtime-7.0.3.jar:7.0.3]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:1.8.0_131]
	at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_131]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_131]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_131]
	at java.lang.Thread.run(Unknown Source) [?:1.8.0_131]

From that point on, the sender thread Worker-0 is in WAITING (as our thread dump confirms) and is blocked indefinitely; as a result, the system is stuck:

2019-05-29 15:39:25
Full thread dump Java HotSpot(TM) Embedded Client VM (25.131-b11 mixed mode):

"JMX server connection timeout 73" - Thread t@73
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <177617e> (a [I)
        at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"RMI TCP Connection(22)-10.10.40.125" - Thread t@72
   java.lang.Thread.State: RUNNABLE
        at sun.management.ThreadImpl.dumpThreads0(Native Method)
        at sun.management.ThreadImpl.dumpAllThreads(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at sun.reflect.misc.Trampoline.invoke(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(Unknown Source)
        at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(Unknown Source)
        at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
        at javax.management.StandardMBean.invoke(Unknown Source)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor114.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
        at sun.rmi.transport.Transport$1.run(Unknown Source)
        at sun.rmi.transport.Transport$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$222/26122873.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - locked <77a1ee> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"qtp26395465-67" - Thread t@67
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
        at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
        - locked <1526db5> (a sun.nio.ch.Util$3)
        - locked <84a19b> (a java.util.Collections$UnmodifiableSet)
        - locked <1afe194> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(Unknown Source)
        at sun.nio.ch.SelectorImpl.select(Unknown Source)
        at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:464)
        at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:401)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"RMI Scheduler(0)" - Thread t@51
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <181ee17> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"Connector-Scheduler-1b7183b" - Thread t@38
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <64c0db> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"HttpClient@193e4a8-scheduler" - Thread t@35
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <7aae89> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"JmDNS pool-3-thread-1" - Thread t@34
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <a35e36> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"JmDNS(dheva01.local.).State.Timer" - Thread t@33
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <190b9c0> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        at java.util.TimerThread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"JmDNS(dheva01.local.).Timer" - Thread t@32
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <1975a67> (a java.util.TaskQueue)
        at java.util.TimerThread.mainLoop(Unknown Source)
        at java.util.TimerThread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"SocketListener(dheva01.local.)" - Thread t@31
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainDatagramSocketImpl.receive0(Native Method)
        - locked <1aa7fa> (a java.net.PlainDatagramSocketImpl)
        at java.net.AbstractPlainDatagramSocketImpl.receive(Unknown Source)
        - locked <1aa7fa> (a java.net.PlainDatagramSocketImpl)
        at java.net.DatagramSocket.receive(Unknown Source)
        - locked <e21ea1> (a java.net.DatagramPacket)
        - locked <32e904> (a java.net.MulticastSocket)
        at javax.jmdns.impl.SocketListener.run(SocketListener.java:41)

   Locked ownable synchronizers:
        - None

"saga-worker" - Thread t@30
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <163ce07> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"Session-HouseKeeper-1a69af7" - Thread t@28
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <bf9f26> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"qtp26395465-27-acceptor-0@18e8432-ServerConnector@1b7183b{HTTP/1.1,[http/1.1]}{0.0.0.0:8888}" - Thread t@27
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(Unknown Source)
        at sun.nio.ch.ServerSocketChannelImpl.accept(Unknown Source)
        - locked <3e8206> (a java.lang.Object)
        at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:385)
        at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:648)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-24" - Thread t@24
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <1db3f8c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:858)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:783)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-23" - Thread t@23
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <1db3f8c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:858)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:783)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-22" - Thread t@22
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <1db3f8c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:858)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:783)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-21" - Thread t@21
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <1db3f8c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:858)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:783)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-20" - Thread t@20
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <1db3f8c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:858)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:783)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-19" - Thread t@19
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <1db3f8c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:858)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:783)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-18" - Thread t@18
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(Unknown Source)
        at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
        - locked <cbc7c6> (a sun.nio.ch.Util$3)
        - locked <953396> (a java.util.Collections$UnmodifiableSet)
        - locked <3db1f0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(Unknown Source)
        at sun.nio.ch.SelectorImpl.select(Unknown Source)
        at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:464)
        at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:401)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"WebSocketClient@27492551-17" - Thread t@17
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <1db3f8c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:858)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:783)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"Worker-0" - Thread t@14
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <2ec724> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)
        at org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:219)
        at org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:90)
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite(WebSocketRemoteEndpoint.java:107)
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString(WebSocketRemoteEndpoint.java:385)
        at company.messaging.ws.WSSendTask.sendJsonViaSession(WSSendTask.java:62)
        at company.messaging.ws.WSSendTask.sendResponseOrNotification(WSSendTask.java:53)
        at company.messaging.ws.WSSendTask.run(WSSendTask.java:47)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - locked <3793c7> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"RMI TCP Accept-0" - Thread t@13
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"RMI TCP Accept-1189" - Thread t@12
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"RMI TCP Accept-0" - Thread t@11
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
        at java.net.ServerSocket.implAccept(Unknown Source)
        at java.net.ServerSocket.accept(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
        at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"VM JFR Buffer Thread" - Thread t@7
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"JFR request timer" - Thread t@5
   java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <6bcd90> (a java.util.TaskQueue)
        at java.lang.Object.wait(Unknown Source)
        at java.util.TimerThread.mainLoop(Unknown Source)
        at java.util.TimerThread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" - Thread t@4
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" - Thread t@3
   java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <13d5908> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"Reference Handler" - Thread t@2
   java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <12cfae> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Unknown Source)
        at java.lang.ref.Reference.tryHandlePending(Unknown Source)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"main" - Thread t@1
   java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <3739af> (a java.lang.Object)
        at java.lang.Object.wait(Unknown Source)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.join(QueuedThreadPool.java:498)
        at org.eclipse.jetty.server.Server.join(Server.java:558)
        at company.ClusterCore.join(ClusterCore.java:98)
        at company.ClusterKitt.join(ClusterKitt.java:71)
        at company.apps.bootstrap.AppsBootstrapper.main(AppsBootstrapper.java:19)

   Locked ownable synchronizers:
        - None

Update:
As we had some time pressure to address this issue, we chose to override the getIdleTimeout method of SharedBlockingCallback in the jetty-util artifact. This workaround works in our case.
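For illustration, the shape of that override is roughly the sketch below (shown as a subclass with an arbitrary 30-second limit; in practice the change has to end up in the class actually used by the WebSocket write path, e.g. by patching jetty-util as described above, since BlockingWriteCallback extends SharedBlockingCallback):

import org.eclipse.jetty.util.SharedBlockingCallback;

public class TimeoutSharedBlockingCallback extends SharedBlockingCallback
{
    // With a positive idle timeout, Blocker.block() gives up and fails with an
    // exception after roughly this time instead of parking forever; the default
    // of -1 blocks indefinitely.
    @Override
    protected long getIdleTimeout()
    {
        return 30_000; // milliseconds, arbitrary example value
    }
}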

@dreis2211
Contributor Author

I can see the NullPointerException coming from the CompressExtension as well, but I can't really connect it directly to the times where we see blocked threads (yet). Still, this might be a hint of what's going wrong.

@jsteinberg

Found this issue on our production servers yesterday. We are currently on version 9.4.9.v20180320, but digging through release notes and issues led me here. I also dug through the code a bit, and it seems there haven't been many changes relating to this issue.

I'll try to provide as much information as I can.

We are getting this exact issue with an NPE in the CompressExtension.

java.lang.NullPointerException: Deflater has been closed
at java.util.zip.Deflater.ensureOpen(Deflater.java:559)
at java.util.zip.Deflater.deflate(Deflater.java:440)
at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.compress(CompressExtension.java:488)
at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.deflate(CompressExtension.java:451)
at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.process(CompressExtension.java:431)
at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension.outgoingFrame(CompressExtension.java:218)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack$Flusher.process(ExtensionStack.java:419)
at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.outgoingFrame(ExtensionStack.java:296)
at org.eclipse.jetty.websocket.common.message.MessageWriter.flush(MessageWriter.java:154)
at org.eclipse.jetty.websocket.common.message.MessageWriter.close(MessageWriter.java:122)
at org.eclipse.jetty.websocket.jsr356.AbstractJsrRemote.$closeResource(AbstractJsrRemote.java:139)
at org.eclipse.jetty.websocket.jsr356.AbstractJsrRemote.sendObjectViaFuture(AbstractJsrRemote.java:139)
at org.eclipse.jetty.websocket.jsr356.JsrAsyncRemote.sendObject(JsrAsyncRemote.java:85)

I believe this is happening when we are sending a message as a client disconnects. I'm going to attempt to beef up our logging to validate that assumption, but was hoping to find a fix instead.

Name: NAME_OF_THREAD_FROM_ABOVE_NPE
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3e91ce4a
Total blocked: 4  Total waited: 3,301

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:210)
org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block(BlockingWriteCallback.java:90)
org.eclipse.jetty.websocket.common.message.MessageWriter.flush(MessageWriter.java:155)
   - locked org.eclipse.jetty.websocket.common.message.MessageWriter@291bdb74
org.eclipse.jetty.websocket.common.message.MessageWriter.close(MessageWriter.java:122)
org.eclipse.jetty.websocket.jsr356.AbstractJsrRemote.$closeResource(AbstractJsrRemote.java:139)
org.eclipse.jetty.websocket.jsr356.AbstractJsrRemote.sendObjectViaFuture(AbstractJsrRemote.java:139)
org.eclipse.jetty.websocket.jsr356.JsrAsyncRemote.sendObject(JsrAsyncRemote.java:85)

I'm not ruling out the possibility that we are sending messages incorrectly. I'm admittedly not very familiar with the jetty codebase in general. But this is our general setup.

RabbitMqConsumer -> WebSocketSession

public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties,
                           byte[] body) throws IOException {
    websocketSession.getAsyncRemote().sendObject(new RawBytes(body));
}

This is currently running in our rabbit consumer thread pool. When we experience this exception, it effectively lowers our consumer pool by one thread (until we run out). The Jetty threads are still functional. You can initiate new connections and generally interact with the server. But our message generation side is locked.

Given that it appears you need a hard disconnect while a message is being sent, I can't really attach a sample project.

In my case it is obvious the message sending thread and the thread that handles the session disconnecting are different. Is there a best practice to ensure I'm sending messages on a thread that would sync with any disconnects?

My workaround for now was to increase my consumer pool count and restart the servers every night. So right now I can see I have locked threads in production, but there are many more to work with.

Thanks,

Jeremy

@joakime
Contributor

joakime commented Jun 21, 2019

I'm keen on attempting (again) to replicate this ...

What is RawBytes declared like?
From the stacktrace it appears to be registered against javax.websocket.Encoder.TextStream. (The use of MessageWriter tells us this)

The stacktrace is also telling us that you are not paying attention to the Future returned on javax.websocket.RemoteEndpoint.Async.sendObject().
Since you stated you are sending from another thread, you are now responsible for handling the sanity of the sends of your messages.
It is likely that you are not paying attention to any of the callbacks or futures and are attempting to send messages after one of them has failed. (If you use the async methods, you must pay attention to the futures or callbacks on your messages. For example, most messaging libraries don't send another message until the current one completes, and they keep track of which messages haven't been sent so they can be retransmitted on reconnect, etc.)
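A minimal sketch of checking that Future (the helper class name, the Session parameter, and the 5-second bound are assumptions for illustration, not taken from the report):

import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import javax.websocket.Session;

public final class CheckedSender
{
    // Waits on the Future returned by the async send so a failed or stalled send
    // (e.g. after a hard client disconnect) is seen by the producing thread
    // instead of being silently dropped.
    public static void sendAndCheck(Session session, Object message) throws IOException
    {
        Future<Void> send = session.getAsyncRemote().sendObject(message);
        try
        {
            send.get(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
        catch (ExecutionException | TimeoutException e)
        {
            // The send did not complete: stop producing for this session and close it.
            session.close();
        }
    }
}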

The fact that you got "Deflater has been closed" means you attempted to send a message after the websocket closed.
This could be due to simple concurrency: your thread is attempting to send messages after the session was closed.
Or it could be due to some batching behavior. The message is sitting in the outgoing batch and then the connection died.

Since you are also using javax.websocket, you have limited ability to control the batching of outgoing messages.
You basically have 3 methods on javax.websocket.RemoteEndpoint to pay attention to.

  • .flushBatch()
  • .setBatchingAllowed(boolean allowed)
  • .getBatchingAllowed()

Consider using those as well.
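As an illustration only (the method and variable names are assumptions, not from this thread), disabling batching on the remote endpoint looks roughly like this:

import java.io.IOException;

import javax.websocket.RemoteEndpoint;
import javax.websocket.Session;

public final class BatchingControl
{
    // Sketch: turn batching off so each message is flushed to the network as it is sent.
    // Per the javax.websocket spec, switching from true to false also flushes any
    // messages still sitting in the batch.
    public static void disableBatching(Session session) throws IOException
    {
        RemoteEndpoint.Async remote = session.getAsyncRemote();
        if (remote.getBatchingAllowed())
        {
            remote.setBatchingAllowed(false);
        }
    }
}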

You can also disable the permessage-deflate extension, if you want.
Use the following Configurator to strip all extensions during negotiation.

import java.util.Collections;
import java.util.List;

import javax.websocket.Extension;
import javax.websocket.server.ServerEndpointConfig;

public class NoExtensionsConfigurator extends ServerEndpointConfig.Configurator
{
    @Override
    public List<Extension> getNegotiatedExtensions(List<Extension> installed,
                                                   List<Extension> requested)
    {
        // Offer no extensions back to the client, so permessage-deflate is never negotiated
        return Collections.emptyList();
    }
}
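For example, such a configurator can be attached to an annotated endpoint like this (the path and class name below are placeholders, not from this thread):

import javax.websocket.server.ServerEndpoint;

@ServerEndpoint(value = "/events", configurator = NoExtensionsConfigurator.class)
public class EventsEndpoint
{
    // @OnOpen / @OnMessage / @OnClose handlers as usual
}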

If you choose to disable the permessage-deflate extension, let us know if you can still replicate this issue after you have disabled it.
This could be valuable information on attempting to fix this.

Lastly, 9.4.9.v20180320 is getting a bit old.
There have been 10 releases since then; we are on 9.4.19.v20190610 now.
Jetty has had multiple CVEs reported against it and fixed; versions prior to 9.4.17.v20190418 are known to be vulnerable.

@jsteinberg

Thank you for the response. Updating to the latest is part of my plan, but I wanted to see if there was a changelog entry or issue resolution that aligned with the issue I am experiencing, instead of just hoping it was fixed.

RawBytes just wraps the byte[] and we've implemented a matching Encoder.TextStream<RawBytes> encoder. After digging a bit it seems there might be an alternative object/encoder we could use.

We are ignoring the returned Future. I will also look into handling that.

The way our RabbitMQ integration works ensures that only one message is sent to a websocket session at a time (queues are single-threaded). When we detect that a websocket has closed, we disconnect from the rabbit queue.

We are not batching anything (on purpose). As you've pointed out, that might not matter.

I can try disabling the permessage-deflate extension, but since I've only seen this in production I will need to do some more research before deployment.

Based on @TheWizz's comment above about yanking the Ethernet cord, this aligns with my theory of a hard disconnect while a message is being sent. I thought providing my usage example, where that is a very likely scenario, might be helpful to this discussion.

@lachlan-roberts
Contributor

@joakime I managed to reproduce the same NPE with this test

@Test
public void test() throws Exception
{
    URI uri = new URI("ws://localhost:" + connector.getLocalPort());
    EventSocket clientSocket = new EventSocket();

    // Negotiate permessage-deflate so the CompressExtension is part of the extension stack
    ClientUpgradeRequest upgradeRequest = new ClientUpgradeRequest();
    upgradeRequest.addExtensions("permessage-deflate");
    client.connect(clientSocket, uri, upgradeRequest).get(5, TimeUnit.SECONDS);

    // Stop the session between two sends; the second send hits the closed Deflater
    clientSocket.getSession().getRemote().sendStringByFuture("test1");
    ((WebSocketSession)clientSocket.getSession()).stop();
    clientSocket.getSession().getRemote().sendStringByFuture("test2");
}

@dreis2211
Contributor Author

@lachlan-roberts Awesome. Could we also test if this leads to the blocked threads?

@lachlan-roberts
Contributor

@dreis2211 If the blocking send is used, it just blocks forever; it seems the callback is never notified of the failure.

lachlan-roberts added a commit that referenced this issue Jun 27, 2019
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
lachlan-roberts added a commit that referenced this issue Jun 27, 2019
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
lachlan-roberts added a commit that referenced this issue Jun 27, 2019
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
lachlan-roberts added a commit that referenced this issue Jul 3, 2019
* Issue #2061 - fail current entry in CompressExtension on failure

Signed-off-by: Lachlan Roberts <lachlan@webtide.com>

* Issue #2061 - do not notify callback failure twice

Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
@GFriedrich

@lachlan-roberts: This ticket should be attached to the Jetty 9.4.x milestone again, as that is where it has been fixed.
