
Possible thread exhaustion issue #1998

Closed
morph166955 opened this issue Dec 23, 2020 · 20 comments

@morph166955

We've been working through an odd issue for the past few weeks on the lutron binding (openhab/openhab-addons#9178). For no reason we've been able to identify, the binding just "stops working" and OH requires a restart. As we hadn't seen this on other bindings, we attributed it to the new LEAP code and were working down that path. This morning I had a rule "misbehave" because I was missing a check. It caused the rule to effectively trigger itself in an endless loop. When this happened, the Lutron went offline again. My ecobee also stopped working (but with no errors in the log). Some of my periodic rules also stopped working (for example, I have a periodic speedtest run by cron and it never fired). The only fix was to restart OH. I'm happy to attempt to dig through debugs to figure out what is happening, I just have no idea which debugs to turn on to monitor the threadpools. Any help in getting started would be appreciated.

@kaikreuzer
Member

If you have your installation in exactly such a situation, simply do a thread dump and post it here - this should help to identify which thread pool is exhausted and why.
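For readers following along: a thread dump is usually taken with `jstack <pid>` or by sending the JVM a `SIGQUIT`, but the same information can also be captured from inside the JVM. A minimal, self-contained sketch using the standard `ThreadMXBean` API (illustrative only, not openHAB code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpDemo {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump all live threads, including held monitors and ownable synchronizers.
        // Each ThreadInfo prints the thread name, state, and stack trace -
        // the same data you see in the attached dump files.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.print(info);
        }
    }
}
```

In a dump like the one attached here, the interesting threads are the pool workers (e.g. `OH-thingHandler-*`) stuck in `BLOCKED` or long-running `RUNNABLE` states.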

@morph166955
Author

Works for me. I'll wait until my system gets into a bad state and capture a thread dump. Are there any debug logs I should enable to try to capture this? I currently have org.openhab.core.common logging to a file, as I believe that's where the thread pool managers live right now.

While we wait for the system to go into a bad state: if there is an exhaustion issue, would that cause threads to not come back (e.g. lutron, ecobee, etc. doing periodic "stuff")?

@kaikreuzer
Member

No need for logs, the thread dump should say it all.

would that cause threads to not come back (e.g. lutron, ecobee, etc doing periodic "stuff")?

Those are not threads, but periodically scheduled jobs. So yes, if a thread pool is exhausted, those jobs would be impacted as well.

@morph166955
Author

Sounds good. I've reversed the fix for my rule to see if it will stress the system into a bad state. I'll post the thread dump once it goes. Could be a few days, seems to happen randomly when it fails. I've put a notification in to let me know as soon as it goes sideways.

@morph166955
Author

See attached thread dump. This was about 6 minutes after the Lutron things all went offline.
Thread-Exhaustion-1.txt

@kaikreuzer
Member

kaikreuzer commented Dec 23, 2020

Thanks. I think this shows a few bugs in bindings:

  1. DenonMarantz:
    It permanently blocks one thread from the thread pool as it uses it to listen to a telnet connection. It should use its own thread instead of the shared thread pool for that.
"OH-thingHandler-2" #226 daemon prio=5 os_prio=0 cpu=76.63ms elapsed=3419.19s tid=0x00007fbd1402c000 nid=0x40cdb runnable  [0x00007fbd00825000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(java.base@11.0.9.1/Native Method)
	at java.net.SocketInputStream.socketRead(java.base@11.0.9.1/SocketInputStream.java:115)
	at java.net.SocketInputStream.read(java.base@11.0.9.1/SocketInputStream.java:168)
	at java.net.SocketInputStream.read(java.base@11.0.9.1/SocketInputStream.java:140)
	at sun.nio.cs.StreamDecoder.readBytes(java.base@11.0.9.1/StreamDecoder.java:284)
	at sun.nio.cs.StreamDecoder.implRead(java.base@11.0.9.1/StreamDecoder.java:326)
	at sun.nio.cs.StreamDecoder.read(java.base@11.0.9.1/StreamDecoder.java:178)
	- locked <0x00000000c45000f0> (a java.io.InputStreamReader)
	at java.io.InputStreamReader.read(java.base@11.0.9.1/InputStreamReader.java:185)
	at java.io.BufferedReader.fill(java.base@11.0.9.1/BufferedReader.java:161)
	at java.io.BufferedReader.readLine(java.base@11.0.9.1/BufferedReader.java:326)
	- locked <0x00000000c45000f0> (a java.io.InputStreamReader)
	at java.io.BufferedReader.readLine(java.base@11.0.9.1/BufferedReader.java:392)
	at org.openhab.binding.denonmarantz.internal.connector.telnet.DenonMarantzTelnetClient.run(DenonMarantzTelnetClient.java:71)
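A hedged sketch of the pattern described above: the blocking read loop gets its own named daemon thread, so it never permanently occupies a shared pool thread. Class names are illustrative, not the binding's actual code, and a `StringReader` stands in for the telnet socket stream:

```java
import java.io.BufferedReader;
import java.io.StringReader;

public class DedicatedListenerDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for the telnet connection's input stream.
        BufferedReader reader = new BufferedReader(new StringReader("PWON\nMV50\n"));

        // A dedicated, named daemon thread owns the (potentially forever-blocking)
        // readLine() loop, instead of a thread borrowed from a shared pool.
        Thread listener = new Thread(() -> {
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println("received: " + line);
                }
            } catch (Exception e) {
                // connection closed or thread interrupted - exit the loop
            }
        }, "denonmarantz-telnet-listener");
        listener.setDaemon(true);
        listener.start();
        listener.join();
    }
}
```

The key point is the lifetime mismatch: shared pool threads are meant for short tasks, while a connection listener lives as long as the connection does.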
  2. SamsungTV:
    It has almost every method of SamsungTvHandler marked as synchronized, which makes it block when multiple things happen at the same time (here, UPnP discovery and value reading).
"upnp-main-173" #5582 prio=5 os_prio=0 cpu=52.86ms elapsed=1184.08s tid=0x00007fbd78154800 nid=0x43ac1 waiting for monitor entry  [0x00007fbcf4eff000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:69)
	- waiting to lock <0x00000000c452b620> (a java.lang.Object)
	at org.openhab.binding.samsungtv.internal.protocol.RemoteControllerWebSocket.openConnection(RemoteControllerWebSocket.java:147)
	at org.openhab.binding.samsungtv.internal.service.RemoteControllerService.start(RemoteControllerService.java:203)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.checkCreateManualConnection(SamsungTvHandler.java:312)
	- locked <0x00000000c452ca20> (a org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.checkAndCreateServices(SamsungTvHandler.java:253)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.thingDiscovered(SamsungTvHandler.java:362)
	at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl$1.run(DiscoveryServiceRegistryImpl.java:260)

"OH-thingHandler-1" #141 daemon prio=5 os_prio=0 cpu=5576.05ms elapsed=3419.65s tid=0x00007fbd20052800 nid=0x40c8a waiting for monitor entry  [0x00007fbd05d76000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.valueReceived(SamsungTvHandler.java:225)
	- waiting to lock <0x00000000c452ca20> (a org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler)
	at org.openhab.binding.samsungtv.internal.service.MediaRendererService.onValueReceived(MediaRendererService.java:205)
	at org.openhab.binding.samsungtv.internal.service.MediaRendererService.updateResourceState(MediaRendererService.java:240)
	at org.openhab.binding.samsungtv.internal.service.MediaRendererService.handleCommand(MediaRendererService.java:121)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.poll(SamsungTvHandler.java:217)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler$$Lambda$1161/0x0000000101137440.run(Unknown Source)
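One way to avoid this kind of monitor contention (a sketch with hypothetical names, not the binding's actual code) is to stop marking whole handler methods `synchronized` and instead guard only the shared state with a narrow lock, doing any slow work outside the critical section:

```java
import java.util.HashMap;
import java.util.Map;

public class NarrowLockHandler {
    // Dedicated lock object guarding only the shared state map.
    private final Object stateLock = new Object();
    private final Map<String, String> state = new HashMap<>();

    // Anti-pattern for comparison: "public synchronized void valueReceived(...)"
    // serializes unrelated callers (UPnP discovery vs. polling) on one monitor.

    public void valueReceived(String key, String value) {
        synchronized (stateLock) {        // hold the lock only while touching state
            state.put(key, value);
        }
        // Slow work (channel updates, network I/O) happens outside the lock.
        System.out.println("updated " + key + " -> " + value);
    }

    public String get(String key) {
        synchronized (stateLock) {
            return state.get(key);
        }
    }

    public static void main(String[] args) {
        NarrowLockHandler h = new NarrowLockHandler();
        h.valueReceived("volume", "50");
        System.out.println(h.get("volume"));
    }
}
```

With the lock scope narrowed, a blocked websocket connection attempt no longer prevents a poll callback from delivering a value, and vice versa.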

Additionally, it registers as a discovery listener (which is also not really allowed for a handler) and opens websocket connections to the remote device instead of returning immediately, which blocks the whole discovery infrastructure:

"upnp-main-173" #5582 prio=5 os_prio=0 cpu=52.86ms elapsed=1184.08s tid=0x00007fbd78154800 nid=0x43ac1 waiting for monitor entry  [0x00007fbcf4eff000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:69)
	- waiting to lock <0x00000000c452b620> (a java.lang.Object)
	at org.openhab.binding.samsungtv.internal.protocol.RemoteControllerWebSocket.openConnection(RemoteControllerWebSocket.java:147)
	at org.openhab.binding.samsungtv.internal.service.RemoteControllerService.start(RemoteControllerService.java:203)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.checkCreateManualConnection(SamsungTvHandler.java:312)
	- locked <0x00000000c452ca20> (a org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.checkAndCreateServices(SamsungTvHandler.java:253)
	at org.openhab.binding.samsungtv.internal.handler.SamsungTvHandler.thingDiscovered(SamsungTvHandler.java:362)
	at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl$1.run(DiscoveryServiceRegistryImpl.java:260)
  3. Ecobee:
    The EcobeeThermostatBridgeHandler does nasty stuff by starting discovery services itself (and not returning immediately, which would anyhow be expected from startBackgroundDiscovery()). This causes multiple discoveries to run in parallel, blocking the 3 remaining threads of the thread pool:
"OH-thingHandler-3" #229 daemon prio=5 os_prio=0 cpu=6314.43ms elapsed=3419.08s tid=0x00007fbd14039000 nid=0x40cde waiting for monitor entry  [0x00007fbd00521000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl.thingDiscovered(DiscoveryServiceRegistryImpl.java:252)
	- waiting to lock <0x00000000c4f0d968> (a org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl)
	at org.openhab.core.config.discovery.AbstractDiscoveryService.thingDiscovered(AbstractDiscoveryService.java:266)
	at org.openhab.binding.ecobee.internal.discovery.SensorDiscoveryService.discoverSensors(SensorDiscoveryService.java:111)
	- locked <0x00000000cee7e450> (a org.openhab.binding.ecobee.internal.discovery.SensorDiscoveryService)
	at org.openhab.binding.ecobee.internal.discovery.SensorDiscoveryService.startBackgroundDiscovery(SensorDiscoveryService.java:84)
	at org.openhab.binding.ecobee.internal.handler.EcobeeThermostatBridgeHandler.discoverSensors(EcobeeThermostatBridgeHandler.java:847)

"OH-thingHandler-5" #238 daemon prio=5 os_prio=0 cpu=7112.63ms elapsed=3419.01s tid=0x00007fbd4c012800 nid=0x40cea waiting for monitor entry  [0x00007fbcff918000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl.thingDiscovered(DiscoveryServiceRegistryImpl.java:252)
	- waiting to lock <0x00000000c4f0d968> (a org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl)
	at org.openhab.core.config.discovery.AbstractDiscoveryService.thingDiscovered(AbstractDiscoveryService.java:266)
	at org.openhab.binding.ecobee.internal.discovery.ThermostatDiscoveryService.discoverThermostats(ThermostatDiscoveryService.java:118)
	- locked <0x00000000ce589800> (a org.openhab.binding.ecobee.internal.discovery.ThermostatDiscoveryService)
	at org.openhab.binding.ecobee.internal.discovery.ThermostatDiscoveryService.startBackgroundDiscovery(ThermostatDiscoveryService.java:88)
	at org.openhab.binding.ecobee.internal.handler.EcobeeAccountBridgeHandler.discoverThermostats(EcobeeAccountBridgeHandler.java:253)
	at org.openhab.binding.ecobee.internal.handler.EcobeeAccountBridgeHandler.refresh(EcobeeAccountBridgeHandler.java:226)
	at org.openhab.binding.ecobee.internal.handler.EcobeeAccountBridgeHandler$$Lambda$900/0x0000000100e30040.run(Unknown Source)

"OH-thingHandler-4" #237 daemon prio=5 os_prio=0 cpu=6348.00ms elapsed=3419.01s tid=0x00007fbd14049800 nid=0x40ceb waiting for monitor entry  [0x00007fbcff816000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl.thingDiscovered(DiscoveryServiceRegistryImpl.java:252)
	- waiting to lock <0x00000000c4f0d968> (a org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl)
	at org.openhab.core.config.discovery.AbstractDiscoveryService.thingDiscovered(AbstractDiscoveryService.java:266)
	at org.openhab.binding.ecobee.internal.discovery.SensorDiscoveryService.discoverSensors(SensorDiscoveryService.java:111)
	- locked <0x00000000cef465f8> (a org.openhab.binding.ecobee.internal.discovery.SensorDiscoveryService)
	at org.openhab.binding.ecobee.internal.discovery.SensorDiscoveryService.startBackgroundDiscovery(SensorDiscoveryService.java:84)
	at org.openhab.binding.ecobee.internal.handler.EcobeeThermostatBridgeHandler.discoverSensors(EcobeeThermostatBridgeHandler.java:847)
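The expected contract can be sketched as follows (hypothetical names, not the Ecobee binding's actual code): `startBackgroundDiscovery()` merely schedules the scan on the service's own scheduler and returns immediately, so the caller's pool thread is never blocked by the discovery work itself:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BackgroundDiscoveryDemo {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Returns immediately: only schedules the periodic scan.
    // The (possibly slow) discovery runs later on the scheduler thread.
    public void startBackgroundDiscovery() {
        scheduler.scheduleWithFixedDelay(this::discover, 0, 60, TimeUnit.SECONDS);
    }

    private void discover() {
        System.out.println("discovering sensors...");
    }

    public void stop() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BackgroundDiscoveryDemo d = new BackgroundDiscoveryDemo();
        d.startBackgroundDiscovery();   // returns right away
        Thread.sleep(200);              // let the first scheduled run fire
        d.stop();
    }
}
```

Whether a bridge handler should be driving discovery at all is a separate question; the sketch only illustrates the "schedule and return" contract.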

TL;DR: Close this issue, but open 3 new ones for the respective bindings 😎 .

@morph166955
Author

Works for me! Before I kill it, could the SamsungTV binding have been the cause of that long, long JuPnP bug that went on forever?

@kaikreuzer
Member

I didn't follow that bug, but indeed the SamsungTV binding does bad things within the UPnP thread - I just added a second finding to my comment above.

@morph166955
Author

morph166955 commented Dec 23, 2020

The JuPnP "bug" was openhab/openhab-addons#5892

@morph166955
Author

3 bugs opened; I copy/pasted your comments and referenced this bug. Closing this. THANK YOU!

@kaikreuzer
Member

Thanks! You've missed my "Additionally" part on the SamsungTV binding.

@morph166955
Author

Not seeing what I missed. I see the "Additionally" upnp-main-173 note on 9495 when I pull it up.

@kaikreuzer
Member

All there, all good!

@morph166955
Author

Sweet. Hopefully fixing these up will fix some of the annoying thread issues floating around.

@morph166955
Author

One last question, I promise. Shouldn't the thread pool grow automatically when all threads are in use so that even events like this don't cause exhaustion?

@kaikreuzer
Member

No, these pools always have a maximum size. Anything else would only hide the issues and could result in resource leaks through a constantly growing number of threads.
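The failure mode is easy to reproduce in miniature with a plain JDK executor (illustrative only, not openHAB's actual pool configuration): once long-blocking tasks occupy every thread of a fixed-size pool, further submitted jobs sit in the queue until a thread frees up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolExhaustionDemo {
    public static void main(String[] args) throws Exception {
        // A fixed-size pool of 2 threads, bounded like openHAB's shared pools.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);

        // Two tasks block indefinitely, exhausting the pool...
        for (int i = 0; i < 2; i++) {
            pool.submit(() -> { release.await(); return null; });
        }

        // ...so this job cannot start until a pool thread is freed.
        Future<?> job = pool.submit(() -> System.out.println("periodic job ran"));
        System.out.println("job started within 300ms: " + tryWait(job, 300));

        release.countDown();   // unblock the hogging tasks
        System.out.println("after release: " + tryWait(job, 1000));
        pool.shutdownNow();
    }

    static boolean tryWait(Future<?> f, long ms) {
        try {
            f.get(ms, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (Exception e) {
            return false;
        }
    }
}
```

This mirrors what the dump shows: the DenonMarantz listener and the blocked Ecobee discoveries play the role of the latch-holding tasks, and the cron-triggered rules and periodic refresh jobs are the queued work that never runs.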

@morph166955
Author

Thank you. That makes sense. And this thread is now dead... (pun intended)

@lolodomo
Contributor

lolodomo commented Dec 24, 2020

That the SamsungTV binding is a source of problems has been clear to me for ages and was discussed many times in the JUpnp issue. I even proposed a fix for this binding's discovery, but it triggered no interest, no review and no testing. The PR is probably still open against the 2.5 branch, if I didn't close it.

@kaikreuzer
Member

@lolodomo If you have a fix for openhab/openhab-addons#9495, it would be great if you could comment on that issue and port your existing PR. I am pretty sure that it will trigger interest.

@morph166955
Author

@kaikreuzer I've loaded each of the 3 fixed bindings to see if this issue is resolved. I had to compile 2 from the PRs but confirmed that the git log included the patches. I'll let this stew for a bit to make sure the problems are gone.
