Possible thread exhaustion issue #1998
We've been working through an odd issue for the past few weeks on the lutron binding (openhab/openhab-addons#9178). For no reason we've been able to identify, the binding just "stops working" and OH requires a restart. As we hadn't seen this on other bindings, we attributed it to the new LEAP code and were working down that path.

This morning I had a rule "misbehave" because I was missing a check, which caused the rule to effectively trigger itself in an endless loop. When this happened, the Lutron binding went offline again. My ecobee also stopped working (but with no errors in the log). Some of my periodic rules stopped working as well; for example, I have a periodic speedtest run by cron, and it never fired. The only fix was to restart OH.

I'm happy to attempt to dig through debug logs to figure out what is happening; I just have no idea which ones to turn on to monitor the thread pools. Any help in getting started would be appreciated.

Comments
If you have your installation in exactly such a situation, simply do a thread dump and post it here - this should help to identify which thread pool is exhausted and why.
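For reference, a thread dump of the openHAB JVM can be taken with standard JDK tooling such as `jstack <pid>` or `jcmd <pid> Thread.print`. The sketch below is an illustrative Java snippet (not openHAB code) that prints the same kind of information programmatically; it shows why a dump identifies the exhausted pool: thread names carry the pool prefix (for example the `upnp-main-*` threads mentioned later in this thread), so a pool whose threads are all in BLOCKED or WAITING state stands out.

```java
// Minimal sketch: print every live thread's name, state and stack trace,
// which is essentially what a thread dump (jstack/jcmd) contains.
// If all threads sharing one name prefix are BLOCKED/WAITING, that pool is stuck.
import java.util.Map;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            Thread t = e.getKey();
            System.out.printf("\"%s\" state=%s%n", t.getName(), t.getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```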
Works for me. I'll wait until my system gets into a bad state and pull a thread dump. Are there any debug logs I should turn on to try to capture this? I already have org.openhab.core.common logging to a file, as I believe the thread pool managers live in there right now. While we wait for the system to go into a bad state: if there is an exhaustion issue, would that cause threads to not come back (e.g. lutron, ecobee, etc. doing their periodic "stuff")?
No need for logs, the thread dump should say it all.
Those are not threads, but periodically scheduled jobs. So yes, if a thread pool is exhausted, those jobs would be impacted as well.
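To illustrate the distinction (a minimal sketch, assuming nothing about openHAB's actual pool names or sizes): periodic jobs are tasks handed to a shared, fixed-size scheduled pool. If every worker thread in that pool is tied up in a blocking task, scheduled runs simply stop firing, even though the jobs themselves are fine.

```java
// Sketch of why periodic jobs "stop" when a fixed-size pool is exhausted.
// Pool size and task behaviour are illustrative, not openHAB's real configuration.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {
    public static void main(String[] args) {
        // A deliberately tiny pool so exhaustion is easy to reproduce.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);

        // Two "misbehaving" tasks that never return block both workers.
        for (int i = 0; i < 2; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // simulates a hung I/O call
                } catch (InterruptedException ignored) {
                }
            });
        }

        // This periodic "polling" job is scheduled but will never run,
        // because no worker thread is available to execute it.
        pool.scheduleWithFixedDelay(
                () -> System.out.println("polling device..."), 1, 5, TimeUnit.SECONDS);
    }
}
```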
Sounds good. I've reverted the fix for my rule to see if it will stress the system into a bad state. I'll post the thread dump once it goes; it could be a few days, since the failure seems to happen randomly. I've put a notification in place to let me know as soon as it goes sideways.
See attached thread dump. This was about 6 minutes after the Lutron things all went offline.
Thanks. I think this shows a few bugs in bindings:
Additionally, it registers as a discovery listener (also not really allowed for a handler) and creates websocket connections to the remote device instead of returning the call immediately, which blocks the whole discovery infrastructure:
TL;DR: Close this issue, but open 3 new ones for the respective bindings 😎.
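For readers hitting the same pattern, here is a hedged sketch of the usual remedy for the blocking-callback problem described above (class and method names are made up for illustration; this is not the SamsungTV binding's actual code): a callback invoked on a shared infrastructure thread, such as a discovery or UPnP notification, should return immediately and push slow work like opening a websocket onto its own executor.

```java
// Illustrative only: a callback from shared (e.g. discovery/UPnP) infrastructure
// must not block. Offload slow work instead of doing it inline.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NonBlockingCallbackSketch {

    private final ExecutorService background = Executors.newSingleThreadExecutor();

    // Called on the shared infrastructure thread.
    public void onDeviceDiscovered(String deviceAddress) {
        // BAD: openWebsocket(deviceAddress);  // would block the caller's pool

        // GOOD: return immediately, do the slow part elsewhere.
        background.submit(() -> openWebsocket(deviceAddress));
    }

    private void openWebsocket(String deviceAddress) {
        // Placeholder for a potentially slow connect with its own timeout handling.
        System.out.println("connecting to " + deviceAddress);
    }
}
```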
Works for me! Before I kill it, could the SamsungTV have been the cause of that long, long, long JuPnP bug that has been going on forever?
I didn't follow that bug, but indeed the SamsungTV does bad things within the UPnP thread - I just added a second finding in my comment above.
The JuPnP "bug" was openhab/openhab-addons#5892 |
3 bugs opened, copy/pasted your comments and referenced this bug. Closing this. THANK YOU!
Thanks! You've missed my "Additionally" part on the SamsungTV binding. |
Not seeing what I missed. I see the "Additionally" upnp-main-173 note on #9495 when I pull it up.
All there, all good!
Sweet. Hopefully fixing these up will resolve some of the annoying thread issues floating around.
One last question, I promise. Shouldn't the thread pool grow automatically when all threads are in use, so that even events like this don't cause exhaustion?
No, these pools always have a maximum size. Anything else would only hide the issues and potentially result in resource leaks through a constantly growing number of threads.
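As an illustration of that design choice (a generic sketch, not openHAB's ThreadPoolManager): a pool with a fixed maximum size and named threads keeps resource usage bounded, so a leaking caller shows up as a recognizable exhausted pool in a thread dump instead of an ever-growing thread count that quietly consumes memory.

```java
// Sketch: a bounded pool with a named thread factory. A leaking caller
// exhausts it visibly instead of growing the thread count without limit.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {
    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 5,                      // fixed core and maximum size
                65, TimeUnit.SECONDS,      // idle keep-alive
                new LinkedBlockingQueue<>(),
                r -> new Thread(r, "example-pool-" + counter.incrementAndGet()));
        pool.allowCoreThreadTimeOut(true);

        pool.execute(() -> System.out.println(
                "running on " + Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```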
Thank you. That makes sense. And this thread is now dead... (pun intended)
That the SamsungTV binding is a source of problems has been clear to me for ages and was discussed many times in the JuPnP issue. I even proposed a fix for the discovery part of this binding, but it triggered no interest, no review and no testing. The PR is probably still against the 2.5 branch, if I didn't close it.
@lolodomo If you have a fix for openhab/openhab-addons#9495, it would be great if you could comment on that issue and port your existing PR. I am pretty sure that it will trigger interest.
@kaikreuzer I've loaded each of the 3 fixed bindings to see if this issue is resolved. I had to compile two of them from the PRs, but confirmed that the git log included the patches. I'll let this stew for a bit to make sure the problems are gone.