JuPnP failing - causing devices to go offline #5892
Do you think this issue should be solved in openhab-core or the jupnp library @lolodomo? See also: eclipse-archived/smarthome#6779 |
I am not 100% sure, my analysis was done several months ago. But I guess it is more a bug in the JUPnP library. If it was a bug in the openHAB core framework, I would have probably already fixed it. |
Thanks! It might help to also create an issue for this in the jupnp issue tracker. |
Sonos & jUPNP Bounty to Resolve Communication Errors on OpenHAB Environment: Sonos Devices: This situation below has been happening to me and others for over a year now. OpenHAB Forum Postings: https://community.openhab.org/t/sonos-communication-error-after-speaker-firmware-update/84135 Scenarios:
https://community.openhab.org/t/oh-2-4-0-m7-sonos-online-and-immediately-offline-again/59253/9 I have turned on DEBUG logging on the Sonos binding and have NOT seen anything that shows a direct result of this Communication Error behaviour. I believe both bindings (Sonos and jUPNP) are partially at fault here. Best, Jay |
Thanks for adding the big bounty @jaywiseman1971! For the bounty see: https://www.bountysource.com/issues/80365719-jupnp-failing-causing-devices-to-go-offline |
I decided to turn on jupnp debugging for a few minutes while my Sonos PlayBar was in a bad state. Every second I get this message set:

2020-02-02 13:40:34.657 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

Then, every 600 seconds I get:

2020-02-02 13:44:11.804 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control
2020-02-02 13:54:22.222 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

A CURL (with POST) to the URL gives me:

I have verified that I have <1ms ping times to the bar (no packet loss) and they are on the same network, so the Sonos TTL=1 is not an issue here. I let this go for about an hour with the same result. Then I restarted OH2. All of my speakers immediately came online and are stable.

Given this, I do NOT believe the issue to be the Sonos itself, as a "random" restart of OH2 should not clear out the issue on the Sonos. Going through the debug of the OH2 restart, it looks like OH2 effectively registers as an endpoint with the Sonos speaker. My assumption (since I can't just make one fail at will) is that this registration becomes invalid at some point.

Assuming that this theory is correct, at this point I would suggest that the Sonos binding be modified to detect this issue and "self heal" by restarting the registration process with that specific speaker when the HTTP request to the control URL fails to reply. |
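To make the "self heal" suggestion above a bit more concrete, here is a minimal sketch of the detection side only. It is not how the Sonos binding or JuPnP works; the class name `ControlUrlWatchdog`, the `max_failures` threshold and the `on_unhealthy` callback are all hypothetical names chosen for illustration. The idea is simply to count consecutive failed probes of the control URL and trigger a re-registration once a threshold is reached, so that a single lost reply does not cause a restart.

```python
from typing import Callable, Optional


class ControlUrlWatchdog:
    """Tracks consecutive failed probes of one speaker's control URL.

    Hypothetical helper: the caller performs the actual HTTP request and
    reports the outcome through record().
    """

    def __init__(self, max_failures: int = 3,
                 on_unhealthy: Optional[Callable[[], None]] = None) -> None:
        self.max_failures = max_failures
        self.on_unhealthy = on_unhealthy
        self.consecutive_failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True if recovery was triggered."""
        if ok:
            # Any successful reply clears the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.max_failures:
            return False
        # Threshold reached: reset the streak and ask for re-registration.
        self.consecutive_failures = 0
        if self.on_unhealthy is not None:
            self.on_unhealthy()
        return True
```

In use, one watchdog per speaker would be fed the result of each control request, with `on_unhealthy` wired to whatever restarts the registration for that speaker.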
Do you sometimes power off your Sonos devices? |
They are almost never powered off. Software upgrade reboots would be the exception. I don't think my playbar has been unplugged in over a year. |
Is there any way to adjust the 600 second wait/hold timer on JuPnP so things at least come back faster when this happens? |
Is there any way to exclude devices (Sonos units) from the status updates that the JuPnP binding tries to perform? Just another option to keep the Sonos devices online with openHAB. Best, Jay |
If the JUPnP library sees devices as offline then any binding using UPnP commands will not work properly for these devices. So the last suggestion makes no real sense IMHO. |
Does this problem concern everybody or just a few users? |
I'd be happy to provide any logs you want; please let me know which loggers you want DEBUG enabled on. My network is extremely solid with 3 Unifi APs and 3 Sonos Connects, all on the same subnet. I never have music dropouts on my 14 Sonos units due to connectivity/routing. As you can see below, this has been happening for quite some time now. Everything works fine when OH is restarted from scratch; then unidentified events trigger the Sonos units to go offline into communication-error status. Sometimes they will recover (usually after 10 minutes). Certain Sonos-related activities trigger a domino effect: starting and stopping Sonos units across the house (either via the app or via OH rules) seems to make it worse more quickly. https://community.openhab.org/t/sonos-communication-error-after-speaker-firmware-update/84135 Best, Jay |
Hey @lolodomo, I just turned on DEBUG on the org.jupnp binding and here's a small snapshot, taken while all my Sonos units are in communication error.
If I copy/paste one of the /control URLs into a browser (http://192.168.0.158:1400/DeviceProperties/Control), here's the result, which is a UPnP 401 error.
Best, Jay |
I agree that I don't believe this to be a networking issue. Among all of the other evidence, my playbar seems to be the biggest offender to this issue. The oddity here is that the playbar is actually the device I have connected via Ethernet. With the Sonos wifi mesh, all of the other Sonos devices connect to my network through the playbar. Those devices seem to not be impacted by the playbar going offline. And before it's asked, yes I have unplugged the playbar and moved the Ethernet wire to three other speakers in the house and there is no change in the behavior. As far as noticing when it's offline, again I notice most on my playbar. I use the Neeo remote via my OH2 server to control volume in my livingroom (and everything else for that matter). When the playbar goes offline, I can't change the volume. So 9/10 times I notice it's offline because I'm jamming on the button and nothing is happening. The playbar has actually been offline for almost 48 hours now yet my other speakers are up and chugging passing their data through the playbar. |
What would be the best way to check over a long period whether my Sonos things sometimes go offline? Edit: changes of thing state should be in events.log. I am going to check that. |
Please make sure you're running the latest firmware for Sonos; some folks have not upgraded from v8.x to 10.6.1, and this is not affecting them. I monitor my Sonos devices in two ways: an OH network connection check, which never goes offline, and one long Thing status monitoring rule. Here's the example of mine.
end Best, Jay |
My devices are on firmware 10.6.1. I see no status change to OFFLINE for my 3 Sonos things in my events.log file since the beginning of February. |
One of my devices is connected to my LAN with an RJ45 cable; the 2 others are connected to SonosNet without any physical connection (cable). |
That's fine, I have the same setup. What version of the jupnp and sonos binding are you using? Best, Jay |
I am running the official openHAB distribution 2.5.1. |
|
Can you confirm what version org.jupnp is under Karaf? If it's higher than 2.5.2, I will upgrade my version to your level. |
I will try to manually upgrade mine to your same versions today. |
To my knowledge there were no recent changes. |
I can say this only became an issue for me when I moved from 2.4 stable up to the nightly 2.5 release when it was very early in development. My Sonos is on 10.6.1. I'm on the current OH2 2.5 release. My belief is that the issue is with JuPnP. I can have a device offline for days and restarting JuPnP instantly fixes the issue. I would expand the question to say "how many devices in your network use JuPnP" instead of just Sonos. For example, my Samsung TVs occasionally have this issue. It's less obvious because those are offline unless the TV is turned on. |
I have 224 Things in PaperUI; not all of them are tied to jupnp. Yes, I have the same issue with my Samsung binding also. What is the karaf query to find out the exact number of jupnp tied things? Best, Jay |
I'll try to pull it in later and start to burn it in. I'm on 15 days now with 2.4.0RC1 and I have no devices that are in a permanent failed state. I'd agree, whatever broke all of this came after that. |
2.4.0RC1 of the samsungTV binding in a 2.5.6 OH server? |
Just as a reminder: if you uninstall the samsungTV binding in an OH 2.5.6 server, you have no problems? |
Both correct. As per discussions above, I have the jar from 2.4.0RC1 loaded on my 2.5.6 install. The sonos speakers still flap when the thread count rises suddenly but none of them completely die off like they do in newer code. I believe there are two issues here. 1) Samsung does something to cause the permanent failures. 2) there is a thread pool issue where exhaustion is occurring. This is causing the devices to flap and come right back. |
I believe this only fixed #1 above. It's been a minute since I've tested this, I can try later to confirm. |
Sorry, what exactly does it fix? Your link is wrong. |
Sorry, that wasn't supposed to link. I just meant issue 1 that I noted in the post immediately above. I believe removal of Samsung only fixes the issue where the Sonos speakers fall off and don't come back. I do not believe it fixes the issue where they fall off and return after the JuPnP waiting period. |
It is very important to have a clear idea about that. The conclusion we reached a few months ago was that only a combination of the samsungTV and Sonos bindings was leading to the Sonos things going OFFLINE. If this is not the case, we have to know it. I don't use the SamsungTV binding but I use the Sonos binding and I never encounter this problem, for example. If you encounter the OFFLINE effect even without the samsungTV binding, that could be due to your very big and unusual openHAB setup exhibiting resource leaks or bugs that a normal user like me with only 3 Sonos things will never encounter. |
I guess I should be somewhat glad I didn't have time to do much to my openhab over the past few weeks. After 3 weeks of waiting I had 3 Sonos speakers fall off and not come back. I normally end up restarting my openhab every 2 weeks or so for one reason or another. This is with the 2.4.0RC1 SamsungTV binding in place. I will say that this was definitely a longer amount of time to wait, it used to be closer to 4-8 days. I will now remove the samsungtv binding and see if it can stay stable at all. openhab> bundle:list -s | grep -i jupnp |
With the SamsungTV binding completely removed:
|
To note for tracking, I've updated to 2.5.7 and have installed the 2.5.7 samsungtv binding openhab> bundle:list -s | grep -i jupnp |
I tried to capture this graphically with Grafana. I hope this helps to explain what I'm seeing. To note, I very intentionally disabled some Thread::sleep calls in my rules to cause the spikes to happen more frequently, in the hope of getting more results in a shorter time. This compares CPU threads versus devices that are offline. Notice the 4 spikes on the yellow line; each is a Sonos device going offline and coming back 10 seconds later. Each time, they line up with the number of CPU threads spiking up quickly. |
So your conclusion is the opposite of the one from a few months ago, that is, the samsungtv binding has absolutely no impact on the problem? |
In case the problem is the number of threads, can you check how many java threads you have ?
|
And with top, I can see my total number of threads (in the system) is around 295 on my RPI running openHAB. |
After all of this, I believe the samsungtv binding makes the failures happen more frequently. I believe that they will still happen just not as often. Remember, there are two issues. 1) devices falling off and coming back as soon as JuPnP waits. 2) devices falling off and not returning at all. With samsungtv 2.4.0RC1, it took close to 3 weeks for devices to fall off and not return. With the 2.5.x variants, that goes down to 6-10 days. You are also correct, I have a substantially higher number of threads. I have some of my threadpools dialed up a bit because I had threadpool exhaustion happening, mostly on the rules side of the house. I'll admit the pools are probably larger than they need to be, but with the exception of this issue my system is incredibly stable so I've opted to not try and dial it down. $ ls /proc/29875/task | wc My system has not had any issues handling the process load however, the host it is on has a good bit of juice. |
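For anyone who wants to log the thread count over time instead of running `ls /proc/<pid>/task | wc` by hand, the same check can be sketched in a few lines of Python. This is Linux only, and the function name `count_threads` and its `proc_root` parameter are made up for this example (the parameter exists only so the function can be pointed at a test directory); the PID in the usage note is the one from the command above.

```python
from pathlib import Path


def count_threads(pid: int, proc_root: str = "/proc") -> int:
    """Count the threads of a process by listing <proc_root>/<pid>/task.

    On Linux, each thread of a process appears as a numeric
    subdirectory of /proc/<pid>/task.
    """
    task_dir = Path(proc_root) / str(pid) / "task"
    return sum(1 for entry in task_dir.iterdir() if entry.name.isdigit())
```

Calling `count_threads(29875)` on the openHAB host would give the same number as the first column of the `wc` output, and could be sampled periodically to line thread spikes up against OFFLINE events.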
Hello, I was debugging this issue a little further since I'm still having out of memory issues on 2.5.8. What I've found is that if the addresses change on my eth0 interface, jupnp is basically restarted. This happens quite frequently with IPv6 and privacy extensions.
which is called from |
Interesting. |
Here it was pretty apparent from the logs. I had an additional problem in my router configuration: the Router Advertisements expired after 2 minutes, something I had set to debug some unrelated problem but forgot to restore, so that caused the IPv6 addresses to be removed from the interface much more often.
|
Just to note, my OH2 instance is statically IP'ed and all of my Sonos speakers have permanent assignments in my DHCP so they should never change addresses. |
@lolodomo As 3.0.0 is working its way out, and we really never found an answer to this, would it be possible for you to commit the change to JuPnP that created the org.jupnp:retryAfterSeconds variable so that it can be rolled into 3.0.0? While it's not a fix, it at least recovers me faster than 10 minutes. As we move to 3.0.0, having the 2.4.0 Samsung binding isn't really an option. |
This issue has been mentioned on openHAB Community. There might be relevant details there: https://community.openhab.org/t/sonos-broadcasting-between-different-subnets/98739/40 |
For the record, I have the same problem (OFFLINE (COMMUNICATION_ERROR): The UPnP device RINCON_xxxxxxxxxxxxxxxx is not yet registered). My Sonos devices are powered off when not used. Restarting the org.jupnp binding fixes it immediately. |
Refer to https://community.openhab.org/t/too-much-time-before-a-sonos-thing-becomes-definitively-online/62214
For no observed reason, JuPnP seems to fail randomly and cause things to go offline. Sonos speakers seem to be the biggest culprit. Once this condition is reached, all things that rely on JuPnP seem to fall offline quickly. I've seen this happen between 3 days and 2 weeks, no specific time length causes the failure.
This can currently be mitigated by having a rule monitor getThingStatusInfo(mything).getStatus().toString() and, when the thing goes offline, run executeCommandLine("/usr/bin/ssh -p 8101 -i /home/openhab/karaf_keys/openhab.id_rsa openhab@localhost bundle:restart org.jupnp", 120000).
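The same workaround can be sketched outside the rules engine, for example from a small external script. The following is only an illustration under a few assumptions: the function name `restart_jupnp_if_offline` and the injectable `run` parameter are hypothetical, the thing status string is assumed to be obtained elsewhere, and the ssh arguments and the 120 second timeout are copied from the command above.

```python
import subprocess
from typing import Callable

# Arguments taken from the executeCommandLine call in the issue text.
SSH_COMMAND = [
    "/usr/bin/ssh", "-p", "8101",
    "-i", "/home/openhab/karaf_keys/openhab.id_rsa",
    "openhab@localhost", "bundle:restart", "org.jupnp",
]


def restart_jupnp_if_offline(status: str,
                             run: Callable[..., object] = subprocess.run,
                             timeout_s: float = 120.0) -> bool:
    """Restart the org.jupnp bundle when the given thing status is OFFLINE.

    `run` is injectable (an assumption for testability); by default the
    command is executed with subprocess.run. Returns True if a restart
    was requested.
    """
    if status.strip().upper() != "OFFLINE":
        return False
    # check=True raises if ssh exits non-zero; timeout mirrors 120000 ms.
    run(SSH_COMMAND, check=True, timeout=timeout_s)
    return True
```

Passing the command as a list avoids shell quoting problems such as the curly quote that crept into the original command string.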