[hue] ApiException fails to reconnect bridge #15350
Comments
This issue is probably the same as the one I just noticed on getting back home after a few days away: some power management was not working and the bridge was offline.
@jlaur I think the binding DID successfully restart after the 'normal' GO_AWAY errors, but it did NOT restart after the 'Unexpected Content Type' error. As mentioned elsewhere, it seems that overlapping PUT requests can lead to Unexpected Content Type, so my other open PR should prevent the majority of such cases. Nevertheless it seems that Unexpected Content Type can still be triggered by something else as well. Maybe a bridge firmware update? I have been logging at debug level for about one week, but still have not tracked down what is happening. So maybe you could help by logging at debug level too?
PS in case of Unexpected Content Type, the handler restart code tries for 15 minutes and fails to restart, whereas calling dispose() and initialize() does seem to fix it (a 'hammer' restart), but I am not sure why.
I'm on the latest firmware, but I don't auto-update firmware, so in this scenario I experienced it wasn't caused by an ongoing firmware update.
Sure, I might have a look and try to add some tailored logging for this specific scenario. I'm a little hesitant to enable full debug logging in my production system since it would need to be enabled for a long time period and it's quite verbose.
I added logging of the HTTP status code and the returned content just above this line: Line 954 in 3b9b023
So the next time it happens, at least I'll know that, which could give a hint. Unless you already know this?
Well, I DO know that whenever the bridge returns HTML instead of JSON it means that it is more difficult to restart the connection than when any other error occurs. But what I do NOT know is a) why it returns HTML rather than JSON, and b) why it is more difficult to restart.
Just for info: I was reading more about HTTP2 GO_AWAY. The Hue Bridge runs on Linux and uses the 'nginx' web server, and it seems that by default the nginx server sends a GO_AWAY after 1000 GET or PUT requests. The binding GET / PUT request count is roughly as in the following formula; so it is likely that the 1000 limit will occur every 1 to 2 days depending on your system size and load.
I shall run some more tests on my own system to try to confirm this 1000 GET / PUT hypothesis. If it seems to be correct, then I shall modify the binding to keep count of the number of GET / PUT calls sent, and to softly reset the HTTP2 connection on the client side BEFORE the Hue Bridge encounters the 1000 call GO_AWAY limit. That would certainly very much reduce the occurrences of the bug that we are observing.
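To make the counting idea concrete, here is a minimal sketch of such a counter. It is not the binding's code: the class name StreamBudget, the percentage margin, and the method names are all hypothetical, and the only figure taken from the discussion is the 1000 request limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of the binding): counts GET / PUT requests
 * sent on one HTTP/2 session and reports when the session should be
 * recycled, before the server side limit is reached.
 */
final class StreamBudget {
    private final int recycleAt;
    private final AtomicInteger used = new AtomicInteger();

    /**
     * @param serverLimit requests the server allows per session (1000 in the discussion)
     * @param percent     share of the limit at which to recycle (assumed safety margin)
     */
    StreamBudget(int serverLimit, int percent) {
        if (serverLimit < 1 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("invalid limit or percentage");
        }
        // Integer arithmetic avoids floating point rounding surprises.
        this.recycleAt = Math.max(1, serverLimit * percent / 100);
    }

    /** Call once per request sent; returns true once the session should be recycled. */
    boolean recordRequest() {
        return used.incrementAndGet() >= recycleAt;
    }

    /** Call after a fresh session has been opened. */
    void reset() {
        used.set(0);
    }

    int used() {
        return used.get();
    }
}
```

A caller would invoke recordRequest() after each GET or PUT and, when it returns true, open a new session and call reset(). How the session is actually closed and reopened is left out here because it depends on the HTTP client in use.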
Thanks. Your research made some new searches possible. If you didn't already read this, this seems interesting: https://stackoverflow.com/questions/55087292/how-to-handle-http-2-goaway-with-httpclient
Also confirmed that nginx has this default configuration (previously 100, now 1000). Keep in mind that other clients can also connect to the bridge, so we should aim for correct handling rather than trying to predict when it will happen.
Actually it was the other way round. :) The post that you cited is among several that I had already googled before I made my own post above.
The 1000 limit would be the max number of streams that could be opened on any given HTTP2 session. So the presence of other clients is irrelevant in this context. Nevertheless when I make the next PR it will probably a) count the used streams and softly restart the session at (say) 90% of the 1000 limit, b) make the 1000 a config param (in case nginx changes in future), c) nevertheless process any eventual GO_AWAY, by d) checking its error code to see if it is a soft or a hard GO_AWAY, and if the former we can be softer on restart than if the latter. PS I am currently running some tests on my production system to do more detailed logging of stream counts and soft/hard GO_AWAY occurrences; I have a rule running that power cycles a test lamp every n seconds, and logging to correlate GO_AWAY with the PUT call stream count.
^ EDIT: actually it occurs when the 999th call is made, between receiving the GET/PUT call and sending the content response.
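For point d), a sketch of how the soft / hard distinction might be drawn from the GOAWAY error code. The HTTP/2 specification assigns NO_ERROR (0x0) to graceful shutdown, so this sketch treats that code as 'soft' and every other code as 'hard'. The class and enum names are hypothetical, and mapping each kind to a particular restart strategy is an assumption, not something the binding is known to do.

```java
/**
 * Hypothetical classifier (not part of the binding) for the error code
 * carried by an HTTP/2 GOAWAY frame.
 */
final class GoAwayClassifier {
    enum Kind { SOFT, HARD }

    /** NO_ERROR (0x0): the code HTTP/2 uses for a graceful shutdown. */
    static final long NO_ERROR = 0x0L;

    /**
     * Assumption: only NO_ERROR counts as a soft GO_AWAY; any other error
     * code is treated as hard and would warrant a full reconnect.
     */
    static Kind classify(long errorCode) {
        return errorCode == NO_ERROR ? Kind.SOFT : Kind.HARD;
    }
}
```

With this split, a SOFT result could trigger a quiet session recycle while a HARD result goes through the full reconnect path.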
Problem
On GET and PUT requests (for as yet unknown reasons), the Hue Bridge server sometimes returns an HTML payload instead of the expected JSON payload. This causes an 'ApiException: Unexpected Content-Type: text/html' error, which in turn causes the bridge handler to try to reconnect to the server. It attempts to reconnect up to RECONNECT_MAX_TRIES (5) times, using an exponential back-off delay of 0, 1, 2, 4, 8 minutes. However, even after the maximum number of attempts (a total of 15 minutes), the reconnection attempt fails, and the bridge goes permanently offline.
Work Around
The work around is to manually disable and then reenable the bridge thing.
Solution
The bridge thing should go online automatically.
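The reconnect schedule described under Problem can be sketched as follows. This only reproduces the stated sequence of delays (0, 1, 2, 4, 8 minutes over 5 tries); the class and method names are hypothetical and the binding's actual code may compute the delays differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reconstruction of the reconnect delay schedule described above. */
final class ReconnectSchedule {
    /** Taken from the issue text: 5 attempts in total. */
    static final int RECONNECT_MAX_TRIES = 5;

    /** Delay in minutes before the given 0-based attempt: 0, 1, 2, 4, 8. */
    static long delayMinutes(int attempt) {
        if (attempt < 0 || attempt >= RECONNECT_MAX_TRIES) {
            throw new IllegalArgumentException("attempt out of range: " + attempt);
        }
        // First attempt is immediate; each later delay doubles the previous one.
        return attempt == 0 ? 0L : 1L << (attempt - 1);
    }

    /** All delays in order, one entry per attempt. */
    static List<Long> allDelays() {
        List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < RECONNECT_MAX_TRIES; attempt++) {
            delays.add(delayMinutes(attempt));
        }
        return delays;
    }
}
```

Summing the five delays gives the 15 minutes mentioned above, after which the handler currently gives up.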