[hue] ApiException fails to reconnect bridge #15350

andrewfg · 2023-08-01T13:30:08Z

Problem

On GET and PUT requests, (for as yet unknown reasons), the Hue Bridge server sometimes returns an HTML payload instead of the expected XML payload. This causes an ApiException: Unexpected Content-Type: text/html. And this causes the bridge handler to try to reconnect to the server. It attempts to reconnect for RECONNECT_MAX_TRIES (5) using an exponential back off delay of 0, 1, 2, 4, 8 minutes. However even after the maximum number of attempts (a total of 15 minutes), the reconnection attempt fails, and the bridge goes permanently offline.

Work Around

The work around is to manually disable and then reenable the bridge thing.

Solution

The bridge thing should go online automatically.

The text was updated successfully, but these errors were encountered:

jlaur · 2023-08-06T11:13:19Z

This issue is probably the same as this one I just noticed when getting back home after a few days away and noticing some power management not working and the bridge being offline?

2023-08-01 04:20:44.601 [WARN ] [.hue.internal.connection.Clip2Bridge] - Fatal error SessionListenerAdapter GO_AWAY => closing session.
2023-08-02 16:02:15.310 [WARN ] [.hue.internal.connection.Clip2Bridge] - Fatal error SessionListenerAdapter GO_AWAY => closing session.
2023-08-02 21:32:34.078 [WARN ] [.hue.internal.connection.Clip2Bridge] - Fatal error SessionListenerAdapter GO_AWAY => closing session.
2023-08-02 22:02:27.068 [WARN ] [e.internal.handler.Clip2ThingHandler] - Command 'ON' for thing 'hue:device:home:hueplug1', channel 'hue:device:home:hueplug1:switch' failed with error 'Unexpected Content-Type: text/html'.
2023-08-02 22:02:28.083 [WARN ] [.hue.internal.connection.Clip2Bridge] - Fatal error SessionListenerAdapter RESET => closing session.

andrewfg · 2023-08-06T13:27:51Z

@jlaur I think the binding DID successfully restart after the 'normal' GO_AWAY errors. But it did NOT restart after the 'Unexpected Content Type' error.

As mentioned elsewhere it seems that overlapping PUT requests can lead to Unexpected Content Type so my other open PR should prevent the majority of such cases.

Nevertheless it seems that Unexpected Content Type can still be triggered by something else as well. Maybe bridge firmware updating?? I have been logging to debug for about one week, but still have not tracked down what is happening. So maybe you help with logging to debug too?

andrewfg · 2023-08-06T13:32:11Z

PS in case of Unexpected Content Type the handler restart code tries for 15 minutes and fails to restart. Whereas calling dispose() and initialize() does seem to fix it ('hammer' restart) .. but I am not sure why..

jlaur · 2023-08-06T13:47:20Z

Nevertheless it seems that Unexpected Content Type can still be triggered by something else as well. Maybe bridge firmware updating??

I'm on the latest firmware, but I don't auto-update firmware, so in this scenario I experienced it wasn't caused by an ongoing firmware update.

I have been logging to debug for about one week, but still have not tracked down what is happening. So maybe you help with logging yo debug too?

Sure, I might have a look and try to add some tailored logging for this specific scenario. I'm a little hesitant to enable full debug logging in my production system since it would need to be enabled for long time period and it's quite verbose.

jlaur · 2023-08-06T18:45:07Z

I added logging of http code and returned content just above this line:

openhab-addons/bundles/org.openhab.binding.hue/src/main/java/org/openhab/binding/hue/internal/connection/Clip2Bridge.java

Line 954 in 3b9b023

throw new ApiException("Unexpected Content-Type: " + contentType);

So the next time it happens, at least I'll know that, which could give a hint. Unless you already know this?

andrewfg · 2023-08-06T23:37:56Z

Unless you already know this?

Well, I DO know that whenever the bridge returns HTML instead of JSON it means that it is more difficult to restart the connection than when any other error occurs. But what I do NOT know is a) why it returns HTML rather than JSON, and b) why it is more difficult to restart.

andrewfg · 2023-08-18T15:21:01Z

Just for info: I was reading more about HTTP2 GO_AWAY; the Hue Bridge runs on Linux and it uses the ‘nginx’ web server; and it seems that by default the nginx server sends a GO_AWAY after 1000 GET or PUT requests.

The binding GET / PUT request count is roughly as in the following formula; so it is likely that the 1000 limit will occur every 1…2 days depending on your system size and load.

GET / PUT count ~= (quantity of things x 4) + (hours up time X 24) + (number of thing commands issued)

I shall run some more tests on my own system to try to confirm this 1000 GET / PUT hypothesis. And if it seems to be correct, then I shall modify the binding to keep count of the number of GET / PUT calls sent, and to softly reset the HTTP2 connection on the client side BEFORE the Hue Bridge encounters the 1000 call GO_AWAY limit. That would certainly very much reduce the occurrences of the bug that we are observing.

jlaur · 2023-08-18T17:14:19Z

Just for info: I was reading more about HTTP2 GO_AWAY

Thanks. Your research made some new searches possible. If you didn't already read this, this seems interesting: https://stackoverflow.com/questions/55087292/how-to-handle-http-2-goaway-with-httpclient

Also confirmed that nginx has this default configuration (previously 100, now 1000).

Keep in mind that other clients can also connect to the bridge, so we should aim for correct handling rather than trying to predict when it will happen.

andrewfg · 2023-08-18T18:24:47Z

Your research made some new searches possible

Actually it was the other way round. :) The post that you cited is among several that I had already googled before I made my own post above.

Keep in mind that other clients can also connect to the bridge

The 1000 limit would be the max number of streams that could be opened on any given HTTP2 session. So the presence of other clients is irrelevant in this context. Nevertheless when I make the next PR it will probably a) count the used streams and softly restart the session at (say) 90% of the 1000 limit, b) make the 1000 a config param (in case nginx changes in future), c) nevertheless process any eventual GO_AWAY, by d) checking its error code to see if it is a soft or a hard GO_AWAY and if the former we can be more soft on restart than if the latter.

PS I am currently running some tests on my operative system to do more detailed logging of stream counts and soft/hard GO_AWAY occurrences; I have a rule running that power cycles a test lamp every n seconds and logging to correlate GO_AWAY with PUT call stream count..

andrewfg · 2023-08-19T17:23:56Z

^
In the meantime I can confirm that the GO_AWAY does indeed occur every 1000 GET / PUT calls.

EDIT: actually it occurs when the 999th call is made, between receiving the GET/PUT call and sending the content response.

andrewfg added the bug An unexpected problem or unintended behavior of an add-on label Aug 1, 2023

andrewfg changed the title ~~[hue] An ApiException during discovery scan should cause bridge thing to reconnect~~ [hue] ApiException: Unexpected Content-Type: text/html fails to reconnect bridge Aug 1, 2023

andrewfg changed the title ~~[hue] ApiException: Unexpected Content-Type: text/html fails to reconnect bridge~~ [hue] ApiException fails to reconnect bridge Aug 1, 2023

andrewfg self-assigned this Aug 2, 2023

This was referenced Aug 21, 2023

[hue] Error handling improvements #15460

Closed

[hue] Improve connection stability (API v2) #15477

Merged

jlaur closed this as completed in #15477 Aug 26, 2023

markus7017 mentioned this issue Oct 22, 2023

[shelly] BLU Motion, optimize ShellyManager for BLU devices #15401

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[hue] ApiException fails to reconnect bridge #15350

[hue] ApiException fails to reconnect bridge #15350

andrewfg commented Aug 1, 2023 •

edited

Loading

jlaur commented Aug 6, 2023

andrewfg commented Aug 6, 2023 •

edited

Loading

andrewfg commented Aug 6, 2023

jlaur commented Aug 6, 2023

jlaur commented Aug 6, 2023

andrewfg commented Aug 6, 2023

andrewfg commented Aug 18, 2023

jlaur commented Aug 18, 2023 •

edited

Loading

andrewfg commented Aug 18, 2023

andrewfg commented Aug 19, 2023 •

edited

Loading

[hue] ApiException fails to reconnect bridge #15350

[hue] ApiException fails to reconnect bridge #15350

Comments

andrewfg commented Aug 1, 2023 • edited Loading

Problem

Work Around

Solution

jlaur commented Aug 6, 2023

andrewfg commented Aug 6, 2023 • edited Loading

andrewfg commented Aug 6, 2023

jlaur commented Aug 6, 2023

jlaur commented Aug 6, 2023

andrewfg commented Aug 6, 2023

andrewfg commented Aug 18, 2023

jlaur commented Aug 18, 2023 • edited Loading

andrewfg commented Aug 18, 2023

andrewfg commented Aug 19, 2023 • edited Loading

andrewfg commented Aug 1, 2023 •

edited

Loading

andrewfg commented Aug 6, 2023 •

edited

Loading

jlaur commented Aug 18, 2023 •

edited

Loading

andrewfg commented Aug 19, 2023 •

edited

Loading