[hue] Improve connection stability (API v2) #15477
Conversation
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@jlaur / @maniac103 I am still testing this code on my own production system, but I am posting this PR so that you can a) test it for yourselves, and b) critique the code changes.
NOTA BENE: due to the added HTTP status checking, this PR now fails with '404' errors on some things due to #15468!
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
TODO ... currently the code waits until it receives a GO_AWAY before recycling the session. The Throttler and SessionSynchronizer objects should prevent most conflicts during the session recycle phase. But there is still a slim chance that if a) the binding makes two GET requests almost concurrently, b) the bridge sends the GO_AWAY in response to the first GET, and c) the first GET takes more than 100 msec to complete, then the second GET might pass through the Throttler lock and start executing before the session recycle process has taken over the SessionSynchronizer lock. This would push the GET/PUT stream count over the nginx 1000 limit, which would in turn cause the second GET to fail catastrophically. I think the only solution is for the binding to start the session recycle process on its own side when the GET/PUT stream count is at least 3 below the nginx limit (3 because the binding may make up to 3 GET calls concurrently). It is tricky to imagine exactly how such timing can occur, and impossible to simulate in tests, so I would appreciate your thoughts on this -- especially @maniac103. EDIT: resolved, see next post.
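For context, here is a minimal sketch of the read/write-lock pattern referred to above (hypothetical class and method names, not the actual Clip2Bridge code): GET/PUT calls share the session under the read lock, while a session recycle takes the write lock and therefore has to wait for in-flight calls to complete.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch only (hypothetical names): many GET/PUT calls may hold the read
 * lock concurrently, whereas recycling the session requires the exclusive
 * write lock and therefore waits until all in-flight calls have finished.
 */
public class SessionSynchronizerSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    /** Wraps a single GET/PUT call; many callers may be inside at once. */
    public <T> T withSession(Callable<T> call) throws Exception {
        lock.readLock().lock();
        try {
            return call.call();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Recycles the session; blocks new calls and waits for in-flight ones. */
    public void recycleSession(Runnable recycler) {
        lock.writeLock().lock();
        try {
            recycler.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```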
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Apropos my prior post on session recycling, I just committed a change whereby the binding recycles the session 6 calls before the nginx 1000 limit is reached. This eliminates the potential timing error alluded to above, since the Hue bridge server would still have 6 calls in hand before it would send the GO_AWAY message, and could therefore tolerate a handful of GET calls getting past the SessionSynchronizer locks.
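To illustrate that safety margin, a check along these lines (hypothetical constant and counter names, not the exact code in this PR) would trigger the recycle a few streams before the presumed server limit is reached:

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch only (hypothetical names): recycle the HTTP/2 session a few
 * streams before the server's presumed GO_AWAY limit, so that a handful
 * of concurrent GET calls can never push the stream count over the limit.
 */
public class StreamBudgetSketch {
    private static final int PRESUMED_SERVER_LIMIT = 1000; // nginx default on the bridge
    private static final int SAFETY_MARGIN = 6;            // max concurrent calls plus headroom

    private final AtomicInteger streamCount = new AtomicInteger();

    /** Returns true if the session should be recycled before the next stream is opened. */
    public boolean shouldRecycleBeforeNextStream() {
        return streamCount.incrementAndGet() >= PRESUMED_SERVER_LIMIT - SAFETY_MARGIN;
    }

    /** Resets the counter after the session has been recycled. */
    public void reset() {
        streamCount.set(0);
    }
}
```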
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
I'm not sure whether this really is the ideal solution, given that the 1000-request limit can change with any bridge FW update.
Can't we detect that situation (by receiving the error and noticing that the session of the failed request doesn't match the re-established or closed session) and thus just issue a retry in that case?
Indeed. For that reason I am recycling the session a) after ...

My ideal hypothesis is that you -- @maniac103 -- have an amazingly bright idea to fix my synchronization code so that the current edge-case risk of timing errors can be eliminated. But if you don't have any such ideas, then we have to figure out a way to ameliorate rather than eliminate the risk.

I thought about adding a config param for the bridge thing whereby one could manually change the 1000 limit in case the nginx firmware did reduce that number (it would not be a problem if they increased it). However, I concluded that if such a thing were to happen, it would be better, as a courtesy to the users, to make a new PR to change the ...

As I write this, it occurs to me that we could even perhaps make the above behaviour self-adaptive. If we see that the Hue server consistently sends GO_AWAY messages before the presumptive 1000 limit, we could dynamically reduce that limit in code.
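That self-adaptive behaviour could, in principle, look something like the following sketch (hypothetical names; not part of this PR): whenever a GO_AWAY arrives earlier than expected, the presumed limit is lowered for subsequent sessions.

```java
/**
 * Sketch only (hypothetical names, not part of this PR): if the bridge
 * sends GO_AWAY before the presumed stream limit is reached, lower the
 * limit used for proactive session recycling accordingly.
 */
public class AdaptiveLimitSketch {
    private volatile int presumedLimit = 1000; // initial assumption (nginx default)

    /** Called when GO_AWAY is received, with the number of streams used on that session. */
    public synchronized void onGoAway(int streamsUsedOnSession) {
        if (streamsUsedOnSession < presumedLimit) {
            presumedLimit = streamsUsedOnSession;
        }
    }

    /** The limit that the proactive recycle logic should currently use. */
    public int getPresumedLimit() {
        return presumedLimit;
    }
}
```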
Well, we do detect the error too, and this does trigger a connection restart. But the OH handler architecture is asynchronous and blocking, so although we can BLOCK one single GET call until a recycle has completed, we cannot issue a call, receive an exception, trigger a restart, and then RE-ISSUE a duplicate of the same call. We would need to create an architectural mechanism to cache such failed calls and repeat them after reconnection until they succeed, or time out, or whatever. I think that would be horribly messy.
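For the record, the retry alternative being discussed might look roughly like the sketch below (hypothetical interface and method names); every call site would need wrapping of this kind, which is the messiness referred to above.

```java
/**
 * Sketch only (hypothetical names): wrap each call so that a failure on a
 * stale session triggers a reconnect and a single retry on the new session.
 */
public class RetryOnStaleSessionSketch {

    /** Minimal session abstraction assumed for this sketch. */
    public interface Session {
        String get(String path) throws Exception;
        boolean isStale();
        void reconnect() throws Exception;
    }

    public String getWithRetry(Session session, String path) throws Exception {
        try {
            return session.get(path);
        } catch (Exception e) {
            if (session.isStale()) {
                session.reconnect();
                return session.get(path); // one retry on the fresh session
            }
            throw e;
        }
    }
}
```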
I spent a few hours making flow charts, and as a result I am pretty sure that the GO_AWAY synchronization scheme DOES work in all cases after all. In which case you can ignore the 'nightmares' in my prior posts. However, I want to complete those flow charts and post them here for you to critique. I also need to do some timing tests on the Hue Bridge server to determine its exact sequence of events, and to study the source code of ReentrantReadWriteLock too. I will get back to you ASAP.
I am very happy! A few things:
The above analysis and chart prove that the GO_AWAY process thread synchronization should always succeed -- specifically:
Conclusions:
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
So far all is looking good :) EDIT: just now I was able to observe a closely spaced batch of 4 GET requests where the 2nd request triggered the GO_AWAY limit, and I can confirm that the first two calls were made on the original session and the last two were postponed to the new session -- i.e. real proof that the synchronization does work.
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Thanks for the improvements. I'm now also running this version in my production system. I have added a few minor comments. @maniac103 - as always, thanks for reviewing.
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@maniac103 - do you want to check your comment resolutions before merging this PR?
@jlaur Looks good to me as far as I am concerned.
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
LGTM
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Jørgen Austvik <jaustvik@acm.org>
This PR contains several changes to improve the stability of HTTP/2 connections.
Resolves #15350
Resolves #15460 (part 2)
Related to #15468 (temporary fix)
Resolves an issue with duplicate event messages after session recycling, as reported here
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>