Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[powermax] Improved error handling for Powermax binding #9817

Merged
merged 5 commits into from
Feb 8, 2021

Conversation

ronisaacson
Copy link
Contributor

Catch all exceptions, and take all Powermax things offline in the event
of a communication failure. Also adds a 5-minute watchdog for
connections in Standard (non-Powerlink) mode.

Signed-off-by: Ron Isaacson isaacson.ron@gmail.com

Closes #9688

Catch all exceptions, and take all Powermax things offline in the event
of a communication failure. Also adds a 5-minute watchdog for
connections in Standard (non-Powerlink) mode.

Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
@Hilbrand Hilbrand added the enhancement An enhancement or new feature for an existing add-on label Jan 20, 2021
Copy link
Member

@Hilbrand Hilbrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've changed a lot of log messages from debug to info or warn. In general if an error occurs the message should be reflected in the status. Also we try to minimize the logging in bindings. See also https://www.openhab.org/docs/developer/guidelines.html#f-logging . There I think you should not change the log levels here.

@lolodomo
Copy link
Contributor

I did not yet find the time to look at this PR but please note I absolutely don't want that a CRC error leads to all things going OFFLINE. This will simply break the binding!
For an unexplained reason, it sometimes happen (not often) especially when the panel setup is downloaded.

@ronisaacson
Copy link
Contributor Author

I did not yet find the time to look at this PR but please note I absolutely don't want that a CRC error leads to all things going OFFLINE.

I can confirm, I see that CRC error occasionally too. Currently it only goes offline in the event of an I/O exception where communication with the panel is lost. The 5-minute keepalive in standard mode is needed to ensure that the binding attempts to talk to the panel regularly, because otherwise we might not know that the connection is lost.

I'm going to make the changes requested by Hilbrand and I might also add a check that goes offline if there are no messages received in a period of time. That would catch the case where the serial connection is still active but the panel just stops responding. I think this happened to me once.

@ronisaacson
Copy link
Contributor Author

So, digging into this a bit more... there were a bunch of error messages being swallowed before. See here for example. If the connection failed for any of those reasons, the exact reason was logged only at debug level, and the binding would give you a generic failure message. Switching them all from debug to warn was the easiest change, but not consistent with the principle @Hilbrand mentioned above.

I can work on propagating these messages up to the bridge state but it will be a bit more involved.

Copy link
Contributor

@lolodomo lolodomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one important question.

Regarding log level, almost all logs are DEBUG very probably because I was requested to do it like that during the review process of the binding. If every binding tries to use INFO/WARN logging, you have by default a lot of logs when starting openHAB. The idea is I believe to have a clean log when starting OH. And you have to take care to not have by default regular logging just for something which is finally normal.

@ronisaacson ronisaacson marked this pull request as draft January 24, 2021 20:31
@ronisaacson
Copy link
Contributor Author

CRC error

Incidentally @lolodomo, this is the one CRC error I see consistently, every time the panel exits Downloading mode:

Powermax alarm binding: message CRC check failed (expected 56, got 38, message 0DF1074300008B560A)

In this client, they skip the CRC check on 0xF1 messages. Maybe it's a panel bug? Nobody seems to know what's in the 0xF1 messages anyway, so maybe I'll put in the same fix.

I also noticed that my settings download fails every time and forces unnecessary retries:

2021-01-27 17:19:47.878 [DEBUG] [internal.state.PowermaxPanelSettings] - readSettings(3, 0, 46): missing data
2021-01-27 17:19:47.878 [DEBUG] [internal.state.PowermaxPanelSettings] - Cannot get partitions information
2021-01-27 17:19:47.880 [INFO ] [ternal.handler.PowermaxBridgeHandler] - Powermax alarm binding: setup download failed!

This is because I have a Powermax+ which doesn't support partitions. I'm going to add a check which skips the DL_PARTITIONS step if the panel type only has one partition.

1. Mark the bridge and all things as offline when communication is lost
2. Bubble all I/O exceptions up to the bridge status
3. Set all channels to UnDefType.NULL when bridge goes offline
4. Ignore CRC errors on messages of type 0xF1
5. Don't try and download partition settings for panels that don't support partitions
6. Fix logging levels to avoid unnecessary non-DEBUG messages

Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
@ronisaacson ronisaacson marked this pull request as ready for review January 29, 2021 04:51
Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
Copy link
Contributor

@lolodomo lolodomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ronisaacson
Copy link
Contributor Author

@Hilbrand - ready to merge if you're good.

@ronisaacson
Copy link
Contributor Author

@Hilbrand - any additional comments?

Copy link
Member

@Hilbrand Hilbrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Although the change of the log to info could be better. Because these are now on by default the messages itself don't give any context about what thing is given these messages. So for the user it might not directly clear where these messages come from. Might be something to see if people will notice mention this.

@Hilbrand Hilbrand merged commit 7898f4f into openhab:main Feb 8, 2021
@Hilbrand Hilbrand added this to the 3.1 milestone Feb 8, 2021
@ronisaacson ronisaacson deleted the powermax_reconnect branch February 9, 2021 04:15
themillhousegroup pushed a commit to themillhousegroup/openhab2-addons that referenced this pull request May 10, 2021
* Improved error handling for Powermax binding

Catch all exceptions, and take all Powermax things offline in the event
of a communication failure. Also adds a 5-minute watchdog for
connections in Standard (non-Powerlink) mode.

* More Powermax error handling improvements

1. Mark the bridge and all things as offline when communication is lost
2. Bubble all I/O exceptions up to the bridge status
3. Set all channels to UnDefType.NULL when bridge goes offline
4. Ignore CRC errors on messages of type 0xF1
5. Don't try and download partition settings for panels that don't support partitions
6. Fix logging levels to avoid unnecessary non-DEBUG messages

Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
Signed-off-by: John Marshall <john.marshall.au@gmail.com>
thinkingstone pushed a commit to thinkingstone/openhab-addons that referenced this pull request Nov 7, 2021
* Improved error handling for Powermax binding

Catch all exceptions, and take all Powermax things offline in the event
of a communication failure. Also adds a 5-minute watchdog for
connections in Standard (non-Powerlink) mode.

* More Powermax error handling improvements

1. Mark the bridge and all things as offline when communication is lost
2. Bubble all I/O exceptions up to the bridge status
3. Set all channels to UnDefType.NULL when bridge goes offline
4. Ignore CRC errors on messages of type 0xF1
5. Don't try and download partition settings for panels that don't support partitions
6. Fix logging levels to avoid unnecessary non-DEBUG messages

Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
marcfischerboschio pushed a commit to bosch-io/openhab-addons that referenced this pull request May 5, 2022
* Improved error handling for Powermax binding

Catch all exceptions, and take all Powermax things offline in the event
of a communication failure. Also adds a 5-minute watchdog for
connections in Standard (non-Powerlink) mode.

* More Powermax error handling improvements

1. Mark the bridge and all things as offline when communication is lost
2. Bubble all I/O exceptions up to the bridge status
3. Set all channels to UnDefType.NULL when bridge goes offline
4. Ignore CRC errors on messages of type 0xF1
5. Don't try and download partition settings for panels that don't support partitions
6. Fix logging levels to avoid unnecessary non-DEBUG messages

Signed-off-by: Ron Isaacson <isaacson.ron@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature for an existing add-on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[powermax] Add a catch-all exception handler
3 participants