Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[boschshc] Fix NullPointerException during deserialization, make long polling more robust #17190

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

david-pace
Copy link
Member

@david-pace david-pace commented Aug 1, 2024

  • Fix NPE while deserializing service data JSON objects
  • Catch exceptions while handling long poll results
  • Correct @type name for user-defined states
  • Add unit tests and enhance existing unit tests

closes #17176

Test JAR can be downloaded here.

@david-pace david-pace requested a review from GerdZanker as a code owner August 1, 2024 05:37
@david-pace david-pace added the bug An unexpected problem or unintended behavior of an add-on label Aug 1, 2024
@david-pace david-pace requested review from lsiepel and jlaur August 1, 2024 05:38
Copy link
Contributor

@lsiepel lsiepel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM. You want to wait for @mike-bike test confirmation?

@david-pace
Copy link
Member Author

david-pace commented Aug 1, 2024

Thanks @lsiepel, actually this is a good idea. Let's add the additional testing preferred label. Maybe also wait for at least another reviewer to look at the code.

@david-pace david-pace added the additional testing preferred The change works for the pull request author. A test from someone else is preferred though. label Aug 1, 2024
@mike-bike
Copy link

mike-bike commented Aug 1, 2024

Ok, guys! You wanted to have another full test...

Status: Relay configured in impulse mode

  1. Deleted existing items and thing in openHAB
  2. Upgraded my test Pi2 with latest snapshot openHAB 4.3.0, Build 4207
  3. Cleared cache, loaded latest JAR (same as of Jul-30)
  4. upgrade binding
  5. relay identified as new thing --> added
  6. thing online, mode property correct, Location still with capital Bildschirmfoto 2024-08-01 um 22 04 09
  7. adding all channels as points to model (existing Relay Test group)
  8. checking newly created items
  • Signal Strength: no signal (0) (seems to be ok, as Bosch only provides update on new signal strength test); picks up correct value from connection to next node (please note, that this does not neccessarily equals Bosch App as Bosch traverses all nodes and reports weakest status of the chain
  • Child Protection: matches current state - state change in Bosch App: ok - state change in openHAB: ok
  • Impulse Switch: working in both directions
  • Impulse Length: matches current state - state change in Bosch App: ok - state change in openHAB: working with unit (s) and without (defaults to ds)
  • Instant of last Impulse: updated correctly

Summary: new instantiation with really configured in Impulse mode working correctly

CHECK!

Next test follow separately

@lsiepel lsiepel removed the additional testing preferred The change works for the pull request author. A test from someone else is preferred though. label Aug 1, 2024
@lsiepel
Copy link
Contributor

lsiepel commented Aug 1, 2024

Ok, guys! You wanted to have another full test...

Status: Relay configured in impulse mode

  1. Deleted existing items and thing in openHAB
  2. Upgraded my test Pi2 with latest snapshot openHAB 4.3.0, Build 4207
  3. Cleared cache, loaded latest JAR (same as of Jul-30)
  4. upgrade binding
  5. relay identified as new thing --> added
  6. thing online, mode property correct, Location still with capital Bildschirmfoto 2024-08-01 um 22 04 09
  7. adding all channels as points to model (existing Relay Test group)
  8. checking newly created items
  • Signal Strength: no signal (0) (seems to be ok, as Bosch only provides update on new signal strength test); picks up correct value from connection to next node (please note, that this does not neccessarily equals Bosch App as Bosch traverses all nodes and reports weakest status of the chain
  • Child Protection: matches current state - state change in Bosch App: ok - state change in openHAB: ok
  • Impulse Switch: working in both directions
  • Impulse Length: matches current state - state change in Bosch App: ok - state change in openHAB: working with unit (s) and without (defaults to ds)
  • Instant of last Impulse: updated correctly

Summary: new instantiation with really configured in Impulse mode working correctly

CHECK!

Next test follow separately

Seems like a solid set of tests, on top of the already confirmed basics tests. If you ask me this is ready to merge.
@david-pace prefers to have someone else review this before a merge, that’s fine. Ping me whenever you need me.

@mike-bike
Copy link

mike-bike commented Aug 1, 2024

Next round:

  1. Delete relay in Bosch app - message reported in log:
    (@david-pace , note this has stopped long poll for the prod instance, seems to be the root cause for [boschshc] Long polling fails after a few hours #17176. It will give a pointer to reproduce the issue. I will test that separately with the other JAR)
2024-08-01 22:42:16.122 [DEBUG] [.internal.devices.bridge.LongPolling] - Long poll response: {"result":[{"deleted":true,"@type":"DeviceServiceData","id":"ElectricalFaults","deviceId":"hdm:ZigBee:30fb10fffe46d732"}],"jsonrpc":"2.0"}

2024-08-01 22:42:16.126 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.NullPointerException: Cannot invoke "com.google.gson.JsonElement.getAsString()" because the return value of "com.google.gson.JsonObject.get(String)" is null
        at org.openhab.binding.boschshc.internal.serialization.BoschServiceDataDeserializer.deserialize(BoschServiceDataDeserializer.java:52) ~[?:?]
        at org.openhab.binding.boschshc.internal.serialization.BoschServiceDataDeserializer.deserialize(BoschServiceDataDeserializer.java:1) ~[?:?]
        at com.google.gson.internal.bind.TreeTypeAdapter.read(TreeTypeAdapter.java:76) ~[?:?]
        at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40) ~[?:?]
        at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:82) ~[?:?]
        at com.google.gson.internal.bind.CollectionTypeAdapterFactory$Adapter.read(CollectionTypeAdapterFactory.java:61) ~[?:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212) ~[?:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433) ~[?:?]
        at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393) ~[?:?]
        at com.google.gson.Gson.fromJson(Gson.java:1227) ~[?:?]
        at com.google.gson.Gson.fromJson(Gson.java:1137) ~[?:?]
        at com.google.gson.Gson.fromJson(Gson.java:1047) ~[?:?]
        at com.google.gson.Gson.fromJson(Gson.java:982) ~[?:?]
        at org.openhab.binding.boschshc.internal.devices.bridge.LongPolling.handleLongPollResponse(LongPolling.java:218) ~[?:?]
        at org.openhab.binding.boschshc.internal.devices.bridge.LongPolling.onLongPollComplete(LongPolling.java:196) ~[?:?]
        at org.openhab.binding.boschshc.internal.devices.bridge.LongPolling$1.lambda$0(LongPolling.java:169) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:840) [?:?]
  1. Thing in openHAB still online in impulse switch mode - sort of ok, why should it detect the deletion?
  2. Disabled thing and re-enabled --> Offline -> Configuration_Error -> Geräte ID ist ungültig
  3. Device in Bosch App reconfigured in Switch Mode, no switch attached, rest standard
  4. Disabled thing in openHAB and re-enabled --> Offline -> Configuration_Error -> Relay mode (power/impulse switch) change detected. Please delete and re-create this Thing. - ok
  5. unlink all items in openHAB and delete thing
  6. disabled Bosch thing and re-enabled --> Relay as new item identified
  7. new thing instantiated
  8. Property correctly set to PowerSwitch
  9. added items to new equipment
  10. Signal Strength: picked up latest value - ok
  11. Child Protection: matches current state - state change in Bosch App: ok - state change in openHAB: ok
  12. Betrieb (On/Off): matches current state - state change in Bosch App: ok - state change in openHAB: ok
  13. New Bosch feature to set shut-off timer: state change in openHAB: ok

Overall: All tests passed

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
@david-pace
Copy link
Member Author

Hi @lsiepel and @jlaur, I just rebased my commit and one build fails with an error that seems to be unrelated to my code. Maven fails while loading some property file:

java.lang.IllegalArgumentException: Malformed \uxxxx encoding.

I know this can be solved locally by deleting the offending .properties file, but who can do that on the build server? It's also tricky to find the file that causes the actual problem, I only managed it locally by debugging the Maven build. Any ideas on how to proceed here? Are the maintainers aware of the issue?

@lsiepel lsiepel added rebuild Triggers Jenkins PR build and removed rebuild Triggers Jenkins PR build labels Aug 5, 2024
@david-pace
Copy link
Member Author

david-pace commented Aug 5, 2024

Ah, looks like a simple rebuild already solved the issue, thanks @lsiepel 👍 So I guess this is not a systematic but rather a sporadic build issue?

@david-pace
Copy link
Member Author

@mike-bike are your systems stable with the latest JAR? Any NullPointerExceptions or long poll failures?

@GerdZanker or @jlaur, is it possible for one of you to review the code in the next days?

@mike-bike
Copy link

Hi, latest JAR seems to be running stable on my test PI2 with 4.3.x and my production PI4 with 4.2 release since Aug-05.

277 │ Active │  80 │ 4.3.0.202408050612    │ openHAB Add-ons :: Bundles :: Bosch Smart Home Binding

However, I have not tried to force any NullPointerExceptions since last test. I am good with current functionality.

@jlaur jlaur added rebuild Triggers Jenkins PR build and removed rebuild Triggers Jenkins PR build labels Aug 12, 2024
Copy link
Contributor

@jlaur jlaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! I have added a few small questions (all optional, nothing that blocks this PR).

* Change switch statement to switch expression
* Rewrite some test assertions using assertThat()
* Use assertNotNull() is certain cases to get rid of compiler warnings

Signed-off-by: David Pace <dev@davidpace.de>
@jlaur jlaur merged commit c2613c7 into openhab:main Aug 13, 2024
5 checks passed
@jlaur jlaur added this to the 4.3 milestone Aug 13, 2024
@david-pace david-pace deleted the 17176-long-polling branch August 14, 2024 03:32
@david-pace
Copy link
Member Author

Thank you very much for your reviews, @lsiepel and @jlaur 👍 Also big thanks to @mike-bike for the great testing support ❤️

@mike-bike
Copy link

@david-pace, many thanks for your effort to develop the code and to the others for reviewing. I am glad providing a little help by testing.
When will the new binding become available as part of the normal openHAB package? How could I find out?

@david-pace
Copy link
Member Author

david-pace commented Aug 16, 2024

I recommend to subscribe to the openHAB releases here, then you will always be notified if a new milestone build or release is published.

There also was a openHAB community thread containing some release planning information, but I can't find it anymore. @jlaur, could you please share the link again?

@jlaur
Copy link
Contributor

jlaur commented Aug 16, 2024

There also was a openHAB community thread containing some release planning information, but I can't find it anymore. @jlaur, could you please share the link again?

Maybe this?

https://community.openhab.org/t/openhab-4-2-milestone-builds/154315/5

It was specific for 4.2 though. There will be a new thread for 4.3 milestone builds. It's usually quite predictable, so expect the release build around Christmas. 🙂

@mike-bike
Copy link

mike-bike commented Aug 16, 2024 via email

@jlaur
Copy link
Contributor

jlaur commented Aug 16, 2024

Thanks, that’s fine guys. I will monitor thing availability on every update and upgrade to development JAR provided by Davied.

If you (@mike-bike, @david-pace) consider this bug critical (for sure it seems critical when it occurs, the question is how often it occurs and if it does for all users?) - and @david-pace would assess the risk of causing regressions low - we could consider cherry-picking it into 4.2.x, WDYT?

@david-pace
Copy link
Member Author

This seems like a good candidate to be cherry-picked to 4.2.x.

Unfortunately I can only speculate how often it occurs. The only way we could reproduce the error manually was a corner case in which @mike-bike deleted devices in the Bosch App. But it seems to occur sporadically in other cases. I found the NPE once in the logs of the past 3 weeks, but without indication why it was triggered. I suspect that it was also reported in the community here, but I cannot be 100% sure because we did not find any stack traces in the logs.

@mike-bike, are you aware of any other ways to prokove the NullPointerException or to kill the long polling systematically?

Regarding the regression probability - I would consider it to be low.

@jlaur jlaur changed the title [boschshc] Fix NPE during deserialization, make long polling more robust [boschshc] Fix NullPointerException during deserialization, making long polling more robust Aug 17, 2024
@jlaur
Copy link
Contributor

jlaur commented Aug 17, 2024

This seems like a good candidate to be cherry-picked to 4.2.x.

Unfortunately there are conflicts, so it cannot be cherry-picked. We can still backport to 4.2.x, but you would then have to create a new PR targeting branch 4.2.x with the changes.

@david-pace
Copy link
Member Author

ok, I will try to provide a PR as soon as I have some time. Is there some guide/tutorial how to push PRs towards maintenance branches?

@jlaur
Copy link
Contributor

jlaur commented Aug 17, 2024

ok, I will try to provide a PR as soon as I have some time. Is there some guide/tutorial how to push PRs towards maintenance branches?

I don't think so. You just need to branch out from 4.2.x and target that branch also when creating the PR.

@jlaur jlaur changed the title [boschshc] Fix NullPointerException during deserialization, making long polling more robust [boschshc] Fix NullPointerException during deserialization, make long polling more robust Aug 17, 2024
@mike-bike
Copy link

@mike-bike, are you aware of any other ways to prokove the NullPointerException or to kill the long polling systematically?

Sorry, I do not have identified any reproducible method to enforce the NullPointerException. Deleting devices seem to be covered by the current code. I have not found any further NullPointerException in the logs of last 7 days either on 4.2.x prod nor the 4.3x test environment.

@mike-bike
Copy link

I appreciate your effort to get the fix into the 4.2.x version. That would be great, but I am happy to manually load the binding after an upgrade or restart until 4.3 arrives.
Is it safe to assume, that the fix is available in the 4.3 snapshots?

@lsiepel
Copy link
Contributor

lsiepel commented Aug 18, 2024

I appreciate your effort to get the fix into the 4.2.x version. That would be great, but I am happy to manually load the binding after an upgrade or restart until 4.3 arrives. Is it safe to assume, that the fix is available in the 4.3 snapshots?

Yes, and it will also be in next 4.3 milestone.
IMO the change users will get this NPE is rather small, i can imagine it is not worth the time, but it is up to you @david-pace

david-pace added a commit to david-pace/openhab-addons that referenced this pull request Aug 19, 2024
(openhab#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
@david-pace
Copy link
Member Author

david-pace commented Aug 19, 2024

It was easy to solve the merge conflict and I created a PR for 4.2.x here 👍

david-pace added a commit to david-pace/openhab-addons that referenced this pull request Aug 20, 2024
(openhab#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
jlaur pushed a commit that referenced this pull request Aug 20, 2024
…ust (#17289)

(#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
digitaldan pushed a commit to digitaldan/openhab-addons that referenced this pull request Aug 29, 2024
…ust (openhab#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
pgfeller pushed a commit to pgfeller/openhab-addons that referenced this pull request Sep 29, 2024
…ust (openhab#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
Signed-off-by: Patrik Gfeller <patrik.gfeller@proton.me>
joni1993 pushed a commit to joni1993/openhab-addons that referenced this pull request Oct 15, 2024
…ust (openhab#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
matchews pushed a commit to matchews/openhab-addons that referenced this pull request Oct 18, 2024
…ust (openhab#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
chilobo pushed a commit to chilobo/openhab-addons that referenced this pull request Nov 20, 2024
…ust (openhab#17190)

* Fix NPE while deserializing service data JSON objects
* Catch exceptions while handling long poll results
* Correct @type name for user-defined states
* Add unit tests and enhance existing unit tests

Signed-off-by: David Pace <dev@davidpace.de>
Signed-off-by: Christian Koch <christian@koch-bensheim.de>
@jlaur jlaur added the patch A PR that has been cherry-picked to a patch release branch label Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on patch A PR that has been cherry-picked to a patch release branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[boschshc] Long polling fails after a few hours
4 participants