Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Telegram] Rule execution aborted after strange SocketTimeoutException after switch to OH3 #11102

Closed
foobar26 opened this issue Aug 13, 2021 · 23 comments
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@foobar26
Copy link

I use the telegramAction.sendTelegram() in several of my rules. When I was running on OH2 everything worked fine. After I switched to OH3, I rarely get scripts not being fully executed and being aborted when the telegramAction is fired. I suspect the issue within the telegram binding, because I added loggings before and after the telegramAction and can see that the logging after it is not executed.

This is how the rule looks like:


   logInfo("Shadowing", "Start shadowing")
   val telegramAction = getActions("telegram","telegram:telegramBot:Telegram_Bot")
   telegramAction.sendTelegram(XXX, "Start shadowing")
   logInfo("Shadowing", "Telegram sent")


This produces the following log (as said in rare cases):

2021-08-11 08:37:25.807 [INFO ] [.openhab.core.model.script.Shadowing] Start shadowing
2021-08-11 08:38:40.367 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'shadowing-7' failed: java.net.SocketTimeoutException: timeout in shadowing

I don't see any stack trace or anything more than that in the logs.
Internet connectivity at that time was existing. OH is running on a server directly connected to the internet, so also no WIFI or something in between which might cause network issues.

Expected Behavior

rule not to be aborted, detailed error report

Current Behavior

rule being aborted after the error occurs. Telegram message not sent out.

Possible Solution

In some older posts I found there was someone providing a beta version of the binding with a changed connect timeout. I didn't find a way to configure connect timeouts for recent versions of the binding. It might help to increase the timeout maybe.

Steps to Reproduce (for Bugs)

Hard to reproduce even for me. This happens one times out of one hundred calls of the rule I would guess.

Context

In the concrete case above I just want to send Telegram messages when I start shadowing my rooms and pulling down shutters. I have a light sensor running which is monitored by OH and this triggers the shadowing after some threshold is reached (brightness over some time).
But I have a bunch of other rules running with different intent and the problem occurs randomly in all rules in very rare cases. But this makes rule execution unreliable.

Your Environment

openHAB 3.1.0 (stable)
Telegram Binding 3.1.0
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)

@foobar26 foobar26 added the bug An unexpected problem or unintended behavior of an add-on label Aug 13, 2021
@Skinah
Copy link
Contributor

Skinah commented Aug 18, 2021

From looking at this quickly SocketTimeoutException means the remote connection was made, but at some point the data stopped, so this could be a telegram server having an issue. Another cause could be your spamming the server so make sure your below one message a second and 20 messages a minute. I think this will be the cause as I did a search on the library that this binding uses and this was the only report in the projects issues.

If you can confirm it is sending too many messages that is the cause it may be possible to at least update a message in your logs that this is occurring. EDIT: not possible to catch it as the library does not throw most exceptions and catches generic exception with no handling. See https://github.com/pengrad/java-telegram-bot-api/blob/master/library/src/main/java/com/pengrad/telegrambot/impl/TelegramBotClient.java

The binding has a 75 second connection and read timeout set so that should be plenty unless such a long time trips how how a rule can stay running for.

I have never seen this issue before, but I do have debouncing on my motion sensors that trigger the messages and I only use the send video feature of the binding currently.

@foobar26
Copy link
Author

Hi Skinah,
thanks for your feedback.
I'm definitely not spamming the Telegram server. It's just one message that is being send and this is done about 4-5 times a day max.
Since I didn't change my rules after migrating from OH2 to OH3 also makes me think it has nothing to do with the rules itself, because on OH2 sending Telegram messages worked reliably.
What irritates me most is that there is no stack trace, so I'm not able to identify the root cause of the issue. Normally I would assume that the Telegram bundle would send a full stack trace to the log.
Is there anything more I could provide you with to identify the root cause?

@J-N-K
Copy link
Member

J-N-K commented Aug 21, 2021

@Skinah IMO thing actions should not throw exceptions. If the underlying library is not properly handling the exceptions, the binding should do that. We also catch Exception from Jetty, because there is no more specific type available.

@foobar26
Copy link
Author

Just an idea: Would it be possible to queue a message to be sent if the connection to the Telegram server is unavailable and send the message as soon as it recovers?
What bothers me most is that the rule is aborted if the bundle is unable to send the message because of an unavailable connection.

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/telegram-sendtelegramanswer-works-but-error-in-log-file-null-in-xxx/126039/21

@Skinah
Copy link
Contributor

Skinah commented Sep 6, 2021

OK I have made 2 changes for you that will need to be tested since I can not reproduce this here. As per @J-N-K post I have done a try/catch block for Exception so this should work and also give a WARN with more details when it happens again. I say WHEN it happens again as the exception will occur if you have a network drop out or the telegram server has an issue after the binding successfully connects to the server. However the rule should continue and finish without getting terminated, and we should get more info. I also increased the timeout from 10 seconds to 75 seconds.

There is also an exception listener setup in the initlise() of the handler. This may be taking the thing OFFLINE, so when this happens do you see the thing go offline in the event.log? There may be a more detailed message in the event.log

localBot.setUpdatesListener(this::handleUpdates, this::handleExceptions,
                getGetUpdatesRequest(config.getLongPollingTime()));

As for your questions about queuing messages, lets fix the rule getting aborted first. The actions will return a FALSE if they fail, so the rule can be made to try again very easily with this returned boolean.

@Skinah
Copy link
Contributor

Skinah commented Sep 6, 2021

Jar to try can be downloaded from http://pcmus.com/openhab/telegram/

@foobar26
Copy link
Author

foobar26 commented Sep 6, 2021

Awesome, thanks for the beta version. Will install this evening and let you know when it happens again!

@foobar26
Copy link
Author

foobar26 commented Sep 6, 2021

Just realized that you provided a OH 3.2 bundle. I'm currently running on 3.1 stable and would rather not update to a dev branch. Would you mind to send me a 3.1 compatible bundle with your changes? Thanks a lot in advance!!

@Skinah
Copy link
Contributor

Skinah commented Sep 7, 2021

That jar will work fine on 3.1 stable, the advantage of openHAB is you can mix and match bindings with most cores if there is an issue and you don't even need to reboot! Just uninstall the binding first, then unzip and place the jar into the addons folder. Everything should then just go back to how it was.

@foobar26
Copy link
Author

foobar26 commented Sep 7, 2021

Ok perfect! Thought I can just throw it in the addon-folder, but forgot to uninstall the old one. Works now. Will report back with further results...

@stefanheijnen
Copy link

Recently I noticed I didn't get all of my Telegram messages. My logs and Google brought me here. I've also installed the jar. Will let you know how things turn out.

@stefanheijnen
Copy link

I still get an error for some telegram notifications, while slightly different:

2021-09-12 08:01:16.359 [WARN ] [gram.internal.action.TelegramActions] - Exception occured whilst sending message:java.net.SocketTimeoutException: timeout
2021-09-12 08:02:31.360 [WARN ] [gram.internal.action.TelegramActions] - Exception occured whilst sending message:java.net.SocketTimeoutException: timeout

@Skinah
Copy link
Contributor

Skinah commented Sep 13, 2021

Then the changes are working, as the goal was to stop the rules and actions from crashing and a warning to appear so the below code can be used....
I am not sending any messages here that do not have attachments (camera) and these are handled with different code so I am not ruling out a bug as your cause, but the odds are higher it is a connection or server issue causing it. I'll setup some extended uses here for when my sprinklers turn on and use that to run some tests but it could be months before I know for sure. Asking others on the forum may confirm that it is no bug and you need to look at your network. What you can do is use the rule to check the Boolean result and try resending, it would be interesting to see if you then get a warn that the server can not be reached or if you get a second socket timeout in a row. Since there are 75 second timeouts, even this probably does not mean much as any short issues may be solved by the time the second try reaches 75 seconds later.

val telegramAction = getActions("telegram","telegram:telegramBot:Telegram_Bot")
var result = telegramAction.sendTelegram(XXX, "Start shadowing")
if (result==false) ...... code here to resend...

@foobar26
Copy link
Author

I can confirm that the rules are executed now if the problem occurs.
This morning the issue occurred again in a rule sending out a telegram message:

2021-09-13 08:42:59.049 [WARN ] [gram.internal.action.TelegramActions] - Exception occured whilst sending message:java.net.SocketTimeoutException: timeout
So the worst problem is solved.
Now I would really love to have this queuing in place or some kind of retry.

However, I doubt that it is a network problem on my end. The network was up and running at the time. I had downtime 30 minutes earlier, but I don't see how that could be related.

@Skinah
Copy link
Contributor

Skinah commented Sep 13, 2021

You can implement your own retry as per my last post. Do some testing and let me know what you find.

@foobar26
Copy link
Author

Yes I know, good workaround.
But I would have to copy this mechanism to a lot of rules across several files. Therefore I would love to get this feature by the bundle.

@foobar26
Copy link
Author

Don't know if this is interesting. What I realized when looking into the event log is that the telegram-Thing changed to OFFLINE earlier today, but then went back to ONLINE. That was a few hours before the above mentioned timeout occurred.
At that exact time of the timeout no change of the Thing was reported. This proves for me that there was no network issue.

2021-09-13 06:14:44.655 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'telegram:telegramBot:Telegram_Bot' changed from ONLINE to
 OFFLINE (COMMUNICATION_ERROR): java.net.SocketTimeoutException: timeout
2021-09-13 06:14:54.656 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'telegram:telegramBot:Telegram_Bot' changed from OFFLINE (COMMUNICATION_ERROR): java.net.SocketTimeoutException: timeout to ONLINE

Could it still be related to the downtime of the network and the thing changing from ONLINE to OFFLINE and back? Maybe there is some kind of "ongoing" connection that is used but not available any longer?

@Skinah
Copy link
Contributor

Skinah commented Sep 13, 2021

It is not a work around. That is how the binding was designed and written to be used that way. Since I only know less than 5% of how this binding and library works I am not able to comment much on what is possible till I gain more experience.

@foobar26
Copy link
Author

OK, sure. Sorry, I thought you are one of the main contributors to the bundle.
And of course my proposal is clearly an enhancement and not related to this bug.

@foobar26
Copy link
Author

foobar26 commented Nov 9, 2021

The original issue (rule aborts) is fixed, so I close this issue.

@foobar26 foobar26 closed this as completed Nov 9, 2021
@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/telegram-sendtelegramquery-not-working-after-long-inactive-time/136353/1

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/telegram-sendtelegramquery-not-working-after-long-inactive-time/136353/7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

No branches or pull requests

5 participants