Bot does not recover properly when using infinity_polling #1058
Is the bot threaded or not threaded?
BTW, could you please try to use the most recent branch:
The way it looks right now is that I can't replicate the problem while testing; I will check with the production system tomorrow.
I tested around, and it looks like the problem is not from the collision but from something else; I did not spot the 20-minute difference in the logs. I will investigate further. There seem to be two problems on a long-running instance (>30 min), possibly similar to #1057, but I could not confirm that yet. Problem 1 causes restarts without any hints in the log; Problem 2 leads to a recursion error and then restarts, as seen below. I'm still on the current release instead of the newest Git version (I need to resolve some conflicts first). That is probably another cause, but there is no recovery as a result either. I will try to test with the newest version in the next few days.
Okay, I installed and tested it, but had to remove the egg. It still occurs; the newest version still contains these (2?) bugs, with the same effect that restarting becomes an endless loop. @Badiboy any other ideas?
@ModischFabrications I'll try to check, I've had no time yet. But I will. :)
Hey, do you have an estimate on when you can look into it? No pressure, I'm just evaluating when my bot will be available to be deployed permanently.
3.7.5 was released several days ago. You can try a general update to be sure you have the latest.
Sorry, it seems like a misunderstanding. I could not reproduce the error from the collision either, but I had it on a long-running instance (>30 min). #1058 (comment) described it best, I think. Your test setup seems to be very similar; have you tried letting it run for longer? Thanks for the thorough investigation! I will test the newest version now and update this comment as soon as I hit the bug. -- Update:
Is this problem 1? I don't see any problem here. It's very hard to catch what THE PROBLEM actually is: problem 1, problem 2, cannot reproduce, not this problem...
Sorry, it's a mix-up of a language barrier, wrong assumptions (that it was caused by a collision), and problems that are difficult to trace (>30 min runtime). The collision is not a problem anymore; infinity_polling handles that restart. The ConnectionResetError is also circumvented by using infinity_polling, which I am using, so it shouldn't be part of the problem either. Let me try to explain the problem better:
I'm not sure I understand that correctly. Do you mean you added more logging and the problem should be more visible now while testing with the newest version? Would it have been easier to follow if I had fixed the wrong assumptions at the start instead of appending new discoveries? I am still testing to see if it comes up again in *.5.
Hmmm, seems I see it also. I'll go deeper.
Seems that I got it. Try the Git version.
Awesome! Looks like the recursion error is fixed in the new version. The version is '3.7.5.u2' btw, but it worked, so no problem there. Long-running instances still encounter a ConnectionReset on the first message after a long runtime/break and drop the affected message even with the newer version, but this is a new problem in and of itself and deserves a new issue:
For clarification: the restart itself works; the dropped message is the problem.
Version u2 appeared later with another fix, so it's ok :) Regarding this:
Currently I have a bot running for ~20 hours in polling mode (usually I use webhooks); it has faced no problems yet...
The closed connection exception is only triggered when sending a message to the bot. This is how I trigger it:
Should I post this as its own issue? This issue is unrelated to the recovery from the start, and others looking for the same problem won't find it here.
I know that some users face the problem, but I cannot see it myself. Can you run the bot with the "DEBUG" logging level? It should log a traceback on exception now; maybe we'll see something helpful there...
This is the error with a stack trace; feel free to tell me if you need more. It went on with more "another exception occurred" messages, but this should capture the essential parts.
I see here that the error occurs not when the library queries the updates, but when your bot is answering the user. So to avoid a lost reply, you can catch this exception when sending the message and re-send it.

If you want to go deeper into the reasons: when the library sends a request, it creates a requests session and stores it. When you send the next request, the library reuses the existing session as an optimization. It seems that for rarely used bots the session may die while waiting. I suppose it may depend on various operating system settings: in some cases it happens, in some it does not.

Since this problem affects some library users, I think it may be reasonable to create an option for rarely used bots that forces re-creation of the session on every request. It should help for cases like yours. If you think it will help in your case, we can try implementing this.
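A rough sketch of the catch-and-resend idea described above, assuming the dropped-connection failure surfaces as a requests ConnectionError or a built-in ConnectionResetError; the helper name, token, and retry count are illustrative and not part of the library:

```python
import telebot
from requests.exceptions import ConnectionError as RequestsConnectionError

bot = telebot.TeleBot("YOUR_BOT_TOKEN")  # placeholder token


def send_with_retry(chat_id, text, retries=1):
    # The first send after a long idle period may hit a dead pooled session;
    # retrying once forces a fresh connection and usually gets the reply through.
    for attempt in range(retries + 1):
        try:
            return bot.send_message(chat_id, text)
        except (RequestsConnectionError, ConnectionResetError):
            if attempt == retries:
                raise
```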
Good eye! A session timeout makes sense; this also explains why the error occurs only when answering after a long delay. For reference: this error happened both on Win10 and Raspbian. I can only agree with your line of thought that fixing it will improve stability, especially for newcomers. As for the flag for a forced recreation of sessions on every message: recreating for every request will fix the problem, but it will introduce a trade-off between stability and performance, which might hinder dynamic upscaling with usage. If I understand the situation correctly, there are ways to combine both without compromising: I haven't looked into the code to evaluate these options, so please take these suggestions with a grain of salt. Option B feels like the cleanest imho. I would appreciate either solution.
#1077
I went to exactly the same point. Let me know if that helps you find your way. :)
Good news: I set the session time to 5 min and it seems like the ConnectionReset is gone. Bad news: from time to time the bot shows a collision with another instance without me actually starting another one. This happened on u2 as well; I just thought it was human error. I will double-check that it isn't human error; nonetheless, all the other fixes are looking good.
Oh crap, I see the problem now: the bot ID and secret token are printed in plaintext in the logs. I requested a new token; this will fix both a forgotten running instance and someone else trying to misuse my token. I'm going to test it with a long-running instance tomorrow, but I think all those bugs should be resolved now :)
Yep, I don't like this. "We need to go deeper" (c)
Please check: did you really set it to 5 minutes (SESSION_TIME_TO_LIVE = 300)? Not to 5?
I checked, seems to be correct.
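For reference, a minimal sketch of the setting being discussed; the attribute name SESSION_TIME_TO_LIVE comes from this thread, while the assumption that it lives in telebot.apihelper, the token, and the surrounding code are illustrative:

```python
import telebot
from telebot import apihelper

# Assumed location of the setting: drop and recreate the pooled requests
# session after 300 s (5 min) of inactivity, so a stale connection is not
# reused after a long pause.
apihelper.SESSION_TIME_TO_LIVE = 300

bot = telebot.TeleBot("YOUR_BOT_TOKEN")  # placeholder token
bot.infinity_polling()
```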
I'm pretty certain that it was human error; requesting a new token (due to the assumed leak from the log) seems to have fixed it. I will test more extensively tomorrow, but everything should be resolved.
OK, thank you. Seems that we are very close to victory. :)
I haven't found any more occurrences of the bugs, which means we were victorious at last 🎉 I am happy that we were able to find and fix all these issues. I will send in another pull request with additions (bot reference and apihelper settings) once the new version is released and I have been able to finish my bot. Feel free to close this issue.
Gratz. 👍
THX!! It saved my day.
What version of pyTelegramBotAPI are you using? 3.7.4
What OS are you using? Raspbian
What version of Python are you using? 3.6.9
Testing out bot.infinity_polling() as recommended in #1057, I noticed a bug in the recovery from exceptions: it seems like the bot does not recover completely from a collision with another bot instance. It crashes and restarts successfully, which is nice, but it seems to get stuck in a restart loop every 3 s (the sleep time on recovery). I would guess there is some implicit state that carries over across the restart and still contains the cancel instruction, but I haven't found anything obvious.
This bug can be bypassed by not starting another instance with the same token, but I assume that the same bug will also strike with other exceptions, unrelated to this cause.
Reproduction: run a second instance with the same token, both using bot.infinity_polling() and logging.INFO (a minimal sketch follows below the logs).

log instance 1:
log instance 2:
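A minimal reproduction sketch of the setup described above, assuming two copies of the same script are started with one shared token; the handler, reply text, and token are placeholders:

```python
import logging

import telebot

logging.basicConfig(level=logging.INFO)

bot = telebot.TeleBot("YOUR_BOT_TOKEN")  # same placeholder token in both instances


@bot.message_handler(commands=["start"])
def start(message):
    bot.reply_to(message, "hello")


# Start this script twice: the second instance triggers a polling conflict,
# after which the first one should recover via infinity_polling's retry loop.
bot.infinity_polling()
```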
PS: Happy new year and thanks for the fast responses!