Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irc: timeout and logs #2041

Merged
merged 8 commits into from
Apr 27, 2021
Merged

Conversation

Exirel
Copy link
Contributor

@Exirel Exirel commented Mar 16, 2021

Description

In an attempt to fix a bug (see #2038) I added more logs (not sure it will be that useful in the end) and changed slightly how the bot manage its PING and timeout:

  • remember the last time a PING was sent
  • check for PING more often
  • make the PING interval configurable (core.timeout_ping_interval)
  • check for timeout more often (but keep the same delta)
  • clear the buffer on timeout and IRC error

My current test didn't allow me to check if it actually solves the issue, but the change could help with that and it's still better than before anyway.

In theory, I could have removed the os._exit in sopel.cli.run, because locally it works quite nicely without it now, but that will be for another time.

I assign this PR to @dgw as he needs to install this branch on a test instance, and see how it behaves for a few days (hopefully, without any new issue, or at least with something more detailed).

Checklist

  • I have read CONTRIBUTING.md
  • I can and do license this contribution under the EFLv2
  • No issues are reported by make qa (runs make quality and make test)
  • I have tested the functionality of the things this change touches

@Exirel Exirel added this to the 7.1.0 milestone Mar 16, 2021
@Exirel Exirel requested a review from dgw March 16, 2021 23:29
@Exirel Exirel changed the title Backend coretasks timeout and logs irc: timeout and logs Mar 16, 2021
@dgw
Copy link
Member

dgw commented Mar 17, 2021

I'm going to put this into production on both my own bot and the bot in #sopel @ freenode.

While I started a review, I can't take the time right now to go through everything I want regarding the phrasing of log output, so I'll hold code feedback until later. The actual logic change looks ready to test, though.

@Exirel
Copy link
Contributor Author

Exirel commented Mar 26, 2021

Last commit should be the proper solution, by calling handle_close instead of close or close_when_done, the bot will now close the connection itself, and shutdown properly (timeout scheduler, job scheduler, and plugin shutdown routines).

Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some accumulated comments from my reviews in between poking our "official" instance with various revisions of this PR to see what happens. It's mostly nitpicking the wording of log messages, but I did have a couple naming/logic suggestions in there.

@Exirel
Copy link
Contributor Author

Exirel commented Apr 9, 2021

So, how are we doing so far?

@dgw
Copy link
Member

dgw commented Apr 9, 2021

Main Sopel instance has been very happy running a slightly outdated version of this PR. I think it's ready to go; want to squash out the fixup commits before I do the final review? Or are you nervous that I'll find more nitpicks? 😁

@Exirel
Copy link
Contributor Author

Exirel commented Apr 10, 2021

Or are you nervous that I'll find more nitpicks?

I'm sure you will anyway. 😛

@Exirel Exirel force-pushed the backend-coretasks-timeout-and-logs branch from a9580e7 to 7e2c810 Compare April 10, 2021 11:21
@Exirel
Copy link
Contributor Author

Exirel commented Apr 10, 2021

OK! I did squash & rebase, then I added a new commit: the last commit contains 2 modifications:

  1. switch from timeout / 2.4 to timeout * 0.45, making it easier to reason about the ratio used (1/2.4 is about 41%)
  2. updating the documentation for timeout_ping_interval so people can understand how it works and why the default value of 0

I think it's all good now! Well, until you have to fix my grammar again. 😁

Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are you nervous that I'll find more nitpicks?

I'm sure you will anyway. 😛

Damn, you were right. But no nitpicks had to do with grammar! 🥳🎉

@dgw
Copy link
Member

dgw commented Apr 11, 2021

Didn't look at the commit log while re-reviewing, but couldn't 107f516 be squashed into d837259? Looks like a fixup to me.

Exirel and others added 8 commits April 24, 2021 09:38
Co-authored-by: dgw <dgw@technobabbl.es>
Co-authored-by: dgw <dgw@technobabbl.es>
Initially, I thought that "handle_close" was called once the socket was
closed. That is not true: it it called when the socket must be closed. I
misunderstood the asynchat/asyncore interface.

Now, on a timeout, the bot will:

* close the connection
* shutdown the timeout scheduler
* shutdown the job scheduler
* call every shutdown routines

I'm glad I dig further into that problem, thanks to the time dedicated
by our users to test our dev version, and for their feedback!
…G interval

Co-authored-by: dgw <dgw@technobabbl.es>
@Exirel Exirel force-pushed the backend-coretasks-timeout-and-logs branch from eef7169 to 6325966 Compare April 24, 2021 07:54
@Exirel
Copy link
Contributor Author

Exirel commented Apr 24, 2021

Looks like a fixup to me.

Totally. All rebased & squashed.

Copy link
Member

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@dgw dgw merged commit 69927f2 into sopel-irc:master Apr 27, 2021
@Exirel Exirel deleted the backend-coretasks-timeout-and-logs branch May 1, 2021 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bot hangs on channel without restarting or quitting - Server timeout detected after 45360.59964s; closing.
2 participants