Error: "Address already in use" when Sopel reconnects with webhooks enabled #70

dgw · 2020-07-27T22:15:14Z

We've been seeing OSError: [Errno 98] Address already in use on the "official" Sopel instance when something interrupts its ZNC connection. (Later, failure of the webhook server to bind at setup() during reconnection leads to AttributeError: 'NoneType' object has no attribute 'shutdown' the next time Sopel exits or reconnects, since this plugin's shutdown() routine doesn't correctly check if the server object is valid before trying to shut it down.)

Sopel reconnecting is very similar to what happens during stop-start or restart: It shutdown()s all plugins, waits a bit, then setup()s all the plugins again. The difference is, it's the same process running those setup()s again instead of a new one. But interestingly, it's not possible to trigger the same behavior by starting sopel, doing Ctrl-C, and immediately starting sopel again.

It seems like the listener socket (from Bottle) is getting stuck in TIME_WAIT for a few minutes because the local end initiated the closure (this is just part of how TCP works). When the whole process terminates, the leftover socket is probably freed by the OS—but not when the process stays running, i.e. during a reconnection.

I'm not quite done researching the flag SO_REUSEADDR and its possible (side-)effects or other considerations, but it might help. Probably would require a custom Bottle backend to set up the option.
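For reference, a minimal sketch of how that flag can be switched on at the standard-library level. It assumes a plain `socketserver.TCPServer` rather than whichever server class Bottle actually wraps, and `ReusableTCPServer`, `NullHandler`, and `reuseaddr_enabled` are hypothetical names used only for illustration. It shows the mechanism only and says nothing about whether the option is safe for this plugin.

```python
import socket
import socketserver


class ReusableTCPServer(socketserver.TCPServer):
    # When this class attribute is true, socketserver sets SO_REUSEADDR
    # on the listening socket before calling bind().
    allow_reuse_address = True


class NullHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler that ignores every request."""

    def handle(self):
        pass


def reuseaddr_enabled(server):
    """Report whether SO_REUSEADDR is set on the server's listening socket."""
    return server.socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0


# Port 0 asks the OS for any free port, so the sketch does not collide
# with a real service.
server = ReusableTCPServer(("127.0.0.1", 0), NullHandler)
enabled = reuseaddr_enabled(server)
server.server_close()
```

A custom Bottle backend would presumably need to get an equivalent setting onto the server object it creates; how to do that depends on the backend in use.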


HumorBaby · 2020-08-08T05:34:52Z

Oops... didn't see this issue before opening #72 😬

You bring up an interesting point I didn't really consider:

since this plugin's shutdown() routine doesn't correctly check if the server object is valid before trying to shut it down

While #72 fixes the issue of actually un-binding the socket during the plugin's shutdown(), it doesn't really address the point above… do you think it's worth including alongside the fix in #72? I just wanted to point it out to bring it to the front of your/our attention as you review #72 (or decide to close it in favor of other options, e.g., SO_REUSEADDR).

dgw · 2020-08-08T06:13:29Z

I'd call them "separate but related". Fixing things so that the server never fails to bind theoretically makes it unnecessary to check if the object is valid before trying to shut it down… but "never" is a strong word, and no matter how many checks there are, someone is going to break it somehow in the wild. It's just a matter of time.

So yeah, feel free to toss in an if server is not None (that's pseudocode!) check somewhere in the same PR, just in case it still fails to bind at some point. (Maybe later we can consider an automatic retry feature. Like at most n retries spaced m minutes apart, triggered if the server fails to bind at setup() time.)
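A rough sketch of both ideas, the None check and a bounded retry, might look like the following. It is not the plugin's actual code: `safe_shutdown`, `start_with_retry`, and `make_server` are hypothetical names, and the delay is in seconds here purely to keep the example short.

```python
import time


def start_with_retry(make_server, retries=3, delay=1.0, sleep=time.sleep):
    """Call make_server() until it succeeds, at most retries + 1 times.

    Returns the server object, or None if every attempt raised OSError
    (for example Errno 98, "Address already in use").
    """
    for attempt in range(retries + 1):
        try:
            return make_server()
        except OSError:
            # Wait before the next attempt, but not after the last one.
            if attempt < retries:
                sleep(delay)
    return None


def safe_shutdown(server):
    """Shut the server down only if it was actually created.

    Returns True if shutdown() was called, False if there was nothing
    to shut down (the case that currently raises AttributeError).
    """
    if server is None:
        return False
    server.shutdown()
    return True
```

With something like this, setup() could store the result of `start_with_retry(...)` and shutdown() could pass that value to `safe_shutdown(...)` without caring whether the bind ever succeeded.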

server_close() is a solution I never discovered, because the chain of superclasses between Bottle and socketserver.BaseServer (where it's defined, AFAICT) is long and, uh, Twisted. (Sorry… but not really. 😁) It might work just fine on its own for the problem we discovered on the "official" Sopel instance, and you should feel free to pip install -e a modified version of this plugin to test it out since it runs on your infra. The GitHub notifications there are just "nice to have", since everyone with an actual interest in following development activity has GitHub email notifications turned on anyway.

In fact, I'd be perfectly happy if server_close() solves the problem on its own. There are some considerations for using SO_REUSEADDR that I'm not sure I've explored enough to fully understand the possible effects on this plugin's operation. And if I don't have to think about those at all, it leaves more time for working on the main bot code. 😉
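To illustrate why server_close() matters, here is a small standalone sketch using `socketserver.TCPServer` directly, which is an assumption standing in for the server Bottle ends up creating. `NullHandler` and `can_bind` are hypothetical helpers. It shows that shutdown() only stops the serve_forever() loop, while the listening socket stays bound inside the same process until server_close() is called.

```python
import socketserver
import threading


class NullHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler that ignores every request."""

    def handle(self):
        pass


def can_bind(port):
    """Return True if a fresh server can bind to the given local port."""
    try:
        probe = socketserver.TCPServer(("127.0.0.1", port), NullHandler)
    except OSError:
        return False
    probe.server_close()
    return True


# Port 0 lets the OS pick a free port; remember which one it chose.
server = socketserver.TCPServer(("127.0.0.1", 0), NullHandler)
port = server.server_address[1]

thread = threading.Thread(
    target=server.serve_forever, kwargs={"poll_interval": 0.05}
)
thread.start()

# shutdown() stops the serving loop but does not close the socket.
server.shutdown()
thread.join()
bound_after_shutdown = not can_bind(port)

# server_close() closes the listening socket and releases the port.
server.server_close()
free_after_close = can_bind(port)
```

In this sketch the port is only reusable after the second step, which matches the symptom of a reconnect happening inside one long-running process.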

dgw added the bug label Jul 27, 2020

dgw added this to the 0.4.0 milestone Jul 27, 2020

HumorBaby mentioned this issue Aug 8, 2020

webhook: fix error on reconnect #72

Merged

dgw linked a pull request Oct 19, 2020 that will close this issue

webhook: fix error on reconnect #72

Merged

dgw closed this as completed in #72 Oct 24, 2020
