You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We've been seeing OSError: [Errno 98] Address already in use on the "official" Sopel instance when something interrupts its ZNC connection. (Later, failure of the webhook server to bind at setup() during reconnection leads to AttributeError: 'NoneType' object has no attribute 'shutdown' the next time Sopel exits or reconnects, since this plugin's shutdown() routine doesn't correctly check if the server object is valid before trying to shut it down.)
Sopel reconnecting is very similar to what happens during stop-start or restart: It shutdown()s all plugins, waits a bit, then setup()s all the plugins again. The difference is, it's the same process running those setup()s again instead of a new one. But interestingly, it's not possible to trigger the same behavior by starting sopel, doing Ctrl-C, and immediately starting sopel again.
It seems like the listener socket (from Bottle) is getting stuck in TIME_WAIT for a few minutes because the local end initiated the closure (this is just part of how TCP works). When the whole process terminates, the leftover socket is probably freed by the OS—but not when the process stays running, i.e. during a reconnection.
I'm not quite done researching the flag SO_REUSEADDR and its possible (side-)effects or other considerations, but it might help. Probably would require a custom Bottle backend to set up the option.
The text was updated successfully, but these errors were encountered:
Oops... didn't see this issue before opening #72 😬
You bring up an interesting point I didn't really consider:
since this plugin's shutdown() routine doesn't correctly check if the server object is valid before trying to shut it down
While #72 fixes the issue of actually un-binding the socket during the plugin's shutdown(), it does't really address the point above… do you think it's worth including along side the fix in #72? I just wanted to point it out to bring it to the front of your/our attention as your review #72 (or decide to close in favor of other options, e.g., SO_REUSEADDR).
I'd call them "separate but related". Fixing things so that the server never fails to bind theoretically makes it unnecessary to check if the object is valid before trying to shut it down… but "never" is a strong word, and no matter how many checks there are, someone is going to break it somehow in the wild. It's just a matter of time.
So yeah, feel free to toss in an if server is not None (that's pseudocode!) check somewhere in the same PR, just in case it still fails to bind at some point. (Maybe later we can consider an automatic retry feature. Like at most n retries spaced m minutes apart, triggered if the server fails to bind at setup() time.)
server_close() is a solution I never discovered, because the chain of superclasses between Bottle and socketserver.BaseServer (where it's defined, AFAICT) is long and, uh, Twisted. (Sorry… but not really. 😁) It might work just fine on its own for the problem we discovered on the "official" Sopel instance, and you should feel free to pip install -e a modified version of this plugin to test it out since it runs on your infra. The GitHub notifications there are just "nice to have", since everyone with an actual interest in following development activity has GitHub email notifications turned on anyway.
In fact, I'd be perfectly happy if server_close() solves the problem on its own. There are some considerations for using SO_REUSEADDR that I'm not sure I've explored enough to fully understand the possible effects on this plugin's operation. And if I don't have to think about those at all, it leaves more time for working on the main bot code. 😉
dgw
linked a pull request
Oct 19, 2020
that will
close
this issue
We've been seeing
OSError: [Errno 98] Address already in use
on the "official" Sopel instance when something interrupts its ZNC connection. (Later, failure of the webhook server to bind atsetup()
during reconnection leads toAttributeError: 'NoneType' object has no attribute 'shutdown'
the next time Sopel exits or reconnects, since this plugin'sshutdown()
routine doesn't correctly check if the server object is valid before trying to shut it down.)Sopel reconnecting is very similar to what happens during stop-start or restart: It
shutdown()
s all plugins, waits a bit, thensetup()
s all the plugins again. The difference is, it's the same process running thosesetup()
s again instead of a new one. But interestingly, it's not possible to trigger the same behavior by startingsopel
, doing Ctrl-C, and immediately startingsopel
again.It seems like the listener socket (from Bottle) is getting stuck in
TIME_WAIT
for a few minutes because the local end initiated the closure (this is just part of how TCP works). When the whole process terminates, the leftover socket is probably freed by the OS—but not when the process stays running, i.e. during a reconnection.I'm not quite done researching the flag
SO_REUSEADDR
and its possible (side-)effects or other considerations, but it might help. Probably would require a custom Bottle backend to set up the option.The text was updated successfully, but these errors were encountered: