-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds ability to set timeouts for a knxdclient.KNXDConnection #2
Conversation
Welcome to Codecov 🎉Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment. Thanks for integrating Codecov - We've got you covered ☂️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, Thank you very much for your contribution! :)
I like the general approach and I noticed that I have had the unaddressed TODO comment to add a timeout for the open_group_socket method all along.
I've left a few comments, because I'd like to have the architecture, how the timeouts are implemented, a bit different. After writing the comments, I'm now wondering whether we need this timeout at all: Shouldn't the TCP connection be closed, when your server reboots, such that we get an asyncio.IncompleteReadError
and the run()
coroutine raises an exception? If that doesn't work, maybe we should try to configure the TCP keepalive on the network socket?
We'd still need some internal Event to let the other coroutines fail with some Exception as well (see my inline comment for more details), instead of waiting infinitely for a packet on a broken socket connection. But it feels cleaner to rely on the operating system's detection of connection loss than applying our own (duplicated) timeout logics.
if self._timeout is not None: | ||
read_task = self._read_raw_knxpacket() | ||
data = await asyncio.wait_for(read_task, self._timeout) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cool to have some kind of "keep alive" feature here, i.e. trigger some request to KNXD every few seconds or so, so that we can be sure to receive a packet from KNXD regularly. Then we don't have to rely on KNX bus activity to avoid false timeouts and can chose an even shorter timeout value.
I will take a look at the KNXD protocol spec to see if there is a function that we can (mis)use as a keepalive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Unfortunately, according to the BCUSDK docs about EIBD's/KNXD's interface, there is no appropriate function to use as a keepalive, that reliably sends a response packet, once we opened the 'group socket' connection.
I'm somehow unconvinced to rely on KNX bus traffic to avoid running into timeouts.
Have you tested whether the event notification mechanism, you implemented today, is sufficient to detect the connection loss and stop the receive loop, when your KNXD server reboots – without the timeout here in the run()
coroutine (or with a very very long timeout)?
If it doesn't work: Can you test whether setting up TCP keepalive for the socket here allows us to detect the connection loss reliably?
sock = self._writer.get_extra_info("socket")
# The following seems to depend on your OS and is specific for Linux. See https://stackoverflow.com/q/12248132/10315508
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, after_idle_sec)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, max_fails)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the mechanism manually by connecting to a Debian docker image with KNXD installed and used knxtool
to simulate some traffic which then I stopped and it seems to work.
It's possible that there is some stuff about coroutines / tasks that goes over my head as I am seeing the CI/CD failing.
The definitive test will come in 5h and 20 minutes :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested whether the event notification mechanism, you implemented today, is sufficient to detect the connection loss and stop the receive loop, when your KNXD server reboots – without the timeout here in the run() coroutine (or with a very very long timeout)?
It was enough as I kept receiving data throughout the night! 🎉
But I see how setting the TCP props on the socket will also be of use.
Hi, thanks a lot for your feedback. I am newish writing python. My background is more TypeScript / Kotlin during these last years. I lack knowledge on how async/await actually works in python so I just attempted to make it work. In regards this:
Well – Your judgement will be better than mine. My programming background is mostly in other languages so I just don't know the python internals about async/await. I just figured that the "program got stuck because the queue was not closed yet the connection got dropped". I'll go over your comments when I get the chance and ping you once the changes have been submitted. Again; thank you for your time and feedback and I'll take this as far as I can :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reworking your code. That looks really good to me! :)
I'd like to test if we now need the timeout mechanism at all (see my new inline comment). Apart from that, you should take a look at the failing unittest and the MyPy complaints. Then, I'm fine with merging the PR.
We'll have the official test some time from now :), once that happens I'll consider the new approval to be working. And you're right. CI/CD is failing now so I'll take a look at that tomorrow |
I can confirm it did the trick! |
test/test_client.py
Outdated
@@ -148,7 +148,7 @@ async def test_connection_failure(self) -> None: | |||
await connection.open_group_socket() | |||
finally: | |||
await connection.stop() | |||
with suppress(asyncio.CancelledError, ConnectionAbortedError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a ConnectionAbortedError
is raised here (in an explicit stop()
, I would consider this a bug, so we should not ignore the exception.
EDIT: Sorry, I misread the diff. You already removed the suppressing here again. Then, everything is fine.
knxdclient/client.py
Outdated
return await reader.readexactly(length) | ||
if self._reader is None or self._reader.at_eof(): | ||
raise ConnectionError("No connection to KNXD has been established yet or the previous connection's " | ||
"StreamReader is at EOF") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work when manually closing the connection via close()
such that no exception is raised in this case?
Then, we can remove the timeout parameter, can't we? |
I took the liberty to do some small improvements to the code style and add a test for the new behaviour. Please take a look, if this still is in your sense; then I will merge the PR. |
Good afternoon, I've been away from the computer for a bit. Thank you for your time to make these improvements and looking at the changes. The other day I was answering your comment but I did not hit send in the end; shut down the machine and forget about everything.
I meant that I just tried with my changes – didn't try with the TCP keepalive options |
First of all, thank you for creating this library. It's nice and easy to use.
Issue
I have a server running
knxd
that reboots once a day and I've noticed that if one usesiterate_group_telegrams
andknxd
reboots it leaves the coroutine hanging; awaiting for a message that will never come since the connection was dropped. This is caused because there were no timeouts in place.Suggested fix
I've added an optional parameter to the class
KNXDConnection
that can be used as a second argument toasyncio.wait_for
and it is used in places where we are reading fromknxd
. The timeout errors will allow us to break out of thewhile True
polling statements.If the
timeout
isNone
old behavior is used instead.Thank you for taking a look at this.
Miquel