Adds ability to set timeouts for a knxdclient.KNXDConnection #2

mrctrifork · 2023-11-27T12:49:43Z

First of all, thank you for creating this library. It's nice and easy to use.

Issue

I have a server running knxd that reboots once a day, and I've noticed that if one uses iterate_group_telegrams while knxd reboots, the coroutine is left hanging, awaiting a message that will never come because the connection was dropped. This happens because there are no timeouts in place.

Suggested fix

I've added an optional timeout parameter to the class KNXDConnection. It is passed as the second argument to asyncio.wait_for in the places where we read from knxd. The resulting timeout errors allow us to break out of the while True polling loops.

If the timeout is None, the old behavior is used instead.
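
A minimal sketch of the pattern described above, not the code from this PR: the helper name `read_with_optional_timeout` and the `never_arrives` stand-in are hypothetical, and only `asyncio.wait_for` itself is taken from the description.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def read_with_optional_timeout(read: Awaitable[T], timeout: Optional[float]) -> T:
    """Await `read`, giving up after `timeout` seconds unless `timeout` is None.

    Hypothetical helper for illustration; it is not part of knxdclient.
    """
    if timeout is None:
        # Old behaviour: wait for as long as it takes.
        return await read
    # Raises asyncio.TimeoutError if nothing arrives in time and cancels the read.
    return await asyncio.wait_for(read, timeout)


async def demo() -> str:
    async def never_arrives() -> bytes:
        # Stands in for a read on a connection that was silently dropped.
        await asyncio.sleep(3600)
        return b""

    try:
        await read_with_optional_timeout(never_arrives(), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    return "got data"
```

Note that `asyncio.wait_for` already treats `timeout=None` as "wait indefinitely", so the explicit branch only mirrors the wording of the description.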

Thank you for taking a look at this.

Miquel

codecov-commenter · 2023-11-27T18:34:40Z

Welcome to Codecov 🎉

Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment.

Thanks for integrating Codecov - We've got you covered ☂️

mhthies

First, thank you very much for your contribution! :)

I like the general approach and I noticed that I have had the unaddressed TODO comment to add a timeout for the open_group_socket method all along.

I've left a few comments, because I'd like the architecture of the timeout implementation to be a bit different. After writing the comments, I'm now wondering whether we need this timeout at all: Shouldn't the TCP connection be closed, when your server reboots, such that we get an asyncio.IncompleteReadError and the run() coroutine raises an exception? If that doesn't work, maybe we should try to configure the TCP keepalive on the network socket?

We'd still need some internal Event to let the other coroutines fail with some Exception as well (see my inline comment for more details), instead of waiting infinitely for a packet on a broken socket connection. But it feels cleaner to rely on the operating system's detection of connection loss than to apply our own (duplicated) timeout logic.
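
One way such an internal Event could be used, sketched under assumptions: the names `get_or_fail` and `run_exited` and the choice of `ConnectionAbortedError` are illustrative and not taken from the library. The waiting coroutine races the queue read against the event and fails as soon as the receive loop has exited.

```python
import asyncio


async def get_or_fail(queue: asyncio.Queue, run_exited: asyncio.Event):
    """Return the next queue item, or raise once the receive loop has exited.

    Hypothetical helper for illustration; it is not part of knxdclient.
    """
    get_task = asyncio.ensure_future(queue.get())
    exit_task = asyncio.ensure_future(run_exited.wait())
    try:
        done, _ = await asyncio.wait(
            {get_task, exit_task}, return_when=asyncio.FIRST_COMPLETED
        )
    finally:
        # Cancel whichever task is still pending so no stale task is left behind.
        for task in (get_task, exit_task):
            if not task.done():
                task.cancel()
    if get_task in done:
        return get_task.result()
    raise ConnectionAbortedError("receive loop exited while waiting for a packet")


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    run_exited = asyncio.Event()
    results = []

    queue.put_nowait("telegram")
    results.append(await get_or_fail(queue, run_exited))

    run_exited.set()  # simulate the receive loop terminating
    try:
        await get_or_fail(queue, run_exited)
    except ConnectionAbortedError:
        results.append("aborted")
    return results
```

Cancelling the task that lost the race matters here: without it, every call would leave a pending task behind.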

mhthies · 2023-11-27T19:19:17Z

knxdclient/client.py

+                if self._timeout is not None:
+                    read_task = self._read_raw_knxpacket()
+                    data = await asyncio.wait_for(read_task, self._timeout)
+                else:


It would be cool to have some kind of "keep alive" feature here, i.e. trigger some request to KNXD every few seconds or so, so that we can be sure to receive a packet from KNXD regularly. Then we don't have to rely on KNX bus activity to avoid false timeouts and can choose an even shorter timeout value.

I will take a look at the KNXD protocol spec to see if there is a function that we can (mis)use as a keepalive.

Hm. Unfortunately, according to the BCUSDK docs about EIBD's/KNXD's interface, there is no appropriate function to use as a keepalive, that reliably sends a response packet, once we opened the 'group socket' connection.

I'm somewhat reluctant to rely on KNX bus traffic to avoid running into timeouts.

Have you tested whether the event notification mechanism, you implemented today, is sufficient to detect the connection loss and stop the receive loop, when your KNXD server reboots – without the timeout here in the run() coroutine (or with a very very long timeout)?
If it doesn't work: Can you test whether setting up TCP keepalive for the socket here allows us to detect the connection loss reliably?

sock = self._writer.get_extra_info("socket")
# The following seems to depend on your OS and is specific for Linux. See https://stackoverflow.com/q/12248132/10315508
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, after_idle_sec)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, max_fails)
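
The snippet above could be wrapped in a small standalone helper. This is only a sketch: the function name `enable_tcp_keepalive` and the default values are assumptions, and the `getattr` guard is added because the `TCP_KEEP*` constants are not defined on every platform.

```python
import socket


def enable_tcp_keepalive(sock, after_idle_sec: int = 60,
                         interval_sec: int = 10, max_fails: int = 3) -> None:
    """Enable TCP keepalive on a socket-like object (hypothetical helper).

    The default values are illustrative only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are platform-specific (available on Linux),
    # so each one is only set if the socket module defines it.
    options = (
        ("TCP_KEEPIDLE", after_idle_sec),   # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_sec),    # seconds between probes
        ("TCP_KEEPCNT", max_fails),         # failed probes before the connection is dropped
    )
    for name, value in options:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With an asyncio stream, the object returned by `writer.get_extra_info("socket")` also provides `setsockopt`, so it could be passed to this helper directly.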

I tested the mechanism manually by connecting to a Debian Docker image with KNXD installed and using knxtool to simulate some traffic, which I then stopped, and it seems to work.

It's possible that there is some stuff about coroutines / tasks that goes over my head as I am seeing the CI/CD failing.

The definitive test will come in 5h and 20 minutes :)

Have you tested whether the event notification mechanism, you implemented today, is sufficient to detect the connection loss and stop the receive loop, when your KNXD server reboots – without the timeout here in the run() coroutine (or with a very very long timeout)?

It was enough as I kept receiving data throughout the night! 🎉

But I see how setting the TCP props on the socket will also be of use.

mrctrifork · 2023-11-28T09:00:41Z

Hi, thanks a lot for your feedback.

I am newish to writing Python. My background over the last few years is more TypeScript / Kotlin. I lack knowledge of how async/await actually works in Python, so I just attempted to make it work.

Regarding this:

After writing the comments, I'm now wondering whether we need this timeout at all: Shouldn't the TCP connection be closed, when your server reboots, such that we get an asyncio.IncompleteReadError and the run() coroutine raises an exception? If that doesn't work, maybe we should try to configure the TCP keepalive on the network socket?

Well, your judgement will be better than mine. My programming background is mostly in other languages, so I just don't know the Python internals of async/await. I just figured that the "program got stuck because the queue was not closed yet the connection got dropped".

I'll go over your comments when I get the chance and ping you once the changes have been submitted.

Again, thank you for your time and feedback. I'll take this as far as I can :)

mhthies

Thanks for reworking your code. That looks really good to me! :)

I'd like to test if we now need the timeout mechanism at all (see my new inline comment). Apart from that, you should take a look at the failing unittest and the MyPy complaints. Then, I'm fine with merging the PR.

mrctrifork · 2023-11-28T20:43:46Z

As I've stated in this thread

We'll have the official test some time from now :). Once that happens, I'll consider the new approach to be working.

And you're right. CI/CD is failing now, so I'll take a look at that tomorrow.

mrctrifork · 2023-11-29T08:39:26Z

I can confirm it did the trick!

mhthies · 2023-11-29T18:43:08Z

test/test_client.py

@@ -148,7 +148,7 @@ async def test_connection_failure(self) -> None:
            await connection.open_group_socket()
        finally:
            await connection.stop()
-            with suppress(asyncio.CancelledError, ConnectionAbortedError):


~~If a ConnectionAbortedError is raised here (in an explicit stop()), I would consider this a bug, so we should not ignore the exception.~~

EDIT: Sorry, I misread the diff. You already removed the suppressing here again. Then, everything is fine.

mhthies · 2023-11-29T18:48:01Z

knxdclient/client.py

-        return await reader.readexactly(length)
+        if self._reader is None or self._reader.at_eof():
+            raise ConnectionError("No connection to KNXD has been established yet or the previous connection's "
+                                  "StreamReader is at EOF")


Does this work when manually closing the connection via close() such that no exception is raised in this case?
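
To make the `at_eof()` semantics behind this question concrete, here is a small sketch using a bare `asyncio.StreamReader`. The function name `read_exactly_checked` is hypothetical, and the sketch does not reproduce the library's close() path; it only shows when the guard from the diff would fire.

```python
import asyncio
from typing import Optional


async def read_exactly_checked(reader: Optional[asyncio.StreamReader], length: int) -> bytes:
    """Read exactly `length` bytes, failing early if there is no usable reader.

    Hypothetical helper for illustration; it is not part of knxdclient.
    """
    if reader is None or reader.at_eof():
        raise ConnectionError("No connection established or StreamReader is at EOF")
    return await reader.readexactly(length)


async def demo() -> list:
    reader = asyncio.StreamReader()
    reader.feed_data(b"\x00\x02ab")
    reader.feed_eof()  # simulate the peer closing the connection

    outcomes = []
    # Buffered data is still readable: at_eof() is only True once the buffer is empty.
    outcomes.append(await read_exactly_checked(reader, 4))
    try:
        await read_exactly_checked(reader, 2)
    except ConnectionError:
        outcomes.append("connection error")
    return outcomes
```

So the guard raises on the first read attempted after EOF with an empty buffer; whether that read happens after a manual close() depends on how the receive loop is stopped.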

mhthies · 2023-11-29T18:52:48Z

I can confirm it did the trick!

Then, we can remove the timeout parameter, can't we?
We can also keep it, but then we should document in the docstring of the __init__ method (via an @param timeout ... line) that it should not actually be required and that, if it is set, it will require regular KNX bus activity.

mhthies · 2023-11-30T19:20:19Z

I took the liberty of making some small improvements to the code style and adding a test for the new behaviour. Please take a look to check whether this still matches your intent; then I will merge the PR.

mrctrifork · 2023-12-02T15:46:59Z

Good afternoon, I've been away from the computer for a bit. Thank you for taking the time to make these improvements and to look at the changes.

The other day I was answering your comment, but I did not hit send in the end; I shut down the machine and forgot about everything.

Then, we can remove the timeout parameter, can't we?
We can also …

I meant that I only tried it with my changes; I didn't try the TCP keepalive options.

NanoSpicer added 5 commits November 27, 2023 13:00

Adds the ability to pass a timeout to the connection

aa34d5e

Replace complex logic into a multicatch

eba2006

Adds fix for python 3.11

b7837b0

Include null-check

42f4e45

Update readme

d1c7cca

mhthies requested changes Nov 27, 2023

NanoSpicer added 2 commits November 28, 2023 10:02

Annotates return type of "_read_raw_knxpacket"

7467f1e

Notify coroutines from iterate_group_telegrams

4a81f2c

mrctrifork requested a review from mhthies November 28, 2023 09:34

NanoSpicer added 3 commits November 28, 2023 10:42

Apply same mechanism on open_group_socket

7451cdb

Make task name shorter

a1cccfe

Fix: Bug where it would exit automatically

859ecf2

mhthies approved these changes Nov 28, 2023

NanoSpicer added 4 commits November 29, 2023 09:48

Assers self._reader is not None

aa2aa16

Suppress the connection aborted exceptions

a453745

Reset the _run_exited flag

71c959a

Replace dupplicated assertion with checks in _read_raw_knxpacket

ca9b4f4

mhthies reviewed Nov 29, 2023

NanoSpicer and others added 6 commits November 30, 2023 09:53

Adds docstring

324c8dc

Adds docstring and removes too many blank lines between methods

b2b6bf8

Fix: Linting

c38e417

Fix: Replace reader checking with not null assertion

15b2f98

Improve documentation of timeout parameter and exceptions

26e06a6

Improve code style, using string literal merging and logger's built-in formatting

b5882cc

mhthies added 4 commits November 30, 2023 20:11

Fix stale asyncio Tasks from parallel waiting

73c0aa5

Add test for timeout handling

9e2e56c

Fix tests if no knxd is present on host

43e7c0e

Improve code style

34e21ff

mhthies merged commit 08d82c6 into mhthies:master Dec 2, 2023
4 checks passed
