Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit tasks cancel #383

Merged
merged 23 commits into from
Nov 19, 2024
Merged

Conversation

Lurker00
Copy link

@Lurker00 Lurker00 commented Nov 7, 2024

I've started this set of changes when I realized that TuyaDevice._call_on_close permanently grows. Consider a WiFi battery powered T&H sensor that wakes up and sends measurements every 20 minutes, 3 times per hour. If it is configured with sleep time of 6 hours (21600 seconds), it generates 3 _shutdown_entities tasks per hour, 72 items in the TuyaDevice._call_on_close per day. Add here one _new_entity_handler per successful connect and _async_reconnect to re-connect, to have 3x72=216 items per day from one sensor alone, not counting other devices that may disconnect and re-connect. This is a memory leak.

Moreover, with this consideration, after 6 hours, 18 _shutdown_entities async tasks pilled up in the tasks queue, and then the queue always has at least 18 tasks to serve, from only one sensor. This is a waste of queue resources.

When I've implemented one TuyaDevice member per async task (_task_reconnect, _task_shutdown_entities, _unsub_new_entity) instead of _call_on_close to explicitly stop them in the close(), I've found out that cancel() does not actually stops a task which calls await asyncio.pause! asyncio.CancelledError is raised for a task only if await asyncio.pause is called in a try block, directly or indirectly. It means, for low power devices, _shutdown_entities never stopped, and _async_reconnect stopped with a delay, during LocalTuya restart or shutdown. This could cause unexpected misbehavior. Probably, this was the reason why the original #363 code didn't work for you. So, self._sleep() was implemented and tested to confirm that now all the tasks stop instantly.

Please remember that all the async tasks are now running from the same loop, meaning, e.g. when close() is running, other tasks in the queue hang on await statements. But when close() itself executes await, any other task may continue running until either its end, or another await. That's why I've inserted several and not self._is_closing checks into _make_connection(), to avoid redundant work while waiting for the _task_connect to stop. Alternatively, except asyncio.CancelledError/return could be added to each and every try block, but it would make more lines of code.

I've renamed _connect_task -> _task_connect to have uniform names.

Frankly, I never saw _new_entity_handler being called, and I don't quite understand the conditions for it to be called. But I believe it's enough to set it only once, like other HASS callbacks.

@xZetsubou
Copy link
Owner

xZetsubou commented Nov 9, 2024

Also instead of waiting close one by one for tasks why not cancel them all then wait. e.g.

        tasks = [self._task_reconnect, self._task_shutdown_entities, self._task_connect] 
        for task in tasks:
            task.cancel()

        await asyncio.gather(*tasks)

@Lurker00
Copy link
Author

Lurker00 commented Nov 9, 2024

Also instead of waiting close one by one for tasks why not cancel them all then wait. e.g.

        tasks = [self._task_reconnect, self._task_shutdown_entities, self._task_connect] 
        for task in tasks:
            task.cancel()

        await asyncio.gather(*tasks)

How it would deal with None values? A device successfully connected from the first attempt has None for all of them. Also, it is essential to completely stop _task_connect before cancelling other tasks, because _task_connect can start _task_reconnect, and _task_reconnect can start _task_shutdown_entities. So, the current order guarantees full stop.

Actually, they work almost instantly. No real performance to gain.

@Lurker00
Copy link
Author

Now I believe it is done. I even recalled about disconnected sub-devices that remove themselves from their gateways! I think the goal of this PR is met.

@Lurker00
Copy link
Author

Here is what I'm currently testing. I'm against making def connected(self) inconsistent and misleading by including the check for _is_closing: explicit checks in places where it is really required.

I can't provide a proof that the check I added
https://github.com/Lurker00/hass-localtuya/blob/709b1050e4c29e96827b149eb0cba8ae49c0e25c/custom_components/localtuya/coordinator.py#L192-L193
may actually happen, but decided that better to check than to regret, because not gateway.connected may mean self._interface is None.

@Lurker00
Copy link
Author

Lurker00 commented Nov 19, 2024

Testing BLE device detach-attach, when LocalTuya self-restarted due to local key changed for detached sub-device, I've got

2024-11-19 16:50:28.543 WARNING (MainThread) [homeassistant.config_entries] Unloading localtuya (localtuya) config entry. Task <Task pending name='None localtuya localtuya 944c1616039a1d1ec6b9745179bff74e' coro=<TuyaDevice.close() running at /config/custom_components/localtuya/coordinator.py:364> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()]> cb=[set.remove(), set.remove()]> did not complete in time

in the log. From the previous records I see that the sub-device finished updating its local key while other devices were closing. Then it should sleep for scale * RECONNECT_INTERVAL seconds, and, obviously, the sleep was not cancelled.

So, I'm still curious when a sleep is cancelled and when it does not... I'd return the _sleep with the try, like all the cancel() examples have.

This is a minor issue that need more investigation, and, I believe, does not prevent merging existing changes.

Made a fix based on this test (forgotten return).

@Lurker00
Copy link
Author

Wih the commit 08ec9e1 the problem with detached BLE sub-device has gone! So, I've done the same with _shutdown_entities to be on the safe side for sure. Also minor related changes in _make_connection.

Now it looks and acts good for my taste.

@xZetsubou
Copy link
Owner

xZetsubou commented Nov 19, 2024

So, I'm still curious when a sleep is cancelled and when it does not... I'd return the _sleep with the try, like all the cancel() examples have.

From my understanding I think this occurs in "while" loops so usually when creating a while loop the whole code should be wrapped inside try block.

I think this is good now, thank you 😸

* Merged abort connection w/ pending tasks.
* added cancled asyncio in pytuya connect.
* rename subdevice_state var/func
@xZetsubou xZetsubou merged commit 6a902e6 into xZetsubou:master Nov 19, 2024
2 checks passed
xZetsubou pushed a commit that referenced this pull request Nov 22, 2024
* abort_connect() shall be called at the end: revert #383 abort interface close before sub-devices

* subdevice_state_updated calls

* Prevent out of order commands

* Better serialization of writes

* Check for _is_closing before going to sleep
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants