Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent task cancellation from propagating to ASH #628

Merged
merged 3 commits into from
Jun 14, 2024

Conversation

puddly
Copy link
Contributor

@puddly puddly commented Jun 13, 2024

Root cause of home-assistant/core#119424.

During device joining, zigpy cancels a scheduled initialization task if the device re-joins during initialization (this is pretty common). Unfortunately, this cancellation propagates all the way down to the ASH sending task, causing the TX sequence number to increment without waiting for an acknowledgement. There is currently a firmware bug with EmberZNet and while ASH can support multiple pending frames at a time, in reality the stack crashes if the number is greater than one 😄.

This fix prevents an ASH send from being cancelled by using asyncio.shield, which schedules it in a task. Incrementing the TX sequence number after a frame has been sent will have similar issues because our last send may not have been ACKed. An alternative to this approach would be to avoid using coroutines entirely for sending and make the ASH protocol implementation rely on event loop callbacks for timeouts.

Copy link

codecov bot commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.72%. Comparing base (09cf7ce) to head (e000846).

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #628   +/-   ##
=======================================
  Coverage   99.72%   99.72%           
=======================================
  Files          75       75           
  Lines        5002     5016   +14     
=======================================
+ Hits         4988     5002   +14     
  Misses         14       14           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tube0013
Copy link
Contributor

2 hours in this is working well, no more NCP failures.

@TheJulianJES TheJulianJES merged commit 3657cf3 into zigpy:dev Jun 14, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants