Collaborator

@eliteprox eliteprox commented Aug 25, 2025

This pull request introduces significant improvements to error handling, background task management, and stream lifecycle events in the pytrickle library and its example usage. The changes focus on making error callbacks consistently asynchronous, improving background task cleanup, and ensuring that stream stop events are handled gracefully. Additionally, the protocol and client classes now propagate errors and shutdown events more reliably, and global exception handling for asyncio is improved to suppress expected errors during shutdown.

Error handling and callback improvements:

  • All error callbacks are now required to be asynchronous functions, simplifying error propagation and handling throughout the codebase. (pytrickle/__init__.py, pytrickle/client.py)
  • The protocol and client now propagate protocol shutdown and subscription end events to the client via the error callback, and the client distinguishes between clean shutdown and error conditions. (pytrickle/client.py)
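
As a rough illustration of the async-only callback rule above, the sketch below rejects non-coroutine callbacks and shows a callback that tells a clean shutdown apart from a failure. The names `ErrorCallback`, `require_async_callback`, and the `(error_type, exception)` signature are hypothetical and are not taken from pytrickle.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical callback shape: (error_type, exception) -> awaitable.
# pytrickle's actual error callback signature may differ.
ErrorCallback = Callable[[str, Optional[Exception]], Awaitable[None]]


def require_async_callback(callback: ErrorCallback) -> ErrorCallback:
    """Reject plain functions so callers can always 'await' the callback."""
    if not inspect.iscoroutinefunction(callback):
        raise TypeError("error callback must be declared with 'async def'")
    return callback


async def demo() -> List[Tuple[str, str]]:
    events: List[Tuple[str, str]] = []

    async def on_error(error_type: str, exception: Optional[Exception] = None) -> None:
        # Assumption: a missing exception signals a clean shutdown.
        kind = "clean_shutdown" if exception is None else "error"
        events.append((kind, error_type))

    callback = require_async_callback(on_error)
    await callback("subscription_ended", None)
    await callback("publisher_failed", ConnectionResetError("peer closed"))
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Validating once at registration time means every call site can simply `await` the callback without checking its type again.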

Background task management and cleanup:

  • Added unified tracking and cleanup of background tasks in TrickleComponent, with automatic removal and exception handling on task completion. (pytrickle/base.py)
  • The example video processor now starts background tasks only when the event loop is running, and cleans them up when the stream stops, using the new on_stream_stop callback. (examples/process_video_example.py)
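
The task bookkeeping described above could look roughly like the following. This is a minimal sketch: `TaskTracker` and its methods are hypothetical stand-ins, not the actual TrickleComponent implementation.

```python
import asyncio
import logging
from typing import Coroutine, Set, Tuple

logger = logging.getLogger(__name__)


class TaskTracker:
    """Hypothetical stand-in for the task tracking described for TrickleComponent."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()

    def track(self, coro: Coroutine, name: str) -> asyncio.Task:
        # create_task requires a running event loop, so this must be
        # called from inside a coroutine or loop callback.
        task = asyncio.create_task(coro, name=name)
        self._tasks.add(task)
        task.add_done_callback(self._on_done)
        return task

    def _on_done(self, task: asyncio.Task) -> None:
        # Drop the reference, then retrieve the exception so it gets logged
        # instead of triggering "exception was never retrieved".
        self._tasks.discard(task)
        if task.cancelled():
            return
        exc = task.exception()
        if exc is not None:
            logger.error("Background task %s failed: %r", task.get_name(), exc)

    async def cleanup(self) -> None:
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        self._tasks.clear()


async def demo() -> Tuple[int, int]:
    tracker = TaskTracker()

    async def run_forever() -> None:
        await asyncio.sleep(3600)

    async def fail_fast() -> None:
        raise RuntimeError("boom")

    tracker.track(run_forever(), "forever")
    tracker.track(fail_fast(), "failing")
    await asyncio.sleep(0.01)  # let the failing task finish and be removed
    remaining_before = len(tracker._tasks)
    await tracker.cleanup()
    return remaining_before, len(tracker._tasks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a stream processor, `cleanup()` would be a natural thing to call from an on_stream_stop callback.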

Asyncio and shutdown robustness:

  • Introduced a global asyncio exception handler to suppress expected aiohttp connection reset errors during shutdown, reducing noise in logs. (pytrickle/base.py, pytrickle/protocol.py)
  • Improved the data sending loop in the client to respond to both stop and error events, ensuring reliable shutdown and error handling. (pytrickle/client.py)
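
For readers unfamiliar with loop-level exception handlers, here is a minimal sketch of the idea using only the standard asyncio API. Matching on the built-in `ConnectionResetError` and the helper name `make_shutdown_exception_handler` are assumptions; the handler actually installed by pytrickle may filter differently.

```python
import asyncio
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def make_shutdown_exception_handler(
    suppressed: List[BaseException],
) -> Callable[[asyncio.AbstractEventLoop, Dict[str, Any]], None]:
    """Build a loop exception handler that swallows connection resets."""

    def handler(loop: asyncio.AbstractEventLoop, context: Dict[str, Any]) -> None:
        exc = context.get("exception")
        if isinstance(exc, ConnectionResetError):
            # Assumption: resets seen here are expected teardown noise.
            suppressed.append(exc)
            logger.debug("Suppressed expected connection reset: %r", exc)
            return
        # Anything else still goes through asyncio's default reporting.
        loop.default_exception_handler(context)

    return handler


async def demo() -> List[BaseException]:
    suppressed: List[BaseException] = []
    loop = asyncio.get_running_loop()
    loop.set_exception_handler(make_shutdown_exception_handler(suppressed))
    # Simulate what asyncio does when a callback or transport fails.
    loop.call_exception_handler(
        {"message": "simulated teardown error",
         "exception": ConnectionResetError("peer closed")}
    )
    return suppressed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```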

Protocol task error handling:

  • Protocol subscribe and publish tasks are now wrapped with a generic error-handling wrapper, ensuring that any exceptions are logged and propagated via error callbacks. (pytrickle/protocol.py)
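
A wrapper of the kind described above might be sketched as follows. `run_guarded` and the `(name, exception)` callback signature are hypothetical; the real wrapper in pytrickle/protocol.py may be structured differently.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Tuple

logger = logging.getLogger(__name__)

# Hypothetical callback shape used only for this sketch.
ErrorCallback = Callable[[str, Exception], Awaitable[None]]


async def run_guarded(
    name: str,
    task_fn: Callable[[], Awaitable[None]],
    error_callback: ErrorCallback,
) -> None:
    """Run a protocol task, logging and reporting any exception it raises."""
    try:
        await task_fn()
    except asyncio.CancelledError:
        raise  # cancellation is part of normal shutdown, not an error
    except Exception as exc:
        logger.error("%s task failed: %r", name, exc)
        await error_callback(name, exc)


async def demo() -> List[Tuple[str, str]]:
    reported: List[Tuple[str, str]] = []

    async def on_error(name: str, exc: Exception) -> None:
        reported.append((name, type(exc).__name__))

    async def failing_publish() -> None:
        raise ConnectionResetError("peer closed")

    async def healthy_subscribe() -> None:
        await asyncio.sleep(0)

    await asyncio.gather(
        run_guarded("publish", failing_publish, on_error),
        run_guarded("subscribe", healthy_subscribe, on_error),
    )
    return reported


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Re-raising `CancelledError` keeps task cancellation working as usual while every other failure reaches the callback.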

These changes make the library more robust, easier to debug, and safer to use in production environments by improving error visibility and resource cleanup.

@eliteprox eliteprox linked an issue Aug 25, 2025 that may be closed by this pull request
@eliteprox eliteprox force-pushed the fix/trickle-pub-sub-component-name branch from ff709de to 0619a40 Compare August 25, 2025 16:16
@eliteprox eliteprox marked this pull request as ready for review August 25, 2025 16:16
@eliteprox eliteprox force-pushed the fix/trickle-pub-sub-component-name branch 2 times, most recently from 9cb577d to f134054 Compare August 25, 2025 18:51
add missing `component_name` to pub/sub for trickle health state
protocol: track background publisher task
@eliteprox eliteprox force-pushed the fix/trickle-pub-sub-component-name branch from f134054 to 108928c Compare August 26, 2025 19:39
logger.info("Stopping protocol due to client loops ending")

# Call the optional on_stream_stop callback before stopping protocol
if hasattr(self.frame_processor, 'on_stream_stop') and self.frame_processor.on_stream_stop:
Contributor

@ad-astra-video ad-astra-video Aug 27, 2025

frame_processor does not have on_stream_stop, right?

Collaborator Author

@eliteprox eliteprox Aug 27, 2025

Correct, it's not an abstract method of frame_processor. It's appended by StreamProcessor on initialization like other callbacks registered to StreamProcessor.

StreamProcessor accepts an on_stream_stop callback as a parameter, similar to param_updater. _InternalFrameProcessor extends the abstract FrameProcessor class and stores the on_stream_stop callback as an attribute.

Contributor

If we want to include it, I think we should add it to FrameProcessor, similar to the error_callback. It seems strange to call something from TrickleClient that is set up in a higher-level abstraction.

Let me know if I am missing something here, though.

Collaborator Author

In this case, on_stream_stop is more of an event triggered by the client than a method called by the FrameProcessor. I have now added it to the abstract class, so it is available to implement in frame processors (f347e41).

The client still needs to call on_stream_stop at this point in the protocol shutdown sequence. There is no other coordination with FrameProcessors currently, afaik.

if self.frame_processor.on_stream_stop:
    try:
        await self.frame_processor.on_stream_stop()
        logger.info("Stream stop callback executed successfully")
    except Exception as e:
        logger.error(f"Error in stream stop callback: {e}")
await self.protocol.stop()
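
The shutdown ordering discussed here can be tried out in isolation with the sketch below. `DemoFrameProcessor`, `DemoProtocol`, and `shutdown` are hypothetical stand-ins for the real classes, and the sketch assumes the protocol is stopped whether or not the callback exists or succeeds.

```python
import asyncio
import logging
from typing import List

logger = logging.getLogger(__name__)


class DemoFrameProcessor:
    """Hypothetical processor exposing an optional on_stream_stop hook."""

    def __init__(self, calls: List[str], fail: bool = False) -> None:
        self.calls = calls
        self.fail = fail

    async def on_stream_stop(self) -> None:
        self.calls.append("on_stream_stop")
        if self.fail:
            raise RuntimeError("cleanup failed")


class DemoProtocol:
    """Hypothetical protocol object that only records that it was stopped."""

    def __init__(self, calls: List[str]) -> None:
        self.calls = calls

    async def stop(self) -> None:
        self.calls.append("protocol.stop")


async def shutdown(frame_processor: object, protocol: DemoProtocol) -> None:
    # Look the hook up defensively so processors without it still work.
    callback = getattr(frame_processor, "on_stream_stop", None)
    if callback is not None:
        try:
            await callback()
        except Exception as exc:
            # A failing callback must not prevent the protocol from stopping.
            logger.error("Error in stream stop callback: %s", exc)
    await protocol.stop()


async def demo() -> List[str]:
    calls: List[str] = []
    await shutdown(DemoFrameProcessor(calls, fail=True), DemoProtocol(calls))
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```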

Collaborator Author

@eliteprox eliteprox Aug 28, 2025

> It seems strange to call something from TrickleClient that is set up in a higher-level abstraction.

Yeah, it sounds off because TrickleClient was originally intended as a class for interacting with the trickle protocol directly in a multimedia context. In a consumer context it's an internal component, so the term "client" is a bit misleading.

I think once we reorganize classes into package namespaces this will be clearer.

try:
await asyncio.wait_for(self.stop_event.wait(), timeout=self.send_data_interval)
break # Stop event was set, exit loop
done, pending = await asyncio.wait(
Contributor

WDYT about just waiting for stop_event here?

Will error_event cause channels to close down before this could execute one last time?

Collaborator Author

Good catch!

Collaborator Author

fixed in 2703cff

Contributor

I moved the waiting for stop/error to a function. Can you test to confirm still same behavior?

Collaborator Author

@eliteprox eliteprox Aug 28, 2025

I tested it and it works great; I found no issues stopping the publisher. I ran a high-rate publisher test and got ~120 msg/s, which was expected. One message did fail, but I did not catch why; it was likely due to the message size limit from batching.

Collaborator Author

We could possibly improve this by allowing _wait_for_interval to raise its own error.
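
For context, a helper along the lines discussed in this thread could be sketched as below. The free-function form, its parameters, and the boolean return value are assumptions; only the name `_wait_for_interval` and the idea of waiting on both the stop and error events come from the conversation.

```python
import asyncio
from typing import Tuple


async def wait_for_interval(
    stop_event: asyncio.Event,
    error_event: asyncio.Event,
    interval: float,
) -> bool:
    """Sleep for up to `interval` seconds.

    Returns True if stop_event or error_event was set during the wait,
    False if the interval elapsed normally.
    """
    waiters = [
        asyncio.create_task(stop_event.wait()),
        asyncio.create_task(error_event.wait()),
    ]
    try:
        done, _pending = await asyncio.wait(
            waiters, timeout=interval, return_when=asyncio.FIRST_COMPLETED
        )
        return bool(done)
    finally:
        # Always cancel the helper tasks so none are left pending.
        for waiter in waiters:
            waiter.cancel()
        await asyncio.gather(*waiters, return_exceptions=True)


async def demo() -> Tuple[bool, bool]:
    stop_event, error_event = asyncio.Event(), asyncio.Event()
    timed_out = await wait_for_interval(stop_event, error_event, interval=0.01)
    error_event.set()
    interrupted = await wait_for_interval(stop_event, error_event, interval=5.0)
    return timed_out, interrupted


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A send loop can then break out as soon as the helper returns True instead of sleeping through the remainder of the interval.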

@eliteprox eliteprox changed the title from "fix: Resolve ConnectionResetError in publisher during client teardown" to "fix: protocol publisher fails to report error and keeps running, blocking client" Aug 27, 2025
@eliteprox eliteprox changed the title from "fix: protocol publisher fails to report error and keeps running, blocking client" to "fix: protocol publisher fails to report error" Aug 27, 2025
@eliteprox eliteprox requested a review from pschroedl August 28, 2025 18:44
@eliteprox eliteprox merged commit 1258cf9 into main Aug 28, 2025
1 check passed
Development

Successfully merging this pull request may close these issues.

runner error state when urls close down