
[EventHubs] Buffered Producer back-pressure handling strategy #23909

Closed
yunhaoling opened this issue Apr 8, 2022 · 0 comments
Labels: Client (This issue points to a problem in the data-plane of the library.), Event Hubs, Messaging (Messaging crew)
Milestone: Backlog
yunhaoling commented Apr 8, 2022

Issue: Back-pressure

How should the EventHubProducerClient in buffered mode behave when events are enqueued while the buffer is full, or does not have enough room for the incoming events?

(Originally from comment in PR)

Anna's comment:

What's interesting is that triggering this flush will be blocking....
I wonder if the background worker should be monitoring the buffer size and trigger a background flush when this size is hit - rather than waiting until the next put_events request to see that it's full and block until it's flushed?
What if I didn't set a max_wait_time, only a max_buffer_length - but then only add an event once per minute? We wouldn't check if the buffer was full until a minute after it had been full?
Looks like this can't happen, as both max_wait_time and max_buffer_length must have values - though depending on those values, we could end up waiting before we realize the buffer has been sitting full since the last event was added.

Sample case:

max_buffer_length is 100, there are 50 events in the buffer, and 60 more coming - only 50 slots are free, so the buffer cannot take them all without flushing first.


Scope

  1. the timeout behavior
  2. the back-pressure handling strategy

Current implementation/behavior:

When the EventHubProducerClient works in buffered mode and send_event/send_batch is called with a timeout, the timeout controls the enqueue operation, not the send operation.

If there are fewer free slots in the buffer than incoming events, we flush first with the given timeout, then check the remaining time: if it has timed out we raise an error, otherwise we put the events into the buffer.

pseudocode:

if there_is_room_in_buffer:
    enqueue_events()
else:
    flush(timeout)  # proactively clear the buffer, blocking within the caller's timeout
    if check_time_out_after_flush:  # the flush consumed the whole timeout
        raise timeout
    else:
        enqueue_events()
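
A minimal runnable sketch of this proactive-flush behavior (BufferedPartitionQueue, _flush, and put_events are illustrative names and assumptions, not the library's actual internals):

# Sketch only; not the actual azure-eventhub implementation.
import time
from collections import deque
from threading import Lock

class BufferedPartitionQueue:
    def __init__(self, max_buffer_length):
        self._max_buffer_length = max_buffer_length
        self._buffer = deque()
        self._lock = Lock()

    def _flush(self, timeout):
        # Placeholder: send everything currently buffered downstream, then clear.
        with self._lock:
            self._buffer.clear()

    def put_events(self, events, timeout):
        deadline = time.monotonic() + timeout
        with self._lock:
            room = self._max_buffer_length - len(self._buffer)
        if room < len(events):
            self._flush(timeout)                 # blocking, proactive flush
            if time.monotonic() >= deadline:     # the flush used up the whole timeout
                raise TimeoutError("enqueue timed out while flushing the buffer")
        with self._lock:
            self._buffer.extend(events)

With the sample case above (max_buffer_length of 100, 50 events buffered, 60 incoming), room is 50 < 60, so the blocking flush runs before the new events are enqueued.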

Other options

  1. In general, we need a way for EventHubProducerClient.send_event/batch in buffered mode to flush the buffer queue.
    If not via a proactive call, we could use a flag/condition to decouple send_event/batch from the flush operation.

We could add another flag and reuse the check_max_wait_time_future thread for flushing. This has almost the same effect as the current implementation, but decouples the flush from enqueue_events. However, the timing is not perfectly controlled, since check_max_wait_time_future sleeps periodically (unless we also adjust its sleep period).

self._need_flush = threading.Event()  # or Condition; signals the worker to flush
self._not_full = threading.Event()    # set by the worker once the buffer has room

if there_is_room_in_buffer:
    enqueue_events()
else:
    try:
        self._not_full.clear()
        self._need_flush.set()        # ask the background worker to flush
        self._not_full.wait(timeout)  # block until the worker frees up room
        if check_time_out_after_flush:
            raise timeout
        else:
            enqueue_events()
    finally:
        self._need_flush.clear()      # reset the flag for the next caller
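
For completeness, a sketch of the worker side in the same pseudocode style as the snippet above - how the existing check_max_wait_time_future loop could honor this flag. _running, _sleep_period, _flush, and _max_wait_time_elapsed are assumed names, not the actual implementation:

def _check_max_wait_time_worker(self):
    while self._running:
        if self._need_flush.is_set() or self._max_wait_time_elapsed():
            self._flush()            # drain the buffer
            self._not_full.set()     # wake any producer blocked in send_event/batch
        time.sleep(self._sleep_period)   # back-pressure latency is bounded by this period
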
  2. We introduce a throttling mechanism (see the monitor sketch below).

Another background task monitors the load of the buffer; if it is, say, 70% full, we flush.

  3. Hybrid of 1 + 2.

Option 2 alone still doesn't handle corner cases well: for example, if the buffer is 65% full but the incoming events would take more than 35% of the capacity, we still need a flush.
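
A sketch of the option 2 monitor, assuming a buffered queue object that exposes load() and flush(); the 70% threshold, the check interval, and all names here are illustrative:

import threading
import time

class BufferLoadMonitor:
    # Illustrative sketch of option 2; not the actual azure-eventhub implementation.
    def __init__(self, buffered_queue, threshold=0.7, check_interval=0.1):
        self._queue = buffered_queue          # assumed to expose load() and flush()
        self._threshold = threshold
        self._check_interval = check_interval
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def stop(self):
        self._stop.set()
        self._worker.join()

    def _run(self):
        while not self._stop.is_set():
            if self._queue.load() >= self._threshold:   # e.g. buffer is 70% full
                self._queue.flush()
            time.sleep(self._check_interval)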


Summary

The proactive flush should be a good starting point: it is easy to explain and the implementation is straightforward.
However, depending on changing customer requirements, we may want to consider the other options.
