-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Event Hubs] Prefetch events #26065
[Event Hubs] Prefetch events #26065
Conversation
API change check APIView has identified API level changes in this PR and created following API reviews. |
|
||
### Other Changes | ||
|
||
- Use Rhea's prefetch window to prefetch events from the service. This improves the performance of the receiver by reducing the number of round trips to the service. The default prefetch window is 3 * `maxBatchSize` events. This can be configured by setting the `prefetchCount` option on the `subscribe` method on `EventHubConsumerClient`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when the user closes the consumer/subscription as soon as the prefetch receives the events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be a problem in service-bus since the delivery count would be updated, probably have no effects in event-hubs, would like to understand more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The receiver will be discarded including the prefetched list of events. This is ok in Event Hubs because when you receive from a partition, you start reading from a specific event position and not necessarily from the last event fetched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no event position is specified, the reading will start from the earliest event in the partition.
@@ -123,10 +123,12 @@ testWithServiceTypes((serviceVersion) => { | |||
await receiver1.connect({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this prefetch
count support related to the user issue?
I thought the user issue was about asking for a minimum number of messages?
Also, I don't think this was a feature in 5.8.0, how was rolling back to 5.8.0 was fixing the issue for the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not related. This PR improves performance by prefetching events so chances are the received batch size will be larger than before. We still don't want to wait until we receive maxBatchSize
because that is against the library goal of maximizing throughput.
Now there is a question for you, how can we analyze the performance characteristic of this change? More specifically, can we show the average/median batch size before and after the change? can we show how often the user callback was called per minute or second?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these are good metrics to track, how often the user callback is called is already being tracked in the perf test.
We would need to update the framework/test to track the batch size.
Let's work on testing this, so we don't miss anything obvious. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a stress test run for the checkpoint store test ..
Red ❤️
is the code from this PR, and Blue 💙
is the 5.10.0 version.
@deyaaeldeen and I were discussing...
And we speculate the bigger spikes in the CPU usage to be when the credits are added for the "new events to be received". With the new pre-fetch method, the spikes are a lot smaller and less dense.
The memory usage graph also validates the prefetch's improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @HarshaNalluru!
Also, the performance analysis shows that the size median of the received batches increased as well as max and mean throughput.
…ting (#26095) ## Updates to the tests following #26065 1. default test duration is set to 2 days for stress tests 2. Added a new "log-median-batch-size" option for the perf test - `log-median-batch-size` is a boolean that logs more information related to the batch size, such as median, max, average, etc. - It can be useful when relevant code is updated. - Being introduced in the perf test when the prefetch feature was added to Event Hubs. --------- Co-authored-by: Deyaaeldeen Almahallawi <dealmaha@microsoft.com>
Updates the partition receiver to use Rhea's prefetch window instead of manually fetching the exact number of events needed. It also adds a `subscribe` option, `prefetchCount`, that controls the max number of events to prefetch. Live run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=2814550&view=results Related issue: Azure#26055
…ting (Azure#26095) ## Updates to the tests following Azure#26065 1. default test duration is set to 2 days for stress tests 2. Added a new "log-median-batch-size" option for the perf test - `log-median-batch-size` is a boolean that logs more information related to the batch size, such as median, max, average, etc. - It can be useful when relevant code is updated. - Being introduced in the perf test when the prefetch feature was added to Event Hubs. --------- Co-authored-by: Deyaaeldeen Almahallawi <dealmaha@microsoft.com>
### Packages impacted by this PR @Azure/event-hubs ### Issues associated with this PR #27253 ### Describe the problem that is addressed by this PR #27253 (comment) ### What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen? ### Are there test cases added in this PR? _(If not, why?)_ To be tested using stress testing framework. UPDATE: The results are in and it is confirmed there is no more space leak! ### Provide a list of related PRs _(if any)_ #26065 ### Command used to generate this PR:**_(Applicable only to SDK release request PRs)_ ### Checklists - [x] Added impacted package name to the issue description - [ ] Does this PR needs any fixes in the SDK Generator?** _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_ - [x] Added a changelog (if necessary)
Updates the partition receiver to use Rhea's prefetch window instead of manually fetching the exact number of events needed. It also adds a
subscribe
option,prefetchCount
, that controls the max number of events to prefetch.Live run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=2814550&view=results
Related issue: #26055