EventHub: consumer memory issues processing event backlog #27253
Comments
Thanks for filing this issue. Did you try any other versions, such as 5.9? Does setting the prefetch count help?
We only tried versions 5.11.1 and 5.11.2. We are not defining the prefetch count, and we did not test version 5.9. Thanks for your support.
I'm following up on @thomasstoermer's issue:
Have you been able to reproduce the issue?
@thomasstoermer thanks for your patience, we did reproduce the issue. We'll post investigation updates once we have anything more significant.
Hi @thomasstoermer and @ManuelTreu, I would like to give you a brief update on the ongoing issue. Upon further investigation, @HarshaNalluru and I have identified that the root cause lies in enabling automatic flow management in Rhea, as detailed in this GitHub pull request: Prefetch Events.

To clarify, with automatic flow management enabled, Rhea forwards all events to the consumer client as soon as they are received. Consequently, if the processEvents handler cannot process incoming events promptly, they accumulate in the client's internal queue, leading to a memory leak.

As a temporary solution, you should be able to revert to v5.10.0 without encountering this issue, since the feature was introduced in v5.11.0. In the meantime, I will consult with our team to determine the best course of action. My initial recommendation is to implement custom prefetching in the client itself to ensure that the internal queue does not become overloaded. Again, thanks for your patience, and I'll keep you posted.
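For illustration only, here is a minimal sketch of the credit-based prefetching idea mentioned above. It is not the SDK's actual implementation; the class and callback names are made up for the example:

```ts
// Illustrative sketch, NOT the SDK's internal code: a bounded, credit-based prefetch buffer.
// The idea is to only request ("grant credit for") as many events as we are willing to queue,
// and to grant more credit only after the consumer has drained part of the queue.
class PrefetchBuffer<T> {
  private queue: T[] = [];

  constructor(
    private readonly maxQueued: number,
    // Callback that asks the underlying link for `n` more messages (hypothetical hook).
    private readonly requestMore: (n: number) => void
  ) {
    this.requestMore(maxQueued); // grant the initial credit
  }

  // Called when a message arrives from the wire; the queue stays bounded by maxQueued
  // because we never requested more than that.
  push(item: T): void {
    this.queue.push(item);
  }

  // Called by the consumer loop; replenishes credit as items are drained.
  take(count: number): T[] {
    const items = this.queue.splice(0, count);
    if (items.length > 0) {
      this.requestMore(items.length);
    }
    return items;
  }
}
```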
Thanks for the update @deyaaeldeen and the provided details 👍 Unfortunately, v5.10.0 will not work for us because of this previously reported issue: #25928
### Packages impacted by this PR
@azure/event-hubs

### Issues associated with this PR
#27253

### Describe the problem that is addressed by this PR
#27253 (comment)

### What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

### Are there test cases added in this PR? _(If not, why?)_
To be tested using the stress testing framework.
UPDATE: The results are in and it is confirmed there is no more space leak!

### Provide a list of related PRs _(if any)_
#26065

### Command used to generate this PR: **_(Applicable only to SDK release request PRs)_**

### Checklists
- [x] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [x] Added a changelog (if necessary)
@azure/event-hubs@5.11.3 has been released and it fixes this issue 😊 Please let me know if you have any questions!
Hi @thomasstoermer. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.
- Package version(s): 5.11.1, 5.11.2
- Node.js (Docker image): node:18.17.1-bullseye-slim
- TypeScript: 4.7.4
Describe the bug
We observe Event Hubs consumer client issues when processing a large event backlog on several partitions. The client starts to consume a high amount of memory until it reaches its limits and triggers pod restarts in Kubernetes (the limit was already increased to 4 GB and pods still restart).

The issue is resolved by using the old version @azure/event-hubs: 5.8.0, which runs stable without any memory problems.

To Reproduce
Steps to reproduce the behavior: process a large event backlog while delaying event processing in the processEvents handler (e.g. via setTimeout(x)), because a production client might e.g. wait for a DB operation to complete. A sketch combining this with the settings from the additional context is shown after that section below.

Expected behavior
We expect the consumer client to run stable when processing a huge event backlog on several partitions, as it did with the old version 5.8.0. Otherwise the Event Hubs client cannot be operated stably in production.

Additional context
- Consumer client options: `loadBalancingOptions: { strategy: "greedy" }`
- Subscribe options: `{ maxWaitTimeInSeconds: 0.5, maxBatchSize: 100 }`
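For reference, a minimal sketch of a reproduction that combines the options listed above with an artificially slow processEvents handler. The connection string, event hub name, consumer group, and the 500 ms delay are placeholders:

```ts
import { EventHubConsumerClient, earliestEventPosition } from "@azure/event-hubs";

// Placeholders: substitute a real connection string and event hub name.
const client = new EventHubConsumerClient(
  "$Default",
  "<connection-string>",
  "<event-hub-name>",
  { loadBalancingOptions: { strategy: "greedy" } }
);

const subscription = client.subscribe(
  {
    processEvents: async (events, context) => {
      // Simulate slow downstream processing, e.g. waiting for a DB operation to complete.
      await new Promise((resolve) => setTimeout(resolve, 500));
      console.log(`Partition ${context.partitionId}: received ${events.length} events`);
    },
    processError: async (err, context) => {
      console.error(`Error on partition ${context.partitionId}:`, err);
    },
  },
  {
    maxWaitTimeInSeconds: 0.5,
    maxBatchSize: 100,
    startPosition: earliestEventPosition, // start from the existing backlog
  }
);
```

With a large backlog on several partitions and the delayed handler above, memory usage of the consumer process can be observed to grow on the affected versions.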