EventHub: consumer memory issues processing event backlog #27253

thomasstoermer · 2023-09-26T16:27:44Z

- Package Name: @azure/event-hubs
- Package Version: 5.11.1, 5.11.2
- Operating system:
- nodejs
  - version: Docker: node:18.17.1-bullseye-slim
- browser
  - name/version:
- typescript
  - version: 4.7.4
- Is the bug related to documentation in
  - README.md
  - source code documentation
  - SDK API docs on https://docs.microsoft.com

Describe the bug
We observe problems with the Event Hub consumer client when it processes a large event backlog across several partitions. The client's memory consumption keeps growing until it reaches the container limit, which triggers pod restarts in Kubernetes (we already raised the limit to 4 GB and the pod still restarts).
Downgrading to the older @azure/event-hubs 5.8.0 resolves the issue; that version runs stably without any memory problem.

To Reproduce
Steps to reproduce the behavior:

1. Create an Event Hub with multiple partitions (32).
2. Stop the Event Hub consumer and produce a backlog of events on all partitions (e.g. 100,000 events per partition, event body >500 bytes).
3. Start one Event Hub consumer with a checkpoint store that processes all partitions.
4. Important: the test consumer must simulate a delay in event processing (e.g. setTimeout(x)), because a production client might, for example, wait for a DB operation to complete.
5. Observe heavy memory consumption until the limits are reached.

Expected behavior
We expect the consumer client to run stably when processing a huge event backlog across several partitions, as it did with the old version 5.8.0. Otherwise the Event Hub client cannot be operated reliably in production.

Additional context

Consumer is created with:
- loadBalancingOptions: { strategy: "greedy" }
- Checkpointstore on Azure storage
- SubscribeOptions: { maxWaitTimeInSeconds: 0.5, maxBatchSize: 100}
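
The setup described in the report could be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: the option values come from the list above, while `sleep`, `simulatedDelayMs`, `processed`, and the 200 ms delay are hypothetical names and values chosen for the example. The SDK wiring is shown only in comments, because it needs a real Event Hub, a checkpoint store, and a `processError` handler that are not part of this thread.

```typescript
// Options as listed in the report.
const loadBalancingOptions = { strategy: "greedy" as const };
const subscribeOptions = { maxWaitTimeInSeconds: 0.5, maxBatchSize: 100 };

// Hypothetical stand-in for slow downstream work such as a database write.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Assumed delay per batch; the report only says "setTimeout(x)".
let simulatedDelayMs = 200;
let processed = 0;

// Same shape as a processEvents handler: it receives a batch and resolves
// only once the simulated work has finished.
async function processEvents(
  events: ReadonlyArray<{ body: unknown }>
): Promise<void> {
  if (events.length === 0) return; // nothing arrived within the wait time
  await sleep(simulatedDelayMs);
  processed += events.length;
}

// Assumed wiring (requires @azure/event-hubs and a checkpoint store; not run here):
//   const client = new EventHubConsumerClient(
//     consumerGroup, connectionString, eventHubName, checkpointStore,
//     { loadBalancingOptions });
//   client.subscribe({ processEvents, processError }, subscribeOptions);
```

The slower the handler is relative to the rate at which events arrive, the faster any client-side buffering in front of it fills up, which is why the simulated delay matters for reproducing the problem.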


deyaaeldeen · 2023-09-28T15:48:41Z

Thanks for filing this issue. Did you try any other versions, such as 5.9? Does setting the prefetch count help?

thomasstoermer · 2023-09-28T16:40:25Z

We only tried versions:

5.11.1 ❌
5.11.2 ❌
5.8.0 ✅

We are not setting prefetchCount, but as far as I can see it gets a default based on maxBatchSize. We can run another test with prefetchCount set explicitly.

We did not test version 5.9.0, because it had another memory issue fixed in the past.

Thanks for your support.

ManuelTreu · 2023-09-29T12:05:56Z

I'm following up on @thomasstoermer's issue:
We have now tested it with 5.9.0 and have not found any memory problems.
We have also tested version 5.11.2 with prefetchCount = 50 and 100 and still see Out Of Memory problems.
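
For reference, the two prefetchCount variants mentioned above might be expressed as subscribe options like this. It is only a sketch: the base values come from the original report, and it assumes prefetchCount is passed alongside them in the options given to subscribe, as the discussion implies. `baseOptions` and `testedOptions` are names made up for the example.

```typescript
// Base options from the original report.
const baseOptions = { maxWaitTimeInSeconds: 0.5, maxBatchSize: 100 };

// The two prefetchCount values that were tested on 5.11.2.
const testedOptions = [50, 100].map((prefetchCount) => ({
  ...baseOptions,
  prefetchCount,
}));

console.log(testedOptions.map((o) => o.prefetchCount)); // [ 50, 100 ]
```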

thomasstoermer · 2023-10-30T18:37:17Z

Have you been able to reproduce the issue?

HarshaNalluru · 2023-10-30T22:11:24Z

@thomasstoermer thanks for your patience, we did reproduce the issue.

Will post the investigation updates once we have anything more significant.

deyaaeldeen · 2023-11-03T00:44:02Z

Hi @thomasstoermer and @ManuelTreu,

Here is a brief update on this issue. After further investigation, @HarshaNalluru and I identified the root cause: automatic flow management was enabled in Rhea, as detailed in this GitHub pull request: Prefetch Events.

To clarify, with automatic flow management enabled, Rhea forwards all events to the consumer client as soon as they are received. Consequently, if the processEvents handler cannot keep up with incoming events, they accumulate in the client's internal queue, leading to a memory leak.
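
The accumulation described above can be illustrated with a small discrete-time model. This is not the SDK's or Rhea's code: `simulatePushDelivery` is a hypothetical function and the rates are made-up numbers, chosen only to show that a queue fed faster than it is drained grows with every tick.

```typescript
// Each tick the sender pushes up to `arrivalsPerTick` events into the client
// queue, and the handler drains at most `handledPerTick` events from it.
// Returns the queue depth observed at the end of every tick.
function simulatePushDelivery(
  ticks: number,
  arrivalsPerTick: number,
  handledPerTick: number,
  backlog: number
): number[] {
  const depths: number[] = [];
  let remaining = backlog; // events still waiting on the service side
  let queued = 0; // events buffered inside the client
  for (let t = 0; t < ticks; t++) {
    const delivered = Math.min(arrivalsPerTick, remaining);
    remaining -= delivered;
    queued += delivered;
    queued -= Math.min(handledPerTick, queued);
    depths.push(queued);
  }
  return depths;
}

// Illustrative numbers: 1000 events pushed per tick, 100 handled per tick.
const depths = simulatePushDelivery(5, 1000, 100, 100000);
console.log(depths); // [ 900, 1800, 2700, 3600, 4500 ]
```

In this model the queue only stops growing once the backlog on the service side is exhausted, so with a large backlog the buffered events can exceed available memory first.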

As a temporary workaround, you should be able to upgrade to v5.10.0 without encountering this issue, since the feature was introduced in v5.11.0. In the meantime, I will consult with our team to determine the best way to address it. My initial recommendation is to implement custom prefetching in the client itself so that the internal queue does not become overloaded.
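
One way to picture "custom prefetching" is a bounded buffer that only grants the sender as much credit as it has free space. The sketch below is a hypothetical illustration of that idea and makes no claim about how the eventual fix was implemented; `BoundedPrefetchBuffer`, its methods, and the capacity of 300 are all invented for the example.

```typescript
// Buffer that never holds more than `capacity` events, counting both queued
// events and events already requested from the sender but not yet delivered.
class BoundedPrefetchBuffer<T> {
  private readonly queue: T[] = [];
  private outstandingCredit = 0;

  constructor(private readonly capacity: number) {}

  // Credit to grant the sender now: only what still fits in the buffer.
  creditToRequest(): number {
    const free = this.capacity - this.queue.length - this.outstandingCredit;
    const credit = Math.max(0, free);
    this.outstandingCredit += credit;
    return credit;
  }

  // Called when the sender delivers an event against granted credit.
  onDelivered(event: T): void {
    if (this.outstandingCredit === 0) {
      throw new Error("delivery without credit");
    }
    this.outstandingCredit--;
    this.queue.push(event);
  }

  // Called by the consumer when it is ready for its next batch.
  takeBatch(maxBatchSize: number): T[] {
    return this.queue.splice(0, maxBatchSize);
  }

  size(): number {
    return this.queue.length;
  }
}

// Slow consumer: takes one batch of 100 per tick, sender fills all credit.
const buffer = new BoundedPrefetchBuffer<number>(300);
let maxDepth = 0;
let nextEvent = 0;
for (let tick = 0; tick < 50; tick++) {
  const credit = buffer.creditToRequest();
  for (let i = 0; i < credit; i++) buffer.onDelivered(nextEvent++);
  maxDepth = Math.max(maxDepth, buffer.size());
  buffer.takeBatch(100);
}
console.log(maxDepth); // 300
```

However slow the consumer is, the queue depth in this sketch stays at or below the configured capacity, because new credit is issued only after a batch has been taken out.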

Again, thanks for your patience and I'll keep you posted.

thomasstoermer · 2023-11-03T12:51:21Z

Thanks for the update and the details, @deyaaeldeen 👍

Unfortunately, v5.10.0 will not work for us because of the earlier issue #25928.

### Packages impacted by this PR

@Azure/event-hubs

### Issues associated with this PR

#27253

### Describe the problem that is addressed by this PR

#27253 (comment)

### What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

### Are there test cases added in this PR? _(If not, why?)_

To be tested using stress testing framework.

UPDATE: The results are in and it is confirmed there is no more space leak!

### Provide a list of related PRs _(if any)_

#26065

### Command used to generate this PR: _(Applicable only to SDK release request PRs)_

### Checklists

- [x] Added impacted package name to the issue description
- [ ] Does this PR needs any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [x] Added a changelog (if necessary)

deyaaeldeen · 2023-11-07T21:59:04Z

@azure/event-hubs@5.11.3 has been released and it fixes this issue 😊 Please let me know if you have any questions!

github-actions · 2023-11-07T21:59:40Z

Hi @thomasstoermer. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

deyaaeldeen added this to Azure SDK for Event Hubs Sep 26, 2023

deyaaeldeen self-assigned this Sep 26, 2023

github-actions bot added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Sep 26, 2023

HarshaNalluru self-assigned this Sep 27, 2023

HarshaNalluru mentioned this issue Oct 2, 2023

[Event Hubs] Investigating memory leak #27303

Closed

deyaaeldeen mentioned this issue Nov 3, 2023

[Event Hubs] Improve prefetching #27647

Merged

3 tasks

deyaaeldeen added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Nov 7, 2023

github-actions bot removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Nov 7, 2023

deyaaeldeen moved this to Done in Azure SDK for Event Hubs Nov 7, 2023

xirzec closed this as completed Nov 14, 2023

github-actions bot locked and limited conversation to collaborators Feb 12, 2024
