[BUG] Questions about pulsar broker direct OOM #12169
Comments
ping @merlimat @codelipenghui @lhotari PTAL
@wenbingshen Do you have a chance to test with 2.8.1? That contains quite a few fixes, just to see if there's a difference.
I assume you are intentionally testing an overload situation?
That is probably expected if there's such a high load on the system. One possibility to protect from overload is to configure rate limiters on the system.
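A minimal sketch of one way such a rate limit could be applied, assuming the namespace-level PublishRate admin API (the admin URL, namespace, and numbers below are illustrative, not taken from this setup):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PublishRate;

public class PublishRateLimitExample {
    public static void main(String[] args) throws Exception {
        // Assumption: the broker's admin endpoint is reachable at localhost:8080.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // Cap each topic in the namespace at ~100k msgs/s and ~50 MB/s
            // (illustrative values; tune to what the bookies can actually absorb).
            PublishRate rate = new PublishRate(100_000, 50L * 1024 * 1024);
            admin.namespaces().setPublishRate("public/default", rate);
        }
    }
}
```

Broker-wide publisher throttling settings in broker.conf are another option, but a namespace-level policy like the above is the easiest to scope to a single benchmark namespace.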
@lhotari Thank you very much for your reply. I don't know much about BookKeeper's backpressure mechanism and its related parameters; I will study this later. In fact, the question I want to understand here is:
The issue had no activity for 30 days, mark with Stale label.
@wenbingshen this seems to match the problem description of #14861.
@lhotari You are right, same problem.
Describe the bug
Pulsar and BookKeeper version:
pulsar-2.8.0, with the BookKeeper bundled in pulsar-2.8.0
cluster with 5 brokers and 5 bookies
To figure out why the Pulsar broker's direct memory runs into OOM, I tested different scenarios and got different results.
Analyzing a Pulsar broker heap dump shows a large number of PendingAddOp instances that have not been recycled or destroyed.
As shown in the figure below, I suspect that many entry requests written to the bookies have not received all of their WQ (write quorum) responses, which prevents the PendingAddOp instances from being recycled or destroyed.
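For context, a rough sketch of the WQ/AQ relationship using the plain BookKeeper client (this is not Pulsar's actual code path; the ZooKeeper address and quorum values are illustrative). Consistent with the suspicion above, when AQ < WQ an add can complete back to the caller after AQ acks while the internal PendingAddOp is still waiting for the remaining write-quorum responses:

```java
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

public class QuorumSketch {
    public static void main(String[] args) throws Exception {
        // Assumption: the bookkeeper cluster's ZooKeeper is at localhost:2181.
        BookKeeper bk = new BookKeeper("localhost:2181");

        // ensemble = 3, write quorum = 3, ack quorum = 2:
        // each entry is written to 3 bookies, and the add is acknowledged to the
        // caller once 2 of them respond; a slow third bookie leaves the op pending.
        LedgerHandle lh = bk.createLedger(3, 3, 2,
                BookKeeper.DigestType.CRC32C, "password".getBytes());

        lh.addEntry("hello".getBytes()); // returns after the ack quorum is met

        lh.close();
        bk.close();
    }
}
```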
Therefore, following #7406 and #6178, I use maxMessagePublishBufferSizeInMB to limit the traffic handled by the broker.
Below are my test results:
The following questions are also related to #9562.
My question is: whether or not maxMessagePublishBufferSizeInMB is configured,
as long as AQ = WQ, direct memory usage stays normal;
as long as AQ < WQ, direct memory runs into OOM.
This may be related to the bookies' processing logic, but how does maxMessagePublishBufferSizeInMB actually work?
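As far as I understand it (a simplified sketch; the class and field names below are illustrative, not the actual Pulsar internals), maxMessagePublishBufferSizeInMB bounds the bytes the broker has read from producer connections but not yet had persisted by the bookies: once the total exceeds the limit, the broker pauses reading from producer channels until the buffer drains. Since that accounting is released once the ack quorum is satisfied, memory still held while waiting for the remaining write-quorum responses (the PendingAddOp case above) would not be covered by it, which could explain why AQ < WQ still hits OOM.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified, illustrative sketch of publish-buffer throttling.
// Hypothetical names; not the real Pulsar implementation.
class PublishBufferThrottle {
    private final long maxPublishBufferBytes;                 // maxMessagePublishBufferSizeInMB * 1 MB
    private final AtomicLong pendingPublishBytes = new AtomicLong();
    private volatile boolean readsPaused = false;

    PublishBufferThrottle(long maxPublishBufferMb) {
        this.maxPublishBufferBytes = maxPublishBufferMb * 1024 * 1024;
    }

    // Called when a publish request arrives from a producer connection.
    void onMessageReceived(long msgSize) {
        if (pendingPublishBytes.addAndGet(msgSize) > maxPublishBufferBytes) {
            readsPaused = true;   // e.g. stop auto-read on producer channels
        }
    }

    // Called once the entry is acknowledged as persisted (ack quorum reached).
    void onMessagePersisted(long msgSize) {
        if (pendingPublishBytes.addAndGet(-msgSize) <= maxPublishBufferBytes / 2 && readsPaused) {
            readsPaused = false;  // resume reading from producer channels
        }
    }
}
```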
Apart from the E/WQ/AQ (ensemble, write quorum, ack quorum) ratio, all tests use the same configuration and only involve writing, with no consumption.
workload yaml:
topics: 1
partitionsPerTopic: 2
messageSize: 1024
payloadFile: "payload/payload-1Kb.data"
subscriptionsPerTopic: 0
consumerPerSubscription: 0
producersPerTopic: 2
producerRate: 880000000
consumerBacklogSizeGB: 0
testDurationMinutes: 60