This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-12169: [BUG] Questions about pulsar broker direct OOM #3090

Open
sijie opened this issue Sep 24, 2021 · 1 comment

sijie commented Sep 24, 2021

Original Issue: apache#12169


Describe the bug

Pulsar and BookKeeper version:
Pulsar 2.8.0 with its built-in BookKeeper
Cluster with 5 brokers and 5 bookies

To find the cause of the direct-memory OOM on the Pulsar broker, I tested several scenarios and got different results.

The broker heap dump shows a large number of PendingAddOp instances that have not been recycled or destroyed.

As shown in the figure below, I suspect that many entries written to the bookies have not yet received all WQ (write quorum) responses, which prevents their PendingAddOp instances from being recycled or destroyed.
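
To make the suspicion above concrete, here is a toy model in Python. It is not BookKeeper code: `ToyAddOp`, `simulate`, and `retained_after_ack` are hypothetical names, and the model simply assumes what the paragraph above suspects, namely that an add is acknowledged to the producer once AQ bookies respond, while its buffer stays referenced until all WQ responses arrive.

```python
from dataclasses import dataclass


@dataclass
class ToyAddOp:
    """Hypothetical stand-in for a pending add; NOT BookKeeper's PendingAddOp."""
    size_bytes: int
    write_quorum: int
    ack_quorum: int
    responses: int = 0

    def on_response(self) -> None:
        # One bookie has answered the write request.
        self.responses += 1

    @property
    def acked_to_producer(self) -> bool:
        # The producer sees success once AQ bookies have responded.
        return self.responses >= self.ack_quorum

    @property
    def releasable(self) -> bool:
        # ASSUMPTION (the hypothesis under test): the buffer is only
        # freed after every one of the WQ responses has arrived.
        return self.responses >= self.write_quorum


def retained_after_ack(ops) -> int:
    """Bytes of ops the producer considers done but that are still held."""
    return sum(op.size_bytes for op in ops
               if op.acked_to_producer and not op.releasable)


def simulate(write_quorum, ack_quorum, n_ops, responding_bookies,
             size_bytes=1024) -> int:
    """Send n_ops entries while only `responding_bookies` bookies answer."""
    ops = [ToyAddOp(size_bytes, write_quorum, ack_quorum) for _ in range(n_ops)]
    for op in ops:
        for _ in range(min(responding_bookies, write_quorum)):
            op.on_response()
    return retained_after_ack(ops)
```

Under this assumption, with WQ=3, AQ=2, and one bookie that never responds, every entry is already acknowledged but still held, so `simulate(3, 2, 1000, 2)` reports 1000 KiB of retained payload. Any AQ=WQ setting reports 0, because in the model an entry is never acknowledged before it can be released.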

image

Therefore, following apache#7406 and apache#6178, I used maxMessagePublishBufferSizeInMB to limit the traffic handled by the broker.

These are my test results:

  1. With maxMessagePublishBufferSizeInMB=512 and E:W:A=3:3:2 (ensemble size : write quorum : ack quorum), OOM still occurs after the stress test.
  2. With maxMessagePublishBufferSizeInMB=512 and E:W:A=3:3:3, 3:2:2, and 2:2:2 (tested separately), direct memory is normal after the stress test.
  3. With maxMessagePublishBufferSizeInMB=2048 and E:W:A=3:3:3 and 3:2:2, direct memory is normal after the stress test.
  4. With maxMessagePublishBufferSizeInMB left at its default, which is 1/2 of the maximum direct memory (8/2 = 4 GB in this test), and E:W:A=3:3:3 and 3:2:2, direct memory is normal after the stress test.
  5. With maxMessagePublishBufferSizeInMB=-1 (throttling disabled) and E:W:A=3:3:3 and 3:2:2, direct memory is normal after the stress test.
  6. With maxMessagePublishBufferSizeInMB=-1 (throttling disabled) and E:W:A=3:3:2, OOM occurs after the stress test.
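
The six results above can be tabulated to check the pattern directly. The sketch below only restates the data from the list; the `Run` type and the helper names are hypothetical, and `None` stands for the default buffer size (4096 MB in this test).

```python
from typing import NamedTuple, Optional

# Default buffer in this test: half of the 8 GB direct-memory limit, in MB.
DEFAULT_BUFFER_MB = 8 * 1024 // 2


class Run(NamedTuple):
    buffer_mb: Optional[int]  # None = default, -1 = throttling disabled
    ensemble: int
    write_quorum: int
    ack_quorum: int
    oom: bool


# One row per (buffer setting, quorum setting) pair reported in the list.
RUNS = [
    Run(512, 3, 3, 2, True),
    Run(512, 3, 3, 3, False),
    Run(512, 3, 2, 2, False),
    Run(512, 2, 2, 2, False),
    Run(2048, 3, 3, 3, False),
    Run(2048, 3, 2, 2, False),
    Run(None, 3, 3, 3, False),
    Run(None, 3, 2, 2, False),
    Run(-1, 3, 3, 3, False),
    Run(-1, 3, 2, 2, False),
    Run(-1, 3, 3, 2, True),
]


def oom_matches_quorum_gap(runs) -> bool:
    """True if OOM occurred exactly in the runs where AQ < WQ."""
    return all(r.oom == (r.ack_quorum < r.write_quorum) for r in runs)


def oom_buffer_settings(runs) -> set:
    """Buffer settings under which at least one run hit OOM."""
    return {r.buffer_mb for r in runs if r.oom}
```

In this data set `oom_matches_quorum_gap(RUNS)` holds, and OOM appears both with a 512 MB buffer and with throttling disabled, so the buffer setting alone does not separate the failing runs from the healthy ones.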

The following questions are also related to apache#9562.

My question: regardless of whether maxMessagePublishBufferSizeInMB is configured,
as long as AQ=WQ, direct memory is normal;
as long as AQ<WQ, direct memory runs into OOM.
This may be related to the bookie's processing logic, but how does maxMessagePublishBufferSizeInMB actually work?

Apart from the E:W:A setting, all tests use the same configuration and are write-only, with no consumers.

Workload YAML:

topics: 1
partitionsPerTopic: 2
messageSize: 1024
payloadFile: "payload/payload-1Kb.data"
subscriptionsPerTopic: 0
consumerPerSubscription: 0
producersPerTopic: 2
producerRate: 880000000
consumerBacklogSizeGB: 0
testDurationMinutes: 60
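
For a sense of scale, the requested rate can be converted to bytes per second. This assumes `producerRate` is a target in messages per second across all producers, as in OpenMessaging Benchmark style workload files; the issue does not name the benchmark tool, so treat that reading as an assumption.

```python
# Values copied from the workload file above.
MESSAGE_SIZE_BYTES = 1024
PRODUCER_RATE_MSG_PER_S = 880_000_000  # ASSUMPTION: unit is messages/second


def target_bytes_per_second(rate_msg_per_s: int, message_size: int) -> int:
    """Requested publish throughput in bytes per second."""
    return rate_msg_per_s * message_size


def to_gib_per_second(bytes_per_s: int) -> float:
    """Convert bytes/second to GiB/second."""
    return bytes_per_s / 2**30


target = target_bytes_per_second(PRODUCER_RATE_MSG_PER_S, MESSAGE_SIZE_BYTES)
print(f"{target} B/s = {to_gib_per_second(target):.1f} GiB/s")
```

Under that assumption the target is roughly 839 GiB/s, so the configured rate acts as "publish as fast as possible" rather than as a real cap.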

@sijie sijie added the type/bug label Sep 24, 2021
github-actions bot commented Mar 1, 2022

The issue had no activity for 30 days, mark with Stale label.
