Bookkeeper Direct Memory Consumption is High #19663
-
I recommend scanning the 2.10.x release notes to see whether any fixes related to direct memory issues have landed since the 2.10.1 release. If the problem was introduced by the upgrade, moving to a newer version of Pulsar might fix it.
-
Enabling the Netty leak detector is a way to find out where things leak with Netty buffers (which use direct memory). Passing |
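For reference, Netty reads its leak detection level from the `io.netty.leakDetection.level` system property, and the levels are `disabled`, `simple`, `advanced`, and `paranoid`. Below is a minimal Java sketch that builds the corresponding JVM flag. The `NettyLeakDetectionFlag` class is a hypothetical helper for illustration only, not part of Netty, BookKeeper, or Pulsar.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not a Netty API): builds the JVM flag for Netty's leak detector. */
public final class NettyLeakDetectionFlag {
    // Names of the levels in Netty's ResourceLeakDetector.Level enum, lower-cased.
    private static final Set<String> LEVELS = Set.of("disabled", "simple", "advanced", "paranoid");

    // System property that Netty reads when its leak detector class is first loaded.
    public static final String PROPERTY = "io.netty.leakDetection.level";

    /** Returns a "-D..." flag for the given level, rejecting unknown level names. */
    public static String jvmFlag(String level) {
        String normalized = level.toLowerCase(Locale.ROOT);
        if (!LEVELS.contains(normalized)) {
            throw new IllegalArgumentException("Unknown leak detection level: " + level);
        }
        return "-D" + PROPERTY + "=" + normalized;
    }

    public static void main(String[] args) {
        // Prints the flag to add to the JVM options of the process under investigation.
        System.out.println(jvmFlag("paranoid"));
    }
}
```

How the flag reaches the bookie JVM depends on the deployment. In a stock Pulsar distribution this is typically done through `BOOKIE_EXTRA_OPTS` in `conf/bkenv.sh`, but treat that as an assumption to verify against your own setup. The `paranoid` level tracks every buffer and is expensive, so it is usually enabled only while hunting a leak. Reports show up as `LEAK:` lines in the logs.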
-
I see many requests for a non-existent namespace coming to my broker cluster; I am already checking on this with the internal team.
-
Hi @lhotari, I am also seeing this error in my broker logs. Bookie node CPU, heap, and direct memory were normal at that moment.
-
Hi @sourabhaggrawal, thank you for raising this issue. I have a few questions.
-
Hi @hangc0276 , apologies for the late reply.
-
We have observed that BookKeeper direct memory, which had been set to 4 GB for a long time (2-3 years), is now being fully consumed, possibly since the upgrade to 2.10.1. We noticed this 3 months ago, which is when we upgraded the cluster to 2.10.1.
We went ahead and doubled the server capacity and allocated 20 GB of direct memory (because that is where usage stabilised). But now we see that even the 20 GB is being fully consumed, and BookKeeper keeps restarting.
We also have a problem with our ledger disk filling up: expired ledgers are not deleted and free space is not reclaimed. I believe direct memory could be a cause of this problem too, because GC also uses memory to delete ledgers.
I have attached a screenshot of the direct memory usage of one node; the pattern is the same on the other nodes.
Any leads on what the reason could be and how to fix it?
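To compare the usage pattern across nodes without relying on screenshots, one option is to sample the JVM's own direct buffer pool counter. The following is a minimal sketch using the standard `BufferPoolMXBean` from the JDK. The `DirectMemoryProbe` class name is hypothetical, and the sketch measures only the JVM it runs in; for a running bookie the same bean would have to be read remotely (for example over JMX), which is not shown here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

/** Hypothetical probe: reads the JDK's "direct" buffer pool usage for the current JVM. */
public final class DirectMemoryProbe {

    /** Returns bytes used by the JDK "direct" buffer pool, or -1 if the pool is not reported. */
    public static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Allocate 1 MiB off-heap so the counter visibly changes between samples.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        long after = directBytesUsed();
        System.out.println("direct bytes before=" + before
                + " after=" + after
                + " allocated=" + buffer.capacity());
    }
}
```

One caveat: depending on its configuration, Netty can allocate direct memory in a way that the JDK pool counters do not account for, so this number may understate what a bookie is actually holding. Treat it as a cross-check against the bookie's own metrics, not a replacement for them.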