Use pooled direct memory allocator when decoding Pulsar entry to Kafka records #673

BewareMyPower · 2021-08-23T07:01:05Z

Motivation

When a Pulsar entry is decoded to Kafka records in ByteBufUtils#decodePulsarEntryToKafkaRecords, an NIO buffer with an initial capacity of 1 MB is allocated from heap memory. As a result, every entry read allocates 1 MB of heap memory, so heap usage grows very quickly and GC runs frequently.

Kafka's MemoryRecordsBuilder uses its underlying ByteBufferOutputStream field as the internal buffer, whose capacity can grow in the write method. Even if a direct buffer were allocated by Netty's pooled direct memory allocator and its underlying ByteBuffer were passed to ByteBufferOutputStream's constructor, any reallocation could still allocate the new buffer from heap memory.
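To make the reallocation problem concrete, here is a minimal JDK-only sketch. HeapGrowthDemo is a hypothetical, simplified stand-in for a growable buffer stream, not Kafka's actual ByteBufferOutputStream, and the doubling growth policy is an assumption made for illustration. It only shows the general effect: a stream seeded with a direct buffer ends up heap-backed once it has to grow.

```java
import java.nio.ByteBuffer;

/** Simplified stand-in for a growable buffer stream; NOT Kafka's actual class. */
class HeapGrowthDemo {
    private ByteBuffer buffer;

    public HeapGrowthDemo(ByteBuffer initial) {
        this.buffer = initial;
    }

    /** Appends bytes, growing the buffer first when there is not enough room. */
    public void write(byte[] bytes) {
        if (buffer.remaining() < bytes.length) {
            expand(bytes.length);
        }
        buffer.put(bytes);
    }

    // Grows by copying into a new buffer that is always heap allocated,
    // regardless of where the original buffer came from.
    private void expand(int required) {
        int newCapacity = Math.max(buffer.capacity() * 2, buffer.position() + required);
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity); // heap allocation
        buffer.flip();
        bigger.put(buffer);
        buffer = bigger;
    }

    public ByteBuffer buffer() {
        return buffer;
    }

    public static void main(String[] args) {
        HeapGrowthDemo stream = new HeapGrowthDemo(ByteBuffer.allocateDirect(8));
        System.out.println("direct before growth: " + stream.buffer().isDirect());
        stream.write(new byte[16]); // exceeds the 8-byte capacity, forcing a reallocation
        System.out.println("direct after growth: " + stream.buffer().isDirect());
    }
}
```

Passing a pooled direct buffer to the constructor therefore only helps until the first growth, which is why the allocation path itself has to change.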

Modification

This PR adds a DirectBufferOutputStream class that inherits from ByteBufferOutputStream and overrides some methods that MemoryRecordsBuilder can call. This class uses Pulsar's default ByteBufAllocator to allocate memory. The other methods behave the same as in ByteBufferOutputStream.

A unit test is added to verify that MemoryRecordsBuilder builds the same records regardless of whether the underlying stream is a ByteBufferOutputStream or a DirectBufferOutputStream. The test covers three cases:

1. The initial capacity is less than the size of the records header. In this case, the position(int) method is called to increase the capacity.
2. The initial capacity is greater than both the size of the records header and the total size of the records.
3. The initial capacity is greater than the size of the records header but less than the total size of the records. In this case, the write() method increases the capacity automatically.
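The override approach and the three capacity cases above can be sketched with the JDK alone. This is not the PR's implementation: GrowableStream, DirectGrowableStream, HEADER_SIZE and the payload size are all hypothetical, ByteBuffer.allocateDirect stands in for Pulsar's pooled ByteBufAllocator, and the "position past the header, then write" sequence is only an assumed approximation of how the builder uses the stream.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Heap-backed growable stream; a simplified stand-in, not Kafka's ByteBufferOutputStream. */
class GrowableStream {
    private ByteBuffer buffer;

    GrowableStream(ByteBuffer initial) {
        this.buffer = initial;
    }

    /** Allocation hook used when the buffer must grow; heap by default. */
    protected ByteBuffer allocate(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    /** Moves the write position, growing first if the target is past the capacity. */
    void position(int newPosition) {
        ensureRemaining(newPosition - buffer.position());
        buffer.position(newPosition);
    }

    void write(byte[] bytes) {
        ensureRemaining(bytes.length);
        buffer.put(bytes);
    }

    private void ensureRemaining(int required) {
        if (required <= buffer.remaining()) {
            return;
        }
        int newCapacity = Math.max(buffer.capacity() * 2, buffer.position() + required);
        ByteBuffer bigger = allocate(newCapacity);
        buffer.flip();
        bigger.put(buffer);
        buffer = bigger;
    }

    boolean isDirect() {
        return buffer.isDirect();
    }

    /** Copies the written bytes out without disturbing the write position. */
    byte[] toByteArray() {
        ByteBuffer view = buffer.duplicate();
        view.flip();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }
}

/** Overrides only the allocation hook so that growth never falls back to the heap. */
class DirectGrowableStream extends GrowableStream {
    DirectGrowableStream(int initialCapacity) {
        super(ByteBuffer.allocateDirect(initialCapacity));
    }

    @Override
    protected ByteBuffer allocate(int capacity) {
        return ByteBuffer.allocateDirect(capacity); // stand-in for a pooled allocator
    }
}

class ThreeCapacityCases {
    static final int HEADER_SIZE = 16; // hypothetical header size
    static final int PAYLOAD_SIZE = 64; // hypothetical total size of the record bytes

    /** Mimics a builder: skip room for the header, then append the records. */
    static byte[] build(GrowableStream stream) {
        byte[] payload = new byte[PAYLOAD_SIZE];
        Arrays.fill(payload, (byte) 7);
        stream.position(HEADER_SIZE);
        stream.write(payload);
        return stream.toByteArray();
    }

    /** True when both streams produce identical bytes and the direct one stays direct. */
    static boolean sameOutput(int initialCapacity) {
        GrowableStream heap = new GrowableStream(ByteBuffer.allocate(initialCapacity));
        DirectGrowableStream direct = new DirectGrowableStream(initialCapacity);
        return Arrays.equals(build(heap), build(direct)) && direct.isDirect();
    }

    public static void main(String[] args) {
        // 8: smaller than the header; 128: larger than everything; 32: between the two.
        for (int capacity : new int[] {8, 128, 32}) {
            System.out.println(capacity + " -> " + sameOutput(capacity));
        }
    }
}
```

With these illustrative sizes, a capacity of 8 grows inside position(int), 128 never grows, and 32 grows inside write(), which mirrors the three cases in the unit test.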

Then a DirectBufferOutputStream instance is passed to MemoryRecordsBuilder's constructor in ByteBufUtils#decodePulsarEntryToKafkaRecords, and the method's return type is changed to DecodeResult because the ByteBuf must be released later.
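The reason for returning a result object rather than the bare records is ownership: the caller needs a handle on the backing buffer so it can release it once the records are no longer needed. A rough JDK-only sketch of that pattern follows. PooledBufferSketch, DecodeResultSketch and DecodeAndRelease are hypothetical names, the reference count only imitates what a pooled Netty buffer would provide, and none of this reflects the actual fields or methods of the DecodeResult class in this PR.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a pooled, reference-counted buffer. */
final class PooledBufferSketch {
    private final ByteBuffer data;
    private final AtomicInteger refCount = new AtomicInteger(1);

    PooledBufferSketch(int capacity) {
        this.data = ByteBuffer.allocateDirect(capacity);
    }

    /** Returns a view that shares the memory but has its own position and limit. */
    ByteBuffer nioView() {
        return data.duplicate();
    }

    int refCount() {
        return refCount.get();
    }

    void release() {
        if (refCount.decrementAndGet() < 0) {
            throw new IllegalStateException("released too many times");
        }
    }
}

/** Hypothetical result type: the decoded bytes plus the buffer that backs them. */
final class DecodeResultSketch implements AutoCloseable {
    private final ByteBuffer records;
    private final PooledBufferSketch backing;

    DecodeResultSketch(ByteBuffer records, PooledBufferSketch backing) {
        this.records = records;
        this.backing = backing;
    }

    ByteBuffer records() {
        return records;
    }

    int backingRefCount() {
        return backing.refCount();
    }

    /** Releases the backing buffer; call only after the records have been consumed. */
    @Override
    public void close() {
        backing.release();
    }
}

final class DecodeAndRelease {
    /** Copies the entry into a pooled buffer and hands ownership to the caller. */
    static DecodeResultSketch decode(byte[] entry) {
        PooledBufferSketch pooled = new PooledBufferSketch(entry.length);
        ByteBuffer view = pooled.nioView();
        view.put(entry);
        view.flip();
        return new DecodeResultSketch(view, pooled);
    }

    public static void main(String[] args) {
        try (DecodeResultSketch result = decode(new byte[] {1, 2, 3})) {
            System.out.println("bytes to send: " + result.records().remaining());
        } // close() releases the backing buffer here
    }
}
```

If decode returned only the ByteBuffer view, nothing downstream could return the memory to the pool, which is the leak the changed return type avoids.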

BewareMyPower · 2021-08-23T11:54:12Z

Here is a comparison between KoP before this PR and KoP after this PR.


BewareMyPower added 3 commits August 22, 2021 01:23

Add DirectBufferOutputStream to reuse Pulsar's ByteBufAllocator

57ac2a1

Use DirectBufferOutputStream to decode Pulsar entry

3f980c4

Fix description of buffer() method

db39e3f

BewareMyPower added the type/enhancement label (indicates an improvement to an existing feature) Aug 23, 2021

BewareMyPower self-assigned this Aug 23, 2021

BewareMyPower requested a review from jiazhai as a code owner August 23, 2021 07:01

BewareMyPower requested review from codelipenghui and Demogorgon314 August 23, 2021 07:01

Fix checkstyle

4d8056f

BewareMyPower changed the title ~~Use pooled direct memory allocator when decoding Pulsar entry to Kafka records~~ [WIP] Use pooled direct memory allocator when decoding Pulsar entry to Kafka records Aug 23, 2021

Fix test failures when entryFormat=kafka

084aa7b

BewareMyPower changed the title ~~[WIP] Use pooled direct memory allocator when decoding Pulsar entry to Kafka records~~ Use pooled direct memory allocator when decoding Pulsar entry to Kafka records Aug 23, 2021

Demogorgon314 approved these changes Aug 23, 2021


BewareMyPower merged commit 4de6212 into streamnative:master Aug 24, 2021

BewareMyPower deleted the bewaremypower/direct-buffer-decode branch August 24, 2021 03:16

BewareMyPower mentioned this pull request Aug 27, 2021

KoP 2.9.0 Plan #581

Open

9 tasks

BewareMyPower mentioned this pull request Sep 8, 2021

PIP 94: Message converter at broker level apache/pulsar#11962

Closed

wenbingshen mentioned this pull request Oct 7, 2021

[FEATURE] Support kafka LogValidator validate inner records and compression codec when handle producer request with entryFormat=kafka #791

Merged
