[ManagedLedger] Compress managed ledger info #11490

Merged
merged 8 commits into from
Aug 4, 2021

@gaoran10 gaoran10 commented Jul 28, 2021

Motivation

Currently, ManagedLedgerInfo contains offload context info. If one ManagedLedger holds too many ledgers, the data stored in its ZooKeeper ZNode grows large, which is hard for ZooKeeper to manage. Compressing the ManagedLedgerInfo data reduces its size.

For example, if one ManagedLedgerInfo contains 30000 ledgers and each LedgerInfo contains offload context, the uncompressed data size is about 6MB. After compression with ZSTD, the size drops to about 1.3MB.

Modifications

Add a ManagedLedgerInfoMetadata header before the ManagedLedgerInfo. The data structure is as follows.

[MAGIC_NUMBER] (2) + [METADATA_SIZE] (4) + [METADATA_PAYLOAD] + [MANAGED_LEDGER_INFO_PAYLOAD]
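
The layout above can be sketched with the JDK alone. This is a minimal illustration, not the PR's implementation: the real metadata is a protobuf message and the real codec comes from Pulsar's compression providers, whereas this sketch assumes a hand-rolled metadata encoding and `java.util.zip.Deflater` as a stand-in codec. The class and method names (`FramedInfoWriter`, `frame`, `metadata`, `deflate`) are hypothetical; the magic value is the one shown in the diff under review, which later commits in this PR changed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class FramedInfoWriter {
    // Value quoted from the diff under review; a later commit changed it.
    static final short MAGIC = 0x0b9c;

    /** Compresses with the JDK Deflater (a stand-in for the broker's codec). */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Hypothetical metadata encoding: [uncompressedSize (4)] + [codec name as UTF-8]. */
    static byte[] metadata(String codec, int uncompressedSize) {
        byte[] name = codec.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + name.length).putInt(uncompressedSize).put(name).array();
    }

    /** Builds [MAGIC (2)] + [METADATA_SIZE (4)] + [METADATA] + [compressed payload]. */
    static byte[] frame(byte[] managedLedgerInfo) {
        byte[] meta = metadata("ZLIB", managedLedgerInfo.length);
        byte[] payload = deflate(managedLedgerInfo);
        return ByteBuffer.allocate(2 + 4 + meta.length + payload.length)
                .putShort(MAGIC)
                .putInt(meta.length)
                .put(meta)
                .put(payload)
                .array();
    }
}
```

Writing the metadata size ahead of the metadata lets a reader skip straight to the payload without understanding every metadata field.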

Add a new configuration, managedLedgerInfoCompressionType, to control the ManagedLedgerInfo compression type. If this configuration is not set, the ManagedLedgerInfo data is not compressed.

Migration

When reading data from ZooKeeper, check the magic number at the head of the data. If it matches, try to parse the metadata and decompress the data. If it does not match, or if an error occurs, fall back to parsing the data as ManagedLedgerInfo directly.
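
The read path described above might look roughly like the following. It is a sketch under stated assumptions rather than the PR's code: the metadata is assumed to be a bare 4-byte uncompressed size instead of a protobuf message, `java.util.zip.Inflater` stands in for the broker's codec, and `FramedInfoReader` and `unwrap` are hypothetical names. Returning the input unchanged models "fall back to parsing ManagedLedgerInfo directly".

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class FramedInfoReader {
    // Value quoted from the diff under review; a later commit changed it.
    static final short MAGIC = 0x0b9c;
    private static final int HEADER_SIZE = 2 + 4; // magic + metadata size

    /** Returns the decompressed bytes, or the input itself for legacy or unreadable data. */
    static byte[] unwrap(byte[] data) {
        if (data.length < HEADER_SIZE) {
            return data; // too short to carry a header: legacy format
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (buf.getShort() != MAGIC) {
            return data; // no magic number: legacy format
        }
        int metadataSize = buf.getInt();
        if (metadataSize < 4 || metadataSize > buf.remaining()) {
            return data; // implausible header: fall back
        }
        int uncompressedSize = buf.getInt(); // assumed metadata layout
        if (uncompressedSize < 0) {
            return data;
        }
        int payloadOffset = HEADER_SIZE + metadataSize;
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data, payloadOffset, data.length - payloadOffset);
            byte[] out = new byte[uncompressedSize];
            int filled = 0;
            while (filled < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, filled, out.length - filled);
                if (n == 0) {
                    break; // truncated input: treat as unreadable
                }
                filled += n;
            }
            return filled == out.length ? out : data;
        } catch (DataFormatException e) {
            return data; // corrupt payload: fall back
        } finally {
            inflater.end();
        }
    }
}
```

Because every failure path returns the original bytes, data written before the upgrade keeps parsing as before.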

Verifying this change

Verify compression and decompression of ManagedLedgerInfo data.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

For committer

For this PR, do we need to update docs?

  • If yes,

    • if you update docs in this PR, label this PR with the doc label.

    • if you plan to update docs later, label this PR with the doc-required label.

    • if you need help on updating docs, create a follow-up issue with the doc-required label.

  • If no, label this PR with the no-need-doc label and explain why.

@sijie sijie added this to the 2.9.0 milestone Jul 28, 2021
@sijie sijie added component/storage type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages labels Jul 28, 2021
@@ -186,6 +186,11 @@ message BrokerEntryMetadata {
optional uint64 index = 2;
}

message ManagedLedgerInfoMetadata {
Contributor

@gaoran10 Should we move it to MLDataFormats.proto? PulsarApi.proto is used for broker and client interaction.

Contributor Author

Ok, I'll move it.

String path = PREFIX + ledgerName;
- store.put(path, serializedMlInfo, Optional.of(stat.getVersion()))
+ store.put(path, compressLedgerInfo(mlInfo, CompressionType.ZSTD), Optional.of(stat.getVersion()))
Contributor

The compression type should be configurable

Contributor Author

Good idea.

@@ -47,6 +54,15 @@
private final MetadataStore store;
private final OrderedExecutor executor;

public static final short magicManagedLedgerInfoMetadata = 0x0b9c;
Contributor

Suggested change
- public static final short magicManagedLedgerInfoMetadata = 0x0b9c;
+ public static final short MAGIC_MANAGED_LEDGER_INFO_METADATA = 0x0b9c;

Contributor Author

I'll fix this.

2. Add configuration managedLedgerInfoCompressionType to control ManagedLedgerInfo compression type.
@gaoran10 gaoran10 changed the title [WIP] [ManagedLedger] Compress managed ledger info [ManagedLedger] Compress managed ledger info Jul 29, 2021

public ManagedLedgerInfo parseManagedLedgerInfo(byte[] data) throws InvalidProtocolBufferException {
ByteBuf byteBuf = PulsarByteBufAllocator.DEFAULT.buffer(data.length, data.length);
byteBuf.writeBytes(data);
Contributor

It seems we can use wrappedBuffer to reduce one copy

Contributor Author

Yes, I'll fix this.
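
The difference between allocating-then-writing and wrapping can be illustrated without Netty. The sketch below uses the JDK's `java.nio.ByteBuffer` as an analog chosen to keep the example dependency-free; it is not the code in this PR, and `WrapVersusCopy` is a hypothetical name. In Netty the corresponding call is `Unpooled.wrappedBuffer(byte[])`.

```java
import java.nio.ByteBuffer;

final class WrapVersusCopy {
    /** Allocates a new buffer and copies the bytes into it (one extra copy). */
    static ByteBuffer copyOf(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        buf.put(data);
        buf.flip(); // make the written bytes readable
        return buf;
    }

    /** Wraps the caller's array: nothing is copied and both views share storage. */
    static ByteBuffer wrap(byte[] data) {
        return ByteBuffer.wrap(data);
    }
}
```

The trade-off is aliasing: a wrapped buffer sees later writes to the array, so wrapping is only safe when the caller no longer mutates it.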

metadataByteBuf.writeBytes(mlInfoMetadata.toByteArray());

ByteBuf originalByteBuf = PulsarByteBufAllocator.DEFAULT.buffer(originalBytes.length, originalBytes.length);
originalByteBuf.writeBytes(originalBytes);
Contributor

It seems we can use wrappedBuffer to reduce one copy

Contributor

good idea.

Contributor Author

Thanks, I'll fix this.


byte[] dataBytes = new byte[compositeByteBuf.readableBytes()];
compositeByteBuf.readBytes(dataBytes);
return dataBytes;
Contributor

@315157973 315157973 Jul 29, 2021

Do these ByteBufs need to be released?

Contributor Author

Releasing the compositeByteBuf fails with io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1.

MLDataFormats.ManagedLedgerInfoMetadata.parseFrom(metadataBytes);

long unpressedSize = metadata.getUnpressedSize();
ByteBuf decodeByteBuf = CompressionCodecProvider.getCompressionCodec(
Contributor

Do we need a release here ?

Contributor

/**
     * Decompress a buffer.
     *
     * <p>The buffer needs to have been compressed with the matching Encoder.
     *
     * @param encoded
     *            the compressed content
     * @param uncompressedSize
     *            the size of the original content
     * @return a ByteBuf with the compressed content. The buffer needs to be released by the receiver
     * @throws IOException
     *             if the decompression fails
     */
    ByteBuf decode(ByteBuf encoded, int uncompressedSize) throws IOException;

we need to release the ByteBuf

Contributor Author

I'll fix this.
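
The release discipline discussed in this thread can be modeled with a toy reference-counted buffer. This is a deliberately simplified sketch, not Netty's `ByteBuf`: `RefCountedBuffer` and `copyAndRelease` are hypothetical, and the toy throws `IllegalStateException` where Netty throws `IllegalReferenceCountException`. It shows the two points raised above: release the decoded buffer in a `finally` block, and expect an error if something releases it a second time.

```java
final class RefCountedBuffer {
    private final byte[] bytes;
    private int refCnt = 1; // a new buffer starts with one owner

    RefCountedBuffer(byte[] bytes) {
        this.bytes = bytes;
    }

    int refCnt() {
        return refCnt;
    }

    byte[] toByteArray() {
        if (refCnt == 0) {
            throw new IllegalStateException("refCnt: 0");
        }
        return bytes.clone();
    }

    /** Drops one reference; returns true when the buffer is fully released. */
    boolean release() {
        if (refCnt == 0) {
            throw new IllegalStateException("refCnt: 0, decrement: 1");
        }
        refCnt--;
        return refCnt == 0;
    }

    /** Copies the content out and always releases the buffer, even on failure. */
    static byte[] copyAndRelease(RefCountedBuffer decoded) {
        try {
            return decoded.toByteArray();
        } finally {
            decoded.release();
        }
    }
}
```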

@@ -47,11 +53,27 @@
private final MetadataStore store;
private final OrderedExecutor executor;

public static final short MAGIC_MANAGED_LEDGER_INFO_METADATA = 0x0b9c;
private CompressionType compressionType = null;
Contributor

Initialize it to NONE instead of null?

Contributor Author

That's better.

@@ -124,3 +124,8 @@ message ManagedCursorInfo {
// Store which index in the batch message has been deleted
repeated BatchedEntryDeletionIndexInfo batchedEntryDeletionIndexInfo = 7;
}

message ManagedLedgerInfoMetadata {
required string compressionType = 1;
Contributor

Use compression enum type instead of String?

Contributor Author

CompressionType is defined in PulsarApi.proto, so using the enum here would require importing PulsarApi.proto. Maybe we could use a string type here?

Contributor

I think we could just copy the same enum type here, or even declare it as int and use the other enum type

Contributor

It looks like we'd better copy the enum type into MLDataFormats.proto, because the managed ledger module can be an independent component.

@@ -1601,6 +1601,11 @@
private String managedLedgerDataReadPriority = OffloadedReadPriority.TIERED_STORAGE_FIRST
.getValue();

@FieldContext(category = CATEGORY_STORAGE_ML,
doc = "ManagedLedgerInfo compression type, option values (LZ4, ZLIB, ZSTD, SNAPPY). \n"
Contributor

Add NONE to the documented compression types, and set the default value to NONE instead of null?

Contributor Author

Ok, I'll fix this.

/**
* ManagedLedgerInfo compression type. If the compression type is null or invalid, don't compress data.
*/
private String managedLedgerInfoCompressionType = null;
Contributor

We'd better use an enum type, rather than a string, to restrict the compression type to specific values. I'm not sure whether that's easy to implement.

Contributor Author

This field is set from the managedLedgerInfoCompressionType configuration in ServiceConfiguration, so maybe we could convert the String to CompressionType in MetaStoreImpl.

ByteBuf originalByteBuf = PulsarByteBufAllocator.DEFAULT.buffer(originalBytes.length, originalBytes.length);
originalByteBuf.writeBytes(originalBytes);
ByteBuf encodeByteBuf = CompressionCodecProvider.getCompressionCodec(compressionType).encode(originalByteBuf);
Contributor

/**
     * Compress a buffer.
     *
     * @param raw
     *            a buffer with the uncompressed content. The reader/writer indexes will not be modified
     * @return a new buffer with the compressed content. The buffer needs to be released by the receiver
     */
    ByteBuf encode(ByteBuf raw);

Need to be released.

Contributor Author

I'll fix this.

Contributor

@eolivelli eolivelli left a comment

Overall looks good.

What is the upgrade story?

@@ -47,11 +53,27 @@
private final MetadataStore store;
private final OrderedExecutor executor;

public static final short MAGIC_MANAGED_LEDGER_INFO_METADATA = 0x0b9c;
private CompressionType compressionType = null;
Contributor

final ?

Contributor Author

That's better.

@eolivelli
Contributor

@codelipenghui this is a new feature with a significant impact, as you cannot roll back once you enable it.

I believe that we cannot cherry pick this feature to 2.8.x release line.

We should cherry pick to released versions only bug fixes and small features that have a low impact.

does this make sense to you ?

@sijie
Member

sijie commented Jul 30, 2021

@eolivelli: the upgrade requires two steps:

  1. upgrade the binary without enabling this feature.
  2. after all brokers are upgraded, enable this feature.

Because it is a two-step upgrade, we can add the code to 2.8.1 with the feature turned off. Having the code available in 2.8.1 provides a smooth path to enabling this feature.

@eolivelli
Contributor

With the code available in 2.8.1, it provides a smooth path to enable this feature.
makes sense to me

we have to document it carefully, because it will ease switching from 2.8.1 to 2.9.0,
but then the problem goes down to the upgrade from 2.8.0 to 2.8.1: you have to take care of enabling this feature by upgrading to 2.8.1 all the brokers and then enable the flag.

if you feel strongly then I am not against cherry picking this into 2.8.x.
I just wanted to point out that we should start porting less "new features" to released branches, in order to preserve stability as much as possible, but this is not the right thread for discussing this point.

Contributor

@eolivelli eolivelli left a comment

LGTM as soon as all the existing comments have been addressed

@sijie sijie added doc-required Your PR changes impact docs and you will update later. release/note-required labels Jul 30, 2021
@sijie
Member

sijie commented Jul 30, 2021

we have to document it carefully, because it will ease switching from 2.8.1 to 2.9.0,
but then the problem goes down to the upgrade from 2.8.0 to 2.8.1: you have to take care of enabling this feature by upgrading to 2.8.1 all the brokers and then enable the flag.

Marked it for doc-required and releasenote-required.

@codelipenghui codelipenghui requested a review from 315157973 August 2, 2021 15:54
Contributor

@315157973 315157973 left a comment

Should we change their positions so as to avoid NPE ?

metadata.getCompressionType().equals(CompressionType.ZLIB.name())

change to

CompressionType.ZLIB.name().equals(...)

@gaoran10
Contributor Author

gaoran10 commented Aug 3, 2021

Should we change their positions so as to avoid NPE ?

metadata.getCompressionType().equals(CompressionType.ZLIB.name())

change to

CompressionType.ZLIB.name().equals(...)

Good catch. The compressionType field of the metadata is required, but putting CompressionType.ZLIB.name() first is still safer. I'll change it.
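
The reordering suggested above can be shown in isolation. This is a small sketch with a hypothetical local `CompressionType` enum and helper name, not the PR's classes: calling `equals` on the constant means a null argument yields `false` instead of a `NullPointerException`.

```java
final class NullSafeCompare {
    // Local stand-in for the compression enum; values follow the option list in this PR.
    enum CompressionType { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    /** Constant-first comparison: safe even when the metadata value is null. */
    static boolean isZlib(String typeFromMetadata) {
        return CompressionType.ZLIB.name().equals(typeFromMetadata);
    }
}
```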

@@ -47,9 +54,31 @@
private final MetadataStore store;
private final OrderedExecutor executor;

public static final short MAGIC_MANAGED_LEDGER_INFO_METADATA = 0x0b9c;
Contributor

private?

Contributor

We should try to use a magic number with the most significant bit set to 1 because that should be guaranteed to be a non-valid protobuf sequence

Contributor Author

Ok, I'll change this.

try {
finalCompressionType = CompressionType.valueOf(compressionType);
} catch (Exception e) {
log.warn("Failed to get compression type {}, disable managedLedgerInfo compression, error msg: {}.",
Contributor

If the compression type is not valid, we should just throw an exception instead of handling it by disabling compression. Otherwise someone could think compression is enabled when it is actually failing.

Contributor Author

Ok, I'll throw the exception.
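
A fail-fast conversion from the configured string to an enum might look like the sketch below. The enum is a local stand-in, `CompressionTypeParser` and `parse` are hypothetical names, and mapping an unset value to `NONE` is an assumption based on the default suggested earlier in this review.

```java
import java.util.Arrays;

final class CompressionTypeParser {
    // Local stand-in for the compression enum; values follow the option list in this PR.
    enum CompressionType { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    /** Unset means no compression (assumed); an unknown name is rejected loudly. */
    static CompressionType parse(String configured) {
        if (configured == null || configured.isEmpty()) {
            return CompressionType.NONE;
        }
        try {
            return CompressionType.valueOf(configured);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown managedLedgerInfoCompressionType '" + configured
                            + "', expected one of " + Arrays.toString(CompressionType.values()), e);
        }
    }
}
```

Failing at startup surfaces a typo immediately, instead of leaving the operator to discover later that data was never compressed.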

message ManagedLedgerInfoMetadata {
required string compressionType = 1;
required int32 unpressedSize = 2;
Contributor

Suggested change
- required int32 unpressedSize = 2;
+ required int32 uncompressedSize = 2;

Contributor Author

Thanks.

2. change compression magic number.
3. add a unit test to verify compression could work well.
@gaoran10
Contributor Author

gaoran10 commented Aug 3, 2021

We should try to use a magic number with the most significant bit set to 1 because that should be guaranteed to be a non-valid protobuf sequence

@merlimat I'm not sure how to generate a non-valid protobuf sequence. From the docs I looked at, the basic protobuf encoding is field_number + wire_type + value, and the wire type currently ranges from 0 to 5, so I used the wire type code 111. Because I want to represent the magic number as a short value, the MSB can't be 1. Could you provide some suggestions? Or could we use a magic number greater than 111 1111 1111 1111, stored as an int value?
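
The bit-level points in this exchange can be checked with plain Java. The sketch below assumes only the standard protobuf tag layout, `(field_number << 3) | wire_type` encoded as a varint, and uses hypothetical helper names. The value `0x8b9c` in the usage note is purely illustrative and is not the number this PR settled on.

```java
import java.nio.ByteBuffer;

final class MagicNumberBits {
    /** The low three bits of a protobuf tag byte hold the wire type. */
    static int wireType(byte first) {
        return first & 0x07;
    }

    /** For a single-byte tag, the bits above the wire type hold the field number. */
    static int fieldNumber(byte first) {
        return (first & 0x7f) >>> 3;
    }

    /** In a varint, a set top bit means the value continues in the next byte. */
    static boolean hasHighBit(byte first) {
        return (first & 0x80) != 0;
    }

    /** Writes a short big-endian and reads it back, as a framing header would. */
    static short roundTrip(short magic) {
        return ByteBuffer.allocate(2).putShort(magic).getShort(0);
    }
}
```

Under these helpers, the leading byte `0x0b` of the original magic number decodes as field number 1 with wire type 3. A Java `short` can still carry a set top bit through a cast such as `(short) 0x8b9c`: the value is negative as a number, but it round-trips through a buffer unchanged.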

@merlimat merlimat merged commit 4361b6d into apache:master Aug 4, 2021
@gaoran10 gaoran10 deleted the ledger-metadata-compression branch August 5, 2021 00:38
codelipenghui pushed a commit that referenced this pull request Aug 5, 2021
* compress managed ledger info

* 1. Move the `ManagedLedgerInfoMetadata` to `MLDataFormats.proto`.
2. Add configuration managedLedgerInfoCompressionType to control ManagedLedgerInfo compression type.

* use ByteBuf wrap bytes array, release ByteBuf if needed.

* make the compressionType as a final field

* fix comment

* 1. throw exception if using a invalid compression type.
2. change compression magic number.
3. add a unit test to verify compression could work well.

* change compression magic number.

* fix test

(cherry picked from commit 4361b6d)
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Aug 5, 2021
codelipenghui pushed a commit that referenced this pull request Aug 5, 2021
(cherry picked from commit 4361b6d)
@codelipenghui codelipenghui added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label Aug 5, 2021
hangc0276 pushed a commit that referenced this pull request Aug 6, 2021
…configuration doc (#11563)

### Modifications
Add a new configuration managedLedgerInfoCompressionType in broker configuration doc.

Related to #11490
LeBW pushed a commit to LeBW/pulsar that referenced this pull request Aug 9, 2021
LeBW pushed a commit to LeBW/pulsar that referenced this pull request Aug 9, 2021
…configuration doc (apache#11563)
@Anonymitaet Anonymitaet removed the doc-required Your PR changes impact docs and you will update later. label Aug 17, 2021
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
…configuration doc (apache#11563)
Labels
cherry-picked/branch-2.7 Archived: 2.7 is end of life cherry-picked/branch-2.8 Archived: 2.8 is end of life release/2.7.4 release/2.8.1 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
8 participants