PIP-146: ManagedCursorInfo compression #14529

Closed
nodece opened this issue Mar 2, 2022 · 1 comment · Fixed by #14542
nodece commented Mar 2, 2022

Discussion thread: https://lists.apache.org/thread/j92bzsby9n2ozc9gcw5psgcy2026l1wm

Motivation

Cursor data is stored in the metadata store (ZooKeeper or etcd). As the cursor data grows, it takes longer to read from the store. Compressing the cursor data reduces its size and therefore the time needed to fetch it.

Goal

Support compressing the ManagedCursorInfo with LZ4, ZLIB, ZSTD, or SNAPPY.

Implementation

CursorInfo compression format

[MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]

  • MAGIC_NUMBER
    Use 0x4778, the same magic number that ledger info uses.

  • METADATA
    Add a message named ManagedCursorInfoMetadata to MLDataFormats.proto:

message ManagedCursorInfoMetadata {
    required CompressionType compressionType = 1;
    required int32 uncompressedSize = 2;
}
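
To make the layout concrete, here is a minimal sketch of how the four parts could be laid out in a byte array. It is not the actual Pulsar code: the class and method names are hypothetical, the metadata is passed as already-serialized bytes, and the field widths (2-byte magic number, 4-byte big-endian size) are assumptions, since the proposal does not state them.

```java
import java.nio.ByteBuffer;

class CursorInfoFrame {
    // Magic number from the proposal; shared with the ledger info format.
    static final short MAGIC_NUMBER = 0x4778;

    // Assumed widths: 2 bytes for the magic number, 4 bytes for the metadata size.
    static final int HEADER_BYTES = 2 + 4;

    /** Builds [MAGIC_NUMBER][METADATA_SIZE][METADATA_PAYLOAD][MANAGED_CURSOR_INFO_PAYLOAD]. */
    static byte[] frame(byte[] metadata, byte[] compressedCursorInfo) {
        ByteBuffer buf = ByteBuffer.allocate(
                HEADER_BYTES + metadata.length + compressedCursorInfo.length);
        buf.putShort(MAGIC_NUMBER);     // ByteBuffer is big-endian by default
        buf.putInt(metadata.length);    // METADATA_SIZE
        buf.put(metadata);              // serialized ManagedCursorInfoMetadata
        buf.put(compressedCursorInfo);  // compressed ManagedCursorInfo bytes
        return buf.array();
    }

    /** Returns true when the stored bytes start with the magic number. */
    static boolean hasMagic(byte[] stored) {
        return stored.length >= 2 && ByteBuffer.wrap(stored).getShort() == MAGIC_NUMBER;
    }

    public static void main(String[] args) {
        byte[] framed = frame(new byte[] {1, 2, 3}, new byte[] {9, 9});
        System.out.println("framed bytes: " + framed.length + ", magic: " + hasMagic(framed));
    }
}
```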

CursorInfo compression and decompression design

Pulsar already defines and implements these compression types, so we only need to handle compression and decompression of the ManagedCursorInfo data:

  • Get CursorInfo from the metadata store
    Check the header of the cursor data. If it is compressed, parse the bytes using the compressed format; otherwise, parse the cursor data directly, as before.

  • Add/Update CursorInfo to the metadata store
    If a compression type is specified, compress the data before writing it; otherwise, write the data to the metadata store directly.
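
The two paths above can be sketched as follows. This is only an illustration under stated assumptions, not the Pulsar implementation: it uses ZLIB through `java.util.zip` because that is the only one of the four codecs in the JDK, it replaces the protobuf `ManagedCursorInfoMetadata` with a hypothetical 4-byte stand-in that holds just the uncompressed size, and it assumes a 2-byte magic number and a 4-byte size field. It also assumes uncompressed legacy data never starts with the magic number.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CursorInfoCodec {
    static final short MAGIC_NUMBER = 0x4778;

    /** Write path: compress and frame only when compression is enabled. */
    static byte[] encode(byte[] cursorInfo, boolean zlibEnabled) {
        if (!zlibEnabled) {
            return cursorInfo; // stored as-is, in the pre-compression format
        }
        Deflater deflater = new Deflater();
        deflater.setInput(cursorInfo);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            compressed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        // Hypothetical stand-in for ManagedCursorInfoMetadata: uncompressed size only.
        byte[] metadata = ByteBuffer.allocate(4).putInt(cursorInfo.length).array();
        ByteBuffer out = ByteBuffer.allocate(2 + 4 + metadata.length + compressed.size());
        out.putShort(MAGIC_NUMBER).putInt(metadata.length).put(metadata)
                .put(compressed.toByteArray());
        return out.array();
    }

    /** Read path: check the header, then decompress or return the bytes unchanged. */
    static byte[] decode(byte[] stored) {
        ByteBuffer in = ByteBuffer.wrap(stored);
        if (stored.length < 6 || in.getShort() != MAGIC_NUMBER) {
            return stored; // no magic number: parse the original way
        }
        int metadataSize = in.getInt();
        int uncompressedSize = in.getInt(); // first field of the stand-in metadata
        int payloadOffset = 6 + metadataSize;
        Inflater inflater = new Inflater();
        inflater.setInput(stored, payloadOffset, stored.length - payloadOffset);
        byte[] result = new byte[uncompressedSize];
        try {
            int written = 0;
            while (written < result.length) {
                int n = inflater.inflate(result, written, result.length - written);
                if (n == 0) {
                    throw new DataFormatException("truncated or corrupt payload");
                }
                written += n;
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("cannot decompress cursor info", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] raw = "cursor-info cursor-info cursor-info".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decode(encode(raw, true));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

Because `decode` falls back to the unframed bytes when the magic number is absent, the same read path handles both data written with compression enabled and data written without it.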

CursorInfo compression type configuration

Add managedCursorInfoCompressionType in org.apache.pulsar.broker.ServiceConfiguration and org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig.
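
As a rough illustration of how the configured value could be interpreted, the sketch below maps a string setting to one of the supported types. The class and method names are hypothetical, and both the `NONE` value for the disabled default and the case-insensitive matching are assumptions made for this example rather than behavior stated in the proposal.

```java
import java.util.Locale;

class CursorCompressionConfig {
    // The four codecs named in the proposal, plus an assumed NONE for "disabled".
    enum CompressionType { NONE, LZ4, ZLIB, ZSTD, SNAPPY }

    /** Maps the managedCursorInfoCompressionType setting to a compression type. */
    static CompressionType parse(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return CompressionType.NONE; // compression is disabled by default
        }
        try {
            return CompressionType.valueOf(configured.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unsupported managedCursorInfoCompressionType: " + configured, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("ZSTD"));
        System.out.println(parse(null));
    }
}
```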

Compatibility

  1. Compression is disabled by default.
  2. Data can be safely upgraded or downgraded. When compression is enabled, old data is migrated to the new format, with compression metadata, the next time it is updated. When compression is disabled, data is written back in the previous format on update, and existing compressed data can still be parsed on read by using its compression metadata.
@nodece nodece added the type/PIP label Mar 2, 2022
@nodece nodece changed the title PIP-146 ManagedCursorInfo compression PIP-146: ManagedCursorInfo compression Mar 2, 2022

nodece commented Mar 16, 2022

This proposal received 3 binding +1 votes and no -1 votes, and has stayed open for at least 48 hours:
https://lists.apache.org/thread/kvpostlbqty87cdjgt7vrfozn3qz2j1t

@nodece nodece closed this as completed Mar 16, 2022
codelipenghui pushed a commit that referenced this issue Apr 19, 2022
Fixes #14529 

### Motivation

Cursor data is stored in the metadata store (ZooKeeper or etcd). As the cursor data grows, it takes longer to read from the store. Compressing the cursor data reduces its size and therefore the time needed to fetch it.


### Modifications

- Add a message named `ManagedCursorInfoMetadata` to `MLDataFormats.proto` as compression metadata
- Add `managedCursorInfoCompressionType` to `org.apache.pulsar.broker.ServiceConfiguration` and `org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig`
- This feature is implemented the same way as ManagedLedgerInfo compression, so the shared code is reworked to avoid duplication
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this issue Apr 20, 2022
codelipenghui pushed a commit to codelipenghui/incubator-pulsar that referenced this issue Jun 2, 2022
(cherry picked from commit 4398733)
codelipenghui pushed a commit that referenced this issue Jun 2, 2022
(cherry picked from commit 4398733)
nicoloboschi pushed a commit to datastax/pulsar that referenced this issue Jun 6, 2022
(cherry picked from commit 4398733)
(cherry picked from commit 70c7794)