This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

ISSUE-14529: PIP-146: ManagedCursorInfo compression #3850

Closed
sijie opened this issue Mar 2, 2022 · 0 comments

sijie commented Mar 2, 2022

Original Issue: apache#14529


Motivation

Cursor data is stored in the ZooKeeper/etcd metadata store. As the amount of cursor data grows, so does the time needed to pull it from the store. Compressing the cursor data reduces both its size and the time it takes to pull.

Goal

Support compressing the ManagedCursorInfo with LZ4, ZLIB, ZSTD, or SNAPPY.

Implementation

  • Cursor compression format
    [MAGIC_NUMBER] + [METADATA_SIZE] + [METADATA_PAYLOAD] + [MANAGED_CURSOR_INFO_PAYLOAD]
  • MAGIC_NUMBER
    0x4779

  • METADATA
    Add a message named ManagedCursorInfoMetadata to MLDataFormats.proto:

message ManagedCursorInfoMetadata {
    required CompressionType compressionType = 1;
    required int32 uncompressedSize = 2;
}
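The layout above can be sketched as a simple framing step. This is a minimal illustration, not the implementation from the proposal: it assumes a 2-byte magic number and a 4-byte big-endian METADATA_SIZE (the proposal does not state field widths), and it treats the metadata and the cursor payload as opaque byte arrays. The class and method names (`CursorFrameSketch`, `frame`) are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: lays out the four parts of the compressed cursor format.
class CursorFrameSketch {
    // Magic number from the proposal; storing it as a 2-byte short is an assumption.
    static final short MAGIC_NUMBER = 0x4779;

    // metadata: serialized ManagedCursorInfoMetadata (opaque bytes in this sketch)
    // payload:  compressed ManagedCursorInfo bytes (opaque bytes in this sketch)
    static byte[] frame(byte[] metadata, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + metadata.length + payload.length);
        buf.putShort(MAGIC_NUMBER);   // [MAGIC_NUMBER]
        buf.putInt(metadata.length);  // [METADATA_SIZE], assumed to be a 4-byte int
        buf.put(metadata);            // [METADATA_PAYLOAD]
        buf.put(payload);             // [MANAGED_CURSOR_INFO_PAYLOAD]
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] framed = frame(new byte[] {1, 2, 3}, new byte[] {9, 9});
        System.out.println("framed length = " + framed.length);
    }
}
```

Writing the metadata size ahead of the metadata lets a reader skip straight to the compressed payload once the metadata has been parsed.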

These compression types are already supported, so we only need to handle compression and decompression of the ManagedCursorInfo data:

  • Get CursorInfo from the metadata store
    We check the header of the cursor data. If it indicates that the data is compressed, we parse the bytes according to the compressed format; otherwise we parse them the original way.

  • Add/Update CursorInfo to the metadata store
    If a compression type is specified, the cursor data is compressed by default before it is written.
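The two paths above could look roughly like the following sketch. Several parts are assumptions made only to keep the example self-contained: it uses ZLIB through the JDK's `java.util.zip` classes as a stand-in for whichever codec is configured, it stores a bare 4-byte uncompressedSize in place of a serialized ManagedCursorInfoMetadata message, and it assumes a 2-byte magic number and a 4-byte metadata size. The names `CursorCodecSketch`, `encode`, and `decode` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CursorCodecSketch {
    static final short MAGIC_NUMBER = 0x4779;

    // Add/Update path: compress only when a compression type is configured.
    static byte[] encode(byte[] cursorInfo, boolean compress) {
        if (!compress) {
            return cursorInfo; // original, uncompressed format
        }
        Deflater deflater = new Deflater();
        deflater.setInput(cursorInfo);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + 4 + compressed.length);
        buf.putShort(MAGIC_NUMBER);
        buf.putInt(4);                 // METADATA_SIZE of the stand-in metadata
        buf.putInt(cursorInfo.length); // stand-in metadata: uncompressedSize only
        buf.put(compressed);
        return buf.array();
    }

    // Get path: check the header, fall back to the original format if no magic number.
    static byte[] decode(byte[] stored) {
        if (stored.length < 6) {
            return stored; // too short to carry a header
        }
        ByteBuffer buf = ByteBuffer.wrap(stored);
        if (buf.getShort() != MAGIC_NUMBER) {
            return stored; // not compressed: parse the original way
        }
        byte[] metadata = new byte[buf.getInt()];
        buf.get(metadata);
        int uncompressedSize = ByteBuffer.wrap(metadata).getInt();
        Inflater inflater = new Inflater();
        inflater.setInput(stored, buf.position(), buf.remaining());
        byte[] result = new byte[uncompressedSize];
        int off = 0;
        try {
            while (off < result.length && !inflater.finished()) {
                int n = inflater.inflate(result, off, result.length - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed cursor data", e);
        } finally {
            inflater.end();
        }
        if (off != uncompressedSize) {
            throw new IllegalStateException("uncompressed size mismatch");
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] raw = "cursor-state cursor-state".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decode(encode(raw, true));
        System.out.println("round trip ok: " + Arrays.equals(raw, roundTrip));
    }
}
```

Because `decode` returns the input unchanged when the magic number is absent, cursor data written before compression was enabled stays readable.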

@sijie sijie added the PIP label Mar 2, 2022
@sijie sijie closed this as completed Mar 16, 2022