Background knowledge
Current config
Currently, compression of ManagedLedger and ManagedCursor metadata can be configured with
managedLedgerInfoCompressionType
and
managedCursorInfoCompressionType
About metadata
How does the current compression work?
Byte format when compression is applied:
[MAGIC_NUMBER](2) + [METADATA_SIZE](4) + [METADATA_PAYLOAD] + [MANAGED_LEDGER_INFO_PAYLOAD]
pulsar/managed-ledger/src/main/proto/MLDataFormats.proto
Lines 128 to 144 in 3b3bc5e
When the metadata reader decodes the raw metadata bytes, it checks the first two bytes (a short). If they contain the magic number
MAGIC_MANAGED_INFO_METADATA
the reader decodes the bytes using the compression header.
Related code:
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/MetaStoreImpl.java
Lines 335 to 450 in 3b3bc5e
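The framing and the magic-number check described above can be sketched as follows. This is a minimal illustration, not the code in MetaStoreImpl: the class name, the method names, and the value assigned to the magic number are assumptions made for the example, and the header and payload are treated as opaque byte arrays.

```java
import java.nio.ByteBuffer;

/** Illustrative sketch of the byte layout; not Pulsar's actual implementation. */
final class MetadataFrameSketch {
    // Placeholder value, assumed for illustration only.
    static final short MAGIC_MANAGED_INFO_METADATA = 0x4778;

    /** Builds [MAGIC_NUMBER](2) + [METADATA_SIZE](4) + [METADATA_PAYLOAD] + [MANAGED_LEDGER_INFO_PAYLOAD]. */
    static byte[] frame(byte[] compressionHeader, byte[] compressedInfo) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 4 + compressionHeader.length + compressedInfo.length);
        buf.putShort(MAGIC_MANAGED_INFO_METADATA);
        buf.putInt(compressionHeader.length);
        buf.put(compressionHeader);
        buf.put(compressedInfo);
        return buf.array();
    }

    /** Returns true if the first two bytes hold the magic number. */
    static boolean hasCompressionHeader(byte[] raw) {
        return raw.length >= 2 && ByteBuffer.wrap(raw).getShort() == MAGIC_MANAGED_INFO_METADATA;
    }

    /** Reads the compression header (METADATA_PAYLOAD) from framed bytes. */
    static byte[] readHeader(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        buf.getShort();                 // skip the magic number
        byte[] header = new byte[buf.getInt()];
        buf.get(header);
        return header;
    }
}
```

A reader that finds no magic number would fall back to parsing the bytes as uncompressed metadata, which is why data written without compression stays readable.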
ManagedLedger metadata
pulsar/managed-ledger/src/main/proto/MLDataFormats.proto
Lines 54 to 72 in fafadee
    // If present, it signals the managed ledger has been
    // terminated and this was the position of the last
    // committed entry.
    // No more entries can be written.
    optional NestedPositionInfo terminatedPosition = 2;
    repeated KeyValue properties = 3;
}
The metadata of
ManagedLedger
mainly consists of a list of LedgerInfo entries that describe the storage state of a Pulsar topic.
This metadata is small when the topic has light traffic. When the user sets a topic retention policy that keeps more messages, the managed ledger metadata can become very large.
If the metadata of most topics is small, compression won't help much.
ManagedCursor metadata
pulsar/managed-ledger/src/main/proto/MLDataFormats.proto
Lines 113 to 135 in fafadee
The metadata of
ManagedCursor
is the subscription state of a Pulsar subscription.
When the user always uses cumulative acknowledgment (the
acknowledgeCumulative
API), this state stays small, but when the user uses individual acks or enables batchIndexAck, the cursor state can become very large.
It depends on the user's logic.
As with the ManagedLedger metadata, compression won't help much when the state is small.
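The difference in state size between the two acknowledgment styles can be illustrated with a small sketch. It only counts how many disjoint ranges of acknowledged entries would need to be tracked. The class and method names are hypothetical, and it is not a model of how ManagedCursor actually serializes its state.

```java
import java.util.TreeSet;

/** Illustrative sketch: why individually acked entries with gaps enlarge cursor state. */
final class AckStateSketch {
    /** Counts the disjoint runs of consecutive entry ids in the acknowledged set. */
    static int countRanges(TreeSet<Long> acked) {
        int ranges = 0;
        Long prev = null;
        for (long id : acked) {
            // A new range starts whenever the id does not directly follow the previous one.
            if (prev == null || id != prev + 1) {
                ranges++;
            }
            prev = id;
        }
        return ranges;
    }
}
```

Acknowledging entries 0 to 999 contiguously yields one range, while acknowledging only every second entry in that span yields 500 ranges, each of which has to be recorded in the persisted state.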
Motivation
Sometimes the metadata is too small to be worth compressing, so a size-based config is needed.
Goals
In Scope
Add configs that set a size threshold for metadata compression:
managedLedgerInfoCompressionThresholdInBytes
managedCursorInfoCompressionThresholdInBytes
We propose a default value of 16 KB.
When a threshold is set, the metadata is compressed only if the size of the data to be persisted is above the threshold.
High Level Design
Add the configs to
ServerConfiguration
and pass their values to
ManagedLedgerConfig
When metadata is persisted, the configured threshold determines whether the compression logic is applied.
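The threshold check at persist time could look roughly like the sketch below. It is written under assumptions: the class and method names are hypothetical, java.util.zip.Deflater stands in for whichever compression type is configured, and the magic-number framing of the compressed bytes is left out for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Illustrative sketch of size-based compression; not the actual Pulsar code path. */
final class ThresholdCompressionSketch {
    /** Returns the bytes to persist: the input itself if it is at or below the threshold, else a compressed copy. */
    static byte[] toPersist(byte[] serialized, long thresholdInBytes) {
        if (serialized.length <= thresholdInBytes) {
            return serialized;          // too small to be worth compressing
        }
        return deflate(serialized);
    }

    /** Compresses the input with DEFLATE (stand-in for the configured compression type). */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

With a threshold of 16 * 1024, a 100-byte payload is returned unchanged, while a 32 KB payload goes through the compressor.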
Configuration
New configurations:
managedLedgerInfoCompressionThresholdInBytes = 16 * 1024;
managedCursorInfoCompressionThresholdInBytes = 16 * 1024;
Backward & Forward Compatibility
Revert
The compression logic is transparent: if the config is removed, nothing else changes.
Links
PRs that introduce metadata compression:
Mailing List discussion thread: https://lists.apache.org/thread/6930c74m31rflrql9y3dpjmm0sbccqkb
Mailing List voting thread: https://lists.apache.org/thread/79t4zp9hl78vd1brbb85x1x6k1j71v44