PIP-384: ManagedLedger interface decoupling

Background knowledge

Apache Pulsar uses a component called ManagedLedger to handle persistent storage of messages.

The ManagedLedger interfaces and their implementation were initially tightly coupled, which made it difficult to introduce alternative implementations or to improve the architecture. This PIP documents changes that have been made in the master branch for Pulsar 4.0. Pull Requests #22891 and #23311 have already been merged. The work proceeded after lazy consensus was reached on the dev mailing list in the discussion thread "Preparing for Pulsar 4.0: cleaning up the Managed Ledger interfaces". At the time of writing, one PR, #23313, remains to be merged. The goal of this PIP is to document the changes in this area for later reference.

Key concepts:

Motivation

The current ManagedLedger implementation faces several challenges:

  1. Tight coupling: The interfaces are tightly coupled with their implementation, making it difficult to introduce alternative implementations.

  2. Limited flexibility: The current architecture doesn't allow for easy integration of different storage systems or optimizations.

  3. Dependency on BookKeeper: The ManagedLedger implementation is closely tied to BookKeeper, limiting options for alternative storage solutions.

  4. Complexity: The tight coupling increases the overall complexity of the system, making it harder to maintain, test and evolve.

  5. Limited extensibility: Introducing new features or optimizations often requires changes to both interfaces and implementations.

Goals

In Scope

  • Decouple ManagedLedger interfaces from their current implementation.
  • Introduce a ReadOnlyManagedLedger interface.
  • Decouple OpAddEntry and LedgerHandle from ManagedLedgerInterceptor.
  • Enable support for multiple ManagedLedgerFactory instances.
  • Decouple BookKeeper client from ManagedLedgerStorage.
  • Improve overall architecture by reducing coupling between core Pulsar components and specific ManagedLedger implementations.
  • Prepare the groundwork for alternative ManagedLedger implementations in Pulsar 4.0.

Out of Scope

  • Implementing alternative ManagedLedger storage backends.
  • Changes to external APIs or behaviors.
  • Comprehensive JavaDocs for the interfaces.

High Level Design

  1. Decouple interfaces from implementations:

    • Move required methods from implementation classes to their respective interfaces.
    • Update code to use interfaces instead of concrete implementations.
  2. Introduce ReadOnlyManagedLedger interface:

    • Extract this interface to decouple from ReadOnlyManagedLedgerImpl.
    • Adjust code to use the new interface where appropriate.
  3. Decouple ManagedLedgerInterceptor:

    • Introduce AddEntryOperation and LastEntryHandle interfaces.
    • Adjust ManagedLedgerInterceptor to use these new interfaces.
  4. Enable multiple ManagedLedgerFactory instances:

    • Modify ManagedLedgerStorage interface to support multiple "storage classes".
    • Implement BookkeeperManagedLedgerStorageClass for BookKeeper support.
    • Update PulsarService and related classes to support multiple ManagedLedgerFactory instances.
    • Add a "storage class" setting to the persistence policies, which can be set at the namespace level or the topic level.
  5. Decouple BookKeeper client:

    • Move BookKeeper client creation and management to BookkeeperManagedLedgerStorageClass.
    • Update ManagedLedgerStorage interface to remove direct BookKeeper dependencies.
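
Steps 1 and 2 can be illustrated with a minimal sketch. The types below are local stand-ins that only reuse the names from this document; the methods `getName`, `getNumberOfEntries` and `loadEntry`, and the `LedgerSummary` caller, are hypothetical and are not taken from Pulsar's source. The point is only that the caller compiles against the interface, so any implementation can be passed in.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the extracted interface; both methods are hypothetical.
interface ReadOnlyManagedLedger {
    String getName();

    long getNumberOfEntries();
}

// Local stand-in for the concrete class that callers used to reference directly.
final class ReadOnlyManagedLedgerImpl implements ReadOnlyManagedLedger {
    private final String name;
    private final List<byte[]> entries = new ArrayList<>();

    ReadOnlyManagedLedgerImpl(String name) {
        this.name = name;
    }

    // Implementation-only detail (hypothetical) that stays off the interface.
    void loadEntry(byte[] payload) {
        entries.add(payload);
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public long getNumberOfEntries() {
        return entries.size();
    }
}

// Hypothetical caller: it depends on the interface only, never on the Impl class.
final class LedgerSummary {
    static String describe(ReadOnlyManagedLedger ledger) {
        return ledger.getName() + ": " + ledger.getNumberOfEntries() + " entries";
    }
}
```

Because `LedgerSummary.describe` accepts the interface type, an alternative implementation can be substituted without touching the caller.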

Detailed Design

Interface Decoupling

  1. Update ManagedLedger interface:

    • Add methods from ManagedLedgerImpl to the interface.
    • Remove dependencies on implementation-specific classes.
  2. Update ManagedLedgerFactory interface:

    • Add necessary methods from ManagedLedgerFactoryImpl.
    • Remove dependencies on implementation-specific classes.
  3. Update ManagedCursor interface:

    • Add required methods from ManagedCursorImpl.
    • Remove dependencies on implementation-specific classes.
  4. Introduce ReadOnlyManagedLedger interface:

    • Extract methods specific to read-only operations.
    • Update relevant code to use this interface where appropriate.
  5. Decouple ManagedLedgerInterceptor:

    • Introduce AddEntryOperation interface for beforeAddEntry method.
    • Introduce LastEntryHandle interface for onManagedLedgerLastLedgerInitialize method.
    • Update ManagedLedgerInterceptor to use these new interfaces.
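
The interceptor decoupling in step 5 can be sketched as follows. This is a simplified illustration, not Pulsar's actual signatures: the interface and method names `AddEntryOperation`, `LastEntryHandle`, `beforeAddEntry` and `onManagedLedgerLastLedgerInitialize` come from this document, while the parameters, the `byte[]` accessors, the synchronous `readLastEntry`, and the `IndexPrefixInterceptor` and `InMemoryAddEntryOperation` classes are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Narrow view of an add operation; the accessors are hypothetical.
interface AddEntryOperation {
    byte[] getData();

    void setData(byte[] data);
}

// Narrow view of "read the last entry"; hypothetical and synchronous for brevity.
interface LastEntryHandle {
    Optional<byte[]> readLastEntry();
}

// Simplified stand-in: the interceptor sees only the two narrow interfaces above.
interface ManagedLedgerInterceptor {
    void beforeAddEntry(AddEntryOperation op, int numberOfMessages);

    void onManagedLedgerLastLedgerInitialize(String name, LastEntryHandle handle);
}

// Hypothetical interceptor that prefixes each entry with the index of its last message.
final class IndexPrefixInterceptor implements ManagedLedgerInterceptor {
    private long nextIndex = 0;

    long getNextIndex() {
        return nextIndex;
    }

    @Override
    public void beforeAddEntry(AddEntryOperation op, int numberOfMessages) {
        long lastIndex = nextIndex + numberOfMessages - 1;
        byte[] prefix = (lastIndex + ":").getBytes(StandardCharsets.UTF_8);
        byte[] data = op.getData();
        byte[] tagged = new byte[prefix.length + data.length];
        System.arraycopy(prefix, 0, tagged, 0, prefix.length);
        System.arraycopy(data, 0, tagged, prefix.length, data.length);
        op.setData(tagged);
        nextIndex = lastIndex + 1;
    }

    @Override
    public void onManagedLedgerLastLedgerInitialize(String name, LastEntryHandle handle) {
        // Resume numbering from the index stored in the last persisted entry, if any.
        Optional<byte[]> last = handle.readLastEntry();
        if (last.isPresent()) {
            String text = new String(last.get(), StandardCharsets.UTF_8);
            nextIndex = Long.parseLong(text.substring(0, text.indexOf(':'))) + 1;
        }
    }
}

// Test double standing in for whatever operation class an implementation provides.
final class InMemoryAddEntryOperation implements AddEntryOperation {
    private byte[] data;

    InMemoryAddEntryOperation(byte[] data) {
        this.data = data;
    }

    @Override
    public byte[] getData() {
        return data;
    }

    @Override
    public void setData(byte[] data) {
        this.data = data;
    }
}
```

Since the interceptor never names an implementation class, it can be exercised with an in-memory operation and a lambda for the last-entry handle.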

Multiple ManagedLedgerFactory Instances

  1. Update ManagedLedgerStorage interface:

    • Add methods to support multiple storage classes.
    • Introduce getManagedLedgerStorageClass method to retrieve specific storage implementations.
  2. Implement BookkeeperManagedLedgerStorageClass:

    • Create a new class implementing ManagedLedgerStorageClass for BookKeeper.
    • Move BookKeeper client creation and management to this class.
  3. Update PulsarService and related classes:

    • Modify to support creation and management of multiple ManagedLedgerFactory instances.
    • Update configuration to allow specifying different storage classes for different namespaces or topics.
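
A rough sketch of how a storage abstraction with several storage classes might be shaped is given below. Only `ManagedLedgerStorage`, `ManagedLedgerStorageClass`, `ManagedLedgerFactory` and `getManagedLedgerStorageClass` are named in this document; the other methods (`getStorageClasses`, `getDefaultStorageClass`, `getName`, `getManagedLedgerFactory`, `describe`), the helper classes, and the rule that the first registered class acts as the default are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

// Stand-in factory; describe() is a hypothetical method used only for this example.
interface ManagedLedgerFactory {
    String describe();
}

// One named storage implementation, each with its own factory instance.
interface ManagedLedgerStorageClass {
    String getName();

    ManagedLedgerFactory getManagedLedgerFactory();
}

// Stand-in storage abstraction holding several storage classes.
interface ManagedLedgerStorage {
    Collection<ManagedLedgerStorageClass> getStorageClasses();

    // Assumption: the first registered storage class acts as the default.
    default Optional<ManagedLedgerStorageClass> getDefaultStorageClass() {
        return getStorageClasses().stream().findFirst();
    }

    // A null name falls back to the default; an unknown name yields an empty result.
    default Optional<ManagedLedgerStorageClass> getManagedLedgerStorageClass(String name) {
        if (name == null) {
            return getDefaultStorageClass();
        }
        return getStorageClasses().stream()
                .filter(storageClass -> name.equals(storageClass.getName()))
                .findFirst();
    }
}

// Hypothetical storage class whose factory just reports which class created it.
final class NamedStorageClass implements ManagedLedgerStorageClass {
    private final String name;
    private final ManagedLedgerFactory factory;

    NamedStorageClass(String name) {
        this.name = name;
        this.factory = () -> name + "-factory";
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public ManagedLedgerFactory getManagedLedgerFactory() {
        return factory;
    }
}

// Hypothetical storage that exposes a fixed, ordered list of storage classes.
final class FixedManagedLedgerStorage implements ManagedLedgerStorage {
    private final List<ManagedLedgerStorageClass> storageClasses;

    FixedManagedLedgerStorage(ManagedLedgerStorageClass... storageClasses) {
        this.storageClasses = Arrays.asList(storageClasses);
    }

    @Override
    public Collection<ManagedLedgerStorageClass> getStorageClasses() {
        return storageClasses;
    }
}
```

Returning `Optional` from the lookup leaves the decision about an unknown storage class name (fail, or fall back to the default) to the caller.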

BookKeeper Client Decoupling

  1. Update ManagedLedgerStorage interface:

    • Remove direct dependencies on BookKeeper client.
    • Introduce methods to interact with storage without exposing BookKeeper specifics.
  2. Implement BookkeeperManagedLedgerStorageClass:

    • Encapsulate BookKeeper client creation and management.
    • Implement storage operations using BookKeeper client.
  3. Update relevant code:

    • Replace direct BookKeeper client usage with calls to ManagedLedgerStorage methods.
    • Update configuration handling to support BookKeeper-specific settings through the new storage class.
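
The encapsulation described above might look roughly like this. `StubBookKeeperClient` is a placeholder and not the BookKeeper client API; `getBackendName`, `getBookKeeperClient`, the `close` lifecycle and the `TopicOpener` caller are likewise hypothetical. The sketch only shows that the BookKeeper-specific class owns the client, while generic code sees the storage class interface.

```java
// Placeholder for a BookKeeper client; NOT the real BookKeeper API.
final class StubBookKeeperClient implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Stand-in factory; getBackendName() is hypothetical.
interface ManagedLedgerFactory {
    String getBackendName();
}

// Generic view used by broker code; it exposes nothing BookKeeper-specific.
interface ManagedLedgerStorageClass extends AutoCloseable {
    String getName();

    ManagedLedgerFactory getManagedLedgerFactory();

    @Override
    void close();
}

// BookKeeper-specific storage class: creates, owns and closes the client.
final class BookkeeperManagedLedgerStorageClass implements ManagedLedgerStorageClass {
    private final StubBookKeeperClient client = new StubBookKeeperClient();
    private final ManagedLedgerFactory factory = () -> "bookkeeper";

    @Override
    public String getName() {
        return "bookkeeper";
    }

    @Override
    public ManagedLedgerFactory getManagedLedgerFactory() {
        return factory;
    }

    // Hypothetical accessor, reachable only by code that knows the concrete type.
    StubBookKeeperClient getBookKeeperClient() {
        return client;
    }

    @Override
    public void close() {
        client.close();
    }
}

// Hypothetical generic caller: works with any storage class, BookKeeper or not.
final class TopicOpener {
    static String open(ManagedLedgerStorageClass storageClass, String topic) {
        return storageClass.getManagedLedgerFactory().getBackendName() + "/" + topic;
    }
}
```

With this split, code that still needs the client has to ask for the BookKeeper-specific type explicitly, which makes the remaining BookKeeper dependencies easy to find.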

Public-facing Changes

Configuration

  • Add a new configuration option to specify the default ManagedLedger "storage class" at the broker level.

API Changes

  • No major changes to external APIs are planned.
  • The only API change is the addition of managedLedgerStorageClassName to PersistencePolicies, which a custom ManagedLedgerStorage can use to control which ManagedLedgerFactory instance is used for a particular namespace or topic.
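
One way a broker could resolve the storage class name from these policies is sketched below. The `PersistencePolicies` class here is a local stand-in carrying only the `managedLedgerStorageClassName` field named above; `StorageClassNameResolver` and the precedence order (topic level, then namespace level, then the broker default) are assumptions for illustration, not documented behavior.

```java
// Local stand-in with only the field named in this document; the real class has more.
final class PersistencePolicies {
    private final String managedLedgerStorageClassName; // null means "not set"

    PersistencePolicies(String managedLedgerStorageClassName) {
        this.managedLedgerStorageClassName = managedLedgerStorageClassName;
    }

    String getManagedLedgerStorageClassName() {
        return managedLedgerStorageClassName;
    }
}

// Hypothetical helper: picks the most specific configured storage class name.
final class StorageClassNameResolver {
    static String resolve(PersistencePolicies topicPolicies,
                          PersistencePolicies namespacePolicies,
                          String brokerDefault) {
        // Assumed precedence: topic level, then namespace level, then broker default.
        String name = nameOf(topicPolicies);
        if (name == null) {
            name = nameOf(namespacePolicies);
        }
        return name != null ? name : brokerDefault;
    }

    private static String nameOf(PersistencePolicies policies) {
        return policies == null ? null : policies.getManagedLedgerStorageClassName();
    }
}
```

The resolved name would then be handed to the storage lookup to obtain the matching ManagedLedgerFactory.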

Backward & Forward Compatibility

Apart from the added managedLedgerStorageClassName field in PersistencePolicies, the changes are internal and do not affect external APIs or behaviors. Backward compatibility is fully preserved in Apache Pulsar.

Security Considerations

The decoupling of interfaces and implementation doesn't introduce new security concerns.

Links