Apache Pulsar uses a component called ManagedLedger to handle persistent storage of messages.
The ManagedLedger interfaces and implementation were initially tightly coupled, making it difficult to introduce alternative implementations or improve the architecture. This PIP documents changes made in the master branch for Pulsar 4.0. Pull Requests #22891 and #23311 have already been merged; that work proceeded under lazy consensus on the dev mailing list, based on the discussion thread "Preparing for Pulsar 4.0: cleaning up the Managed Ledger interfaces". At the time of writing, one PR, #23313, remains to be merged. The goal of this PIP is to document the changes in this area for later reference.
Key concepts:
- ManagedLedger: A component that handles the persistent storage of messages in Pulsar.
- BookKeeper: The default storage system used by ManagedLedger.
- ManagedLedgerStorage interface: A factory for configuring and creating the ManagedLedgerFactory instance. ManagedLedgerStorage.java source code
- ManagedLedgerFactory interface: Creates and manages ManagedLedger instances. ManagedLedgerFactory.java source code
- ManagedLedger interface: Handles the persistent storage of messages in Pulsar. ManagedLedger.java source code
- ManagedCursor interface: Handles the persistent storage of Pulsar subscriptions and related message acknowledgements. ManagedCursor.java source code
The current ManagedLedger implementation faces several challenges:
- Tight coupling: The interfaces are tightly coupled with their implementation, making it difficult to introduce alternative implementations.
- Limited flexibility: The current architecture doesn't allow for easy integration of different storage systems or optimizations.
- Dependency on BookKeeper: The ManagedLedger implementation is closely tied to BookKeeper, limiting options for alternative storage solutions.
- Complexity: The tight coupling increases the overall complexity of the system, making it harder to maintain, test and evolve.
- Limited extensibility: Introducing new features or optimizations often requires changes to both interfaces and implementations.
In scope:
- Decouple ManagedLedger interfaces from their current implementation.
- Introduce a ReadOnlyManagedLedger interface.
- Decouple OpAddEntry and LedgerHandle from ManagedLedgerInterceptor.
- Enable support for multiple ManagedLedgerFactory instances.
- Decouple BookKeeper client from ManagedLedgerStorage.
- Improve overall architecture by reducing coupling between core Pulsar components and specific ManagedLedger implementations.
- Prepare the groundwork for alternative ManagedLedger implementations in Pulsar 4.0.
Out of scope:
- Implementing alternative ManagedLedger storage backends.
- Changes to external APIs or behaviors.
- Comprehensive JavaDocs for the interfaces.
- Decouple interfaces from implementations:
  - Move required methods from implementation classes to their respective interfaces.
  - Update code to use interfaces instead of concrete implementations.
- Introduce ReadOnlyManagedLedger interface:
  - Extract this interface to decouple from ReadOnlyManagedLedgerImpl.
  - Adjust code to use the new interface where appropriate.
- Decouple ManagedLedgerInterceptor:
  - Introduce AddEntryOperation and LastEntryHandle interfaces.
  - Adjust ManagedLedgerInterceptor to use these new interfaces.
- Enable multiple ManagedLedgerFactory instances:
  - Modify ManagedLedgerStorage interface to support multiple "storage classes".
  - Implement BookkeeperManagedLedgerStorageClass for BookKeeper support.
  - Update PulsarService and related classes to support multiple ManagedLedgerFactory instances.
  - Add "storage class" to persistence policy part of the namespace level or topic level policies.
- Decouple BookKeeper client:
  - Move BookKeeper client creation and management to BookkeeperManagedLedgerStorageClass.
  - Update ManagedLedgerStorage interface to remove direct BookKeeper dependencies.
- Update ManagedLedger interface:
  - Add methods from ManagedLedgerImpl to the interface.
  - Remove dependencies on implementation-specific classes.
- Update ManagedLedgerFactory interface:
  - Add necessary methods from ManagedLedgerFactoryImpl.
  - Remove dependencies on implementation-specific classes.
- Update ManagedCursor interface:
  - Add required methods from ManagedCursorImpl.
  - Remove dependencies on implementation-specific classes.
- Introduce ReadOnlyManagedLedger interface:
  - Extract methods specific to read-only operations.
  - Update relevant code to use this interface where appropriate.
- Decouple ManagedLedgerInterceptor:
  - Introduce AddEntryOperation interface for beforeAddEntry method.
  - Introduce LastEntryHandle interface for onManagedLedgerLastLedgerInitialize method.
  - Update ManagedLedgerInterceptor to use these new interfaces.
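
The interceptor decoupling described in the last item can be illustrated with a minimal sketch. The names AddEntryOperation, LastEntryHandle, beforeAddEntry and onManagedLedgerLastLedgerInitialize come from the list above. Everything else is an assumption made for illustration: the method signatures, the byte[] payload type, and the PrefixingInterceptor and InMemoryAddOperation classes are hypothetical and do not reproduce the actual Pulsar interfaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Simplified stand-in: the part of a pending add operation an interceptor may touch.
// The real interface may expose different methods (assumption).
interface AddEntryOperation {
    byte[] getData();
    void setData(byte[] data);
}

// Simplified stand-in: read access to the last entry without exposing a BookKeeper handle.
interface LastEntryHandle {
    CompletableFuture<Optional<byte[]>> readLastEntryAsync();
}

// Simplified interceptor contract that depends only on the two abstractions above.
interface ManagedLedgerInterceptor {
    void beforeAddEntry(AddEntryOperation op, int numberOfMessages);
    CompletableFuture<Void> onManagedLedgerLastLedgerInitialize(String name, LastEntryHandle handle);
}

// Hypothetical interceptor: tags each payload and remembers the last entry it was shown.
class PrefixingInterceptor implements ManagedLedgerInterceptor {
    volatile String lastSeen = "";

    @Override
    public void beforeAddEntry(AddEntryOperation op, int numberOfMessages) {
        String tagged = numberOfMessages + ":" + new String(op.getData(), StandardCharsets.UTF_8);
        op.setData(tagged.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public CompletableFuture<Void> onManagedLedgerLastLedgerInitialize(String name, LastEntryHandle handle) {
        // An empty ledger yields Optional.empty(), which is stored as an empty string.
        return handle.readLastEntryAsync().thenAccept(entry ->
                lastSeen = entry.map(b -> new String(b, StandardCharsets.UTF_8)).orElse(""));
    }
}

// Hypothetical in-memory add operation; any storage backend could supply its own.
class InMemoryAddOperation implements AddEntryOperation {
    private byte[] data;

    InMemoryAddOperation(byte[] data) {
        this.data = data;
    }

    @Override
    public byte[] getData() {
        return data;
    }

    @Override
    public void setData(byte[] data) {
        this.data = data;
    }
}
```

Because the interceptor in this sketch only sees the two small interfaces, a non-BookKeeper implementation could drive it without providing OpAddEntry or LedgerHandle objects.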
- Update ManagedLedgerStorage interface:
  - Add methods to support multiple storage classes.
  - Introduce getManagedLedgerStorageClass method to retrieve specific storage implementations.
- Implement BookkeeperManagedLedgerStorageClass:
  - Create a new class implementing ManagedLedgerStorageClass for BookKeeper.
  - Move BookKeeper client creation and management to this class.
- Update PulsarService and related classes:
  - Modify to support creation and management of multiple ManagedLedgerFactory instances.
  - Update configuration to allow specifying different storage classes for different namespaces or topics.
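
As a rough illustration of the storage class idea above, the following sketch shows a storage object that holds several named storage classes, each owning its own factory. The names ManagedLedgerStorage, ManagedLedgerStorageClass, ManagedLedgerFactory and getManagedLedgerStorageClass are taken from the text. The remaining method names (getName, getManagedLedgerFactory, getDefaultStorageClass, describe), the Optional return type, and the SimpleStorageClass and MultiClassStorage classes are assumptions for this example and may differ from the real interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Placeholder for the real ManagedLedgerFactory interface; this sketch only needs an identity.
interface ManagedLedgerFactory {
    String describe();
}

// A named storage class owning one ManagedLedgerFactory (accessor names are assumptions).
interface ManagedLedgerStorageClass {
    String getName();
    ManagedLedgerFactory getManagedLedgerFactory();
}

// Simplified storage abstraction that exposes several storage classes by name.
interface ManagedLedgerStorage {
    ManagedLedgerStorageClass getDefaultStorageClass();
    Optional<ManagedLedgerStorageClass> getManagedLedgerStorageClass(String name);
}

// Hypothetical storage class implementation backed by a fixed factory.
class SimpleStorageClass implements ManagedLedgerStorageClass {
    private final String name;
    private final ManagedLedgerFactory factory;

    SimpleStorageClass(String name, ManagedLedgerFactory factory) {
        this.name = name;
        this.factory = factory;
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public ManagedLedgerFactory getManagedLedgerFactory() {
        return factory;
    }
}

// Hypothetical storage that registers a default class plus any number of extra classes.
class MultiClassStorage implements ManagedLedgerStorage {
    private final Map<String, ManagedLedgerStorageClass> classesByName = new LinkedHashMap<>();
    private final ManagedLedgerStorageClass defaultClass;

    MultiClassStorage(ManagedLedgerStorageClass defaultClass, ManagedLedgerStorageClass... others) {
        this.defaultClass = defaultClass;
        classesByName.put(defaultClass.getName(), defaultClass);
        for (ManagedLedgerStorageClass other : others) {
            classesByName.put(other.getName(), other);
        }
    }

    @Override
    public ManagedLedgerStorageClass getDefaultStorageClass() {
        return defaultClass;
    }

    @Override
    public Optional<ManagedLedgerStorageClass> getManagedLedgerStorageClass(String name) {
        // Unknown names yield an empty Optional so the caller decides how to fall back.
        return Optional.ofNullable(classesByName.get(name));
    }
}
```

In this sketch a caller such as the broker would look up a storage class by name and use its factory, falling back to the default class when the lookup is empty.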
- Update ManagedLedgerStorage interface:
  - Remove direct dependencies on BookKeeper client.
  - Introduce methods to interact with storage without exposing BookKeeper specifics.
- Implement BookkeeperManagedLedgerStorageClass:
  - Encapsulate BookKeeper client creation and management.
  - Implement storage operations using BookKeeper client.
- Update relevant code:
  - Replace direct BookKeeper client usage with calls to ManagedLedgerStorage methods.
  - Update configuration handling to support BookKeeper-specific settings through the new storage class.
- Add new configuration option to specify default ManagedLedger "storage class" at broker level.
- No major changes to external APIs are planned.
- The only API change is to add managedLedgerStorageClassName to PersistencePolicies, which can be used by a custom ManagedLedgerStorage to control the ManagedLedgerFactory instance that is used for a particular namespace or topic.
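
One way a custom ManagedLedgerStorage might use the new field is sketched below. Only the field name managedLedgerStorageClassName and the existence of namespace-level policies, topic-level policies and a broker-level default come from this document. The precedence order (topic, then namespace, then broker default), the PersistencePoliciesSketch stand-in, the StorageClassResolver helper, and the example class names are assumptions for illustration, not documented Pulsar behavior.

```java
// Simplified stand-in for PersistencePolicies; only the new field is modelled here.
final class PersistencePoliciesSketch {
    private final String managedLedgerStorageClassName; // null means "not set"

    PersistencePoliciesSketch(String managedLedgerStorageClassName) {
        this.managedLedgerStorageClassName = managedLedgerStorageClassName;
    }

    String getManagedLedgerStorageClassName() {
        return managedLedgerStorageClassName;
    }
}

// Hypothetical helper that picks the storage class name for a topic.
final class StorageClassResolver {
    private final String brokerDefaultStorageClassName;

    StorageClassResolver(String brokerDefaultStorageClassName) {
        this.brokerDefaultStorageClassName = brokerDefaultStorageClassName;
    }

    // Assumed precedence: topic-level policy, then namespace-level policy, then broker default.
    String resolve(PersistencePoliciesSketch topicPolicies, PersistencePoliciesSketch namespacePolicies) {
        String fromTopic = nameOf(topicPolicies);
        if (fromTopic != null) {
            return fromTopic;
        }
        String fromNamespace = nameOf(namespacePolicies);
        if (fromNamespace != null) {
            return fromNamespace;
        }
        return brokerDefaultStorageClassName;
    }

    // Treats missing policies and blank names the same way: as "not set".
    private static String nameOf(PersistencePoliciesSketch policies) {
        if (policies == null) {
            return null;
        }
        String name = policies.getManagedLedgerStorageClassName();
        return (name == null || name.isEmpty()) ? null : name;
    }
}
```

The resolved name could then be passed to a lookup on the storage to obtain the matching ManagedLedgerFactory.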
The changes are internal and don't affect external APIs or behaviors. Backward compatibility is fully preserved in Apache Pulsar.
The decoupling of interfaces and implementation doesn't introduce new security concerns.
- Initial mailing list discussion thread: Preparing for Pulsar 4.0: cleaning up the Managed Ledger interfaces
- Merged Pull Request #22891: Replace dependencies on PositionImpl with Position interface
- Merged Pull Request #23311: Decouple ManagedLedger interfaces from the current implementation
- Implementation Pull Request #23313: Decouple Bookkeeper client from ManagedLedgerStorage and enable multiple ManagedLedgerFactory instances
- Mailing List PIP discussion thread: https://lists.apache.org/thread/rtnktrj7tp5ppog0235t2mf9sxrdpfr8
- Mailing List PIP voting thread: https://lists.apache.org/thread/4jj5dmk6jtpq05lcd6dxlkqpn7hov5gv