[Searchable Remote Index] Design the CompositeStore combining remote and local store in tandem with the HybridDirectory #7764

Closed
kotwanikunal opened this issue May 25, 2023 · 4 comments
Assignees
Labels: discuss, enhancement, Indexing & Search

Comments

@kotwanikunal
Member

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

  • N/A

Additional context

@kotwanikunal kotwanikunal added enhancement Enhancement or improvement to existing feature or request untriaged Indexing & Search and removed untriaged labels May 25, 2023
@kotwanikunal kotwanikunal self-assigned this Jun 6, 2023
@kotwanikunal kotwanikunal moved this from Todo to In Progress in Concurrent Search Jun 8, 2023
@kotwanikunal
Member Author

kotwanikunal commented Jun 22, 2023

Design - Remote Capable Store

1. Overview

  • The Store provides access to files utilized by an index shard within OpenSearch.

  • Every shard in OpenSearch has a dedicated local store which is utilized to access the Lucene directory - the lowest level of file abstraction used to read and write indices to the storage on the node.

  • The Remote Store feature introduced an additional store for index shards, which can be used to store the shard metadata and segment files remotely and to recover these files as needed.

  • A store currently operates on a single directory, which is either a local directory or a remote directory. The local and remote stores used by a shard therefore operate independently of each other.

2. Problem Statement

  • Future phases (3, 4, 5) of the storage roadmap need enhanced Store capabilities for remote readable and writable indices within OpenSearch.
    • The local and remote store/directory are currently independent and must be redesigned so that both Store implementations can work in tandem.
  • Also, the Store exposes its Directory directly, which prevents it from being implementation agnostic with respect to the Directory. The Store ends up acting as a wrapper around a specific Directory implementation rather than as a provider and consumer of the data source for the shard.
    • This will cause issues when the Store needs to switch seamlessly between remote and local directory usage.

3. Goals

  1. Add capability within the Store to enable readable and writable remote indices i.e. enable support for both local and remote directory within the same Store implementation.
    1. Ensure that the store is capable of performing reads and writes on local and/or remote directory within a single implementation.
  2. Refactor store and shard recovery/open/close operations to be performed from within the Store implementation, reducing direct exposure of the directory to the Store consumers.

4. Glossary

  • Store: A store is an OpenSearch construct which provides access to the Directory and files stored on the file system for a given shard, as well as to the metadata about the shard.
  • Remote Store: A remote store is similar to the store in the nature of the data and metadata it stores and provides. The difference is that its directory points to a non-local, cloud based folder within a Repository.
  • Directory: A directory is a Lucene construct which represents a single level folder structure that stores the actual files for a Lucene index, represented by a shard in OpenSearch.
  • Recovery: The process that a shard undergoes to get initialized from local storage, a peer node, or the repository, making it capable of serving searches as well as indexing further documents.

5. Proposed solution

The solution revolves around refactoring the Store class into the structure depicted in the diagram below.

[Diagram: Store-Refactoring]

Phase 1: introduce the store interface and remote implementation

  • Create an interface for the Store as described above.
    • The common functions between the SimpleStore and CompositeStore will have a default implementation within the Store interface.
  • Introduce a new SimpleStore
    • Move existing Store logic within the SimpleStore to perform the local index operations as is.
    • Mark the class as final and prevent any additional changes to the class going forward.
  • Introduce a new CompositeStore
    • Initially, the CompositeStore will simply delegate its operations to SimpleStore, until future phases add further implementation to the store.
  • Update the IndexShard to consume one of the new Stores
    • Based on the index metadata, wire in either the SimpleStore or the CompositeStore within the IndexShard.
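
The Phase 1 structure could be sketched roughly as follows. This is a minimal illustration and not the actual OpenSearch code: the `readFile`/`writeFile`/`fileExists` methods, the in-memory map standing in for a directory, and the `StoreFactory.forIndex` helper with its `remoteCapable` flag are all assumptions made for the example. Only the names `Store`, `SimpleStore` and `CompositeStore` come from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed Store interface (method names are hypothetical).
interface Store {
    byte[] readFile(String name);

    void writeFile(String name, byte[] data);

    // Logic common to both stores gets a default implementation in the interface.
    default boolean fileExists(String name) {
        return readFile(name) != null;
    }
}

// Holds the existing local-only logic; final so it is not changed or extended further.
final class SimpleStore implements Store {
    // In-memory map used here in place of a real local directory.
    private final Map<String, byte[]> localFiles = new HashMap<>();

    @Override
    public byte[] readFile(String name) {
        return localFiles.get(name);
    }

    @Override
    public void writeFile(String name, byte[] data) {
        localFiles.put(name, data);
    }
}

// Phase 1 stub: every operation is delegated to a SimpleStore for now.
class CompositeStore implements Store {
    private final SimpleStore delegate = new SimpleStore();

    @Override
    public byte[] readFile(String name) {
        return delegate.readFile(name);
    }

    @Override
    public void writeFile(String name, byte[] data) {
        delegate.writeFile(name, data);
    }
}

// Hypothetical wiring step: pick the store based on a flag derived from index metadata.
final class StoreFactory {
    private StoreFactory() {}

    static Store forIndex(boolean remoteCapable) {
        return remoteCapable ? new CompositeStore() : new SimpleStore();
    }
}
```

In this sketch the consumer only ever holds a `Store` reference, so swapping the delegating stub for a real remote-aware implementation later would not change the calling code.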

Phase 2: introduce support for recover, open + T.B.D within the store interface

  • Refactor the recovery business logic to be performed and processed within the Store
    • Currently, local and remote recovery are triggered from within the IndexShard, and are eventually delegated to StoreRecovery, which fetches the directory from the Store to perform raw operations.
    • Going forward, Store should have the capability to directly perform the recover operation based on the current implementation and index metadata.
  • Refactor the shard initialization (open) to be handled by the Store instance instead of passing out the directory
    • Similar to the operation described above
    • Perform engine/shard initialization by passing out the needed parameters from the Store instead of directly providing access to the Directory
  • Further reduce access to the Directory outside the Store
    • T.B.D
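
One way to picture the Phase 2 idea of recovery living inside the store is the sketch below. It is an assumption-laden illustration rather than the proposed API: `RecoveringStore`, `RecoverySource`, the two implementations, and the choice to return a list of file names are all hypothetical, and peer recovery is deliberately left unmodelled. The point it tries to show is that the caller asks the store to recover and never receives the underlying directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical recovery sources, following the glossary: local storage, a peer node, or the repository.
enum RecoverySource { LOCAL_STORAGE, PEER_NODE, REPOSITORY }

// Hypothetical store contract: recovery is an operation of the store itself.
interface RecoveringStore {
    // Returns the names of the files available locally once recovery has finished.
    List<String> recover(RecoverySource source);
}

// Local-only variant: it can only recover from what is already on local storage.
final class LocalRecoveringStore implements RecoveringStore {
    private final List<String> localFiles; // stand-in for the local directory, never handed out

    LocalRecoveringStore(List<String> localFiles) {
        this.localFiles = new ArrayList<>(localFiles);
    }

    @Override
    public List<String> recover(RecoverySource source) {
        if (source != RecoverySource.LOCAL_STORAGE) {
            throw new UnsupportedOperationException("local-only store cannot recover from " + source);
        }
        return Collections.unmodifiableList(new ArrayList<>(localFiles));
    }
}

// Remote-capable variant: repository recovery copies missing files from the remote side.
class RemoteCapableRecoveringStore implements RecoveringStore {
    private final List<String> localFiles = new ArrayList<>();
    private final List<String> remoteFiles; // stand-in for the remote directory

    RemoteCapableRecoveringStore(List<String> remoteFiles) {
        this.remoteFiles = new ArrayList<>(remoteFiles);
    }

    @Override
    public List<String> recover(RecoverySource source) {
        if (source == RecoverySource.PEER_NODE) {
            throw new UnsupportedOperationException("peer recovery is not modelled in this sketch");
        }
        if (source == RecoverySource.REPOSITORY) {
            for (String file : remoteFiles) {
                if (!localFiles.contains(file)) {
                    localFiles.add(file); // "download" the file into the local view
                }
            }
        }
        return Collections.unmodifiableList(new ArrayList<>(localFiles));
    }
}
```

A shard-side caller would then only need something like `store.recover(RecoverySource.REPOSITORY)`, with the decision about which directory to touch kept inside the store.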

Pros:

  • The local only index store is unaffected by the changes for the future phases.
  • Unwanted directory access is reduced, preventing any unintended consequences to the local or remote directory.
  • The pluggable nature of Store makes it easier to add additional implementations, if needed.

Cons:

  • Minimizing Directory usage might require considerably more refactoring before Store usage becomes safer.

6. Other Approaches

6.1 Add a remote directory to the existing store

  • A new variable remoteDirectory can be added to the existing store implementation, with internal enhancements to the exposed functions so that they use the appropriate directory based on the index configuration (remote first/local first).
  • This approach will also consolidate the RemoteStore and Store implementations into a single CompositeStore.
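
For comparison, this alternative might look something like the sketch below. Apart from the `remoteDirectory` variable and the remote first/local first setting mentioned above, everything here is hypothetical: the `DualDirectoryStore` class, the `ReadPreference` enum, and the string-valued maps standing in for directories are only there to make the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical encoding of the "remote first/local first" index configuration.
enum ReadPreference { LOCAL_FIRST, REMOTE_FIRST }

// One store class holding both directories, as in the alternative approach.
final class DualDirectoryStore {
    private final Map<String, String> directory = new HashMap<>();       // existing local directory
    private final Map<String, String> remoteDirectory = new HashMap<>(); // the newly added variable
    private final ReadPreference preference;

    DualDirectoryStore(ReadPreference preference) {
        this.preference = preference;
    }

    void writeLocal(String name, String content) {
        directory.put(name, content);
    }

    void writeRemote(String name, String content) {
        remoteDirectory.put(name, content);
    }

    // Reads from the preferred directory and falls back to the other one if the file is missing.
    Optional<String> read(String name) {
        boolean remoteFirst = preference == ReadPreference.REMOTE_FIRST;
        Map<String, String> first = remoteFirst ? remoteDirectory : directory;
        Map<String, String> second = remoteFirst ? directory : remoteDirectory;
        String content = first.get(name);
        if (content == null) {
            content = second.get(name);
        }
        return Optional.ofNullable(content);
    }
}
```

Every exposed function in such a class needs the same kind of branching, which hints at why remote-specific changes could leak into the existing local path.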

6.1.1 Pros

  • Simple refactoring - will only add in a new variable to the existing Store

6.1.2 Cons

  • Additional remote store functionality can cause unintended consequences for the existing Store implementation.
  • Goal 2 described above will not be met. Directory usage will remain exposed as it is today.

7. Rollout plan

Refactoring Phase 1

  • Create an interface for the Store
  • Move existing logic from the Store class into SimpleStore
  • Create a stub-only CompositeStore which performs SimpleStore operations (for now)

Refactoring Phase 2

  • Reduce directory usages outside the Store itself
    • Logically pull in the methods which use the directory as a part of the Store interface - like recover/open etc.
    • Refactor the recover method
      • Add the recover method to the Store interface
      • Add implementation for recover within the CompositeStore and SimpleStore
    • Refactor the open method
    • T.B.D

8. Backwards compatibility

  • The Store class is marked as @opensearch.internal. Public-facing compatibility should be unaffected.

9. Appendix

Appendix A: FAQ

@kotwanikunal kotwanikunal added the discuss Issues intended to help drive brainstorming and decision making label Jun 22, 2023
@kotwanikunal kotwanikunal moved this from In Progress to Done in Concurrent Search Jun 22, 2023
@kotwanikunal
Member Author

cc: @Bukhtawar

@Bukhtawar
Collaborator

Thanks @kotwanikunal for drafting this. I was wondering if we could call the extensions more generically like SimpleStore and CompositeStore and call the instance LocalOnlyStore instead.

@kotwanikunal
Member Author

Closing out the issue. Please re-open or drop in any comments/feedback.
