
[Feature Request] Stronger directory abstraction for segment storage #13075

Open · sachinpkale opened this issue Apr 4, 2024 · 6 comments
Labels: enhancement (Enhancement or improvement to existing feature or request), Storage:Remote


sachinpkale commented Apr 4, 2024

Is your feature request related to a problem? Please describe

The RemoteDirectory abstraction was created to upload and download segments to and from the configured remote store, and to stay consistent with the existing Directory interface that Lucene uses for segment operations (create, read, delete). But the RemoteDirectory abstraction is incomplete: it has no knowledge of the FSDirectory implementation that handles segment operations for the local store. IndexShard contains two instances of Store: store, which wraps an FSDirectory instance for the local disk, and remoteStore, which wraps a RemoteDirectory instance for the remote store. Apart from the common IndexShard parent, store and remoteStore know nothing about each other, and the sync between these two stores is scattered across various code flows. This makes the entire abstraction leaky and error-prone. As we plan to add more features on top of remote-backed storage (1. Searchable Remote Index, 2. Writeable Warm), we need to come up with a stronger directory abstraction to avoid unmaintainable code.

Describe the solution you'd like

Ideally, Store should encapsulate all the segment-storage-related constructs and the corresponding sync between those constructs. For other operations like indexing or search, storage should be seen as a black box accessed through the provided interface. This also aligns with the broader modularity vision, where the next step is abstracting storage out as a separate module. For this RFC, we limit the discussion to the segment storage abstraction only.

We propose to provide all the segment storage abstractions in the form of a Directory, which we call OpenSearchDirectory.
OpenSearchDirectory will have two components:

  • Cache - again represented in the form of a Directory. Various cache implementations can be provided based on the usage pattern (write-through, write-behind, write-around, etc.). The cache can also be optional.
  • Storage - a Directory implementation (FSDirectory, RemoteSegmentStoreDirectory)

On top of the existing Directory interface, the OpenSearchDirectory implementation will provide further abstractions, such as stats around remote store interaction and whether a segment file is present in the cache, in storage, or in both. The actual sync between cache and storage would be hidden from the core. OpenSearch core should not be aware of whether a given segment is served from the cache or from storage, as long as the right directory implementations for cache and storage are used.

Based on the use case, the roles of cache and storage can change. For some known use cases, we can define the cache and storage components as given below (a code sketch follows the list):

  • OpenSearch cluster without remote store - Cache is not defined and Storage is local disk
  • Remote Backed Storage - Cache is local disk (with no evictions) and Storage is S3.
  • Warm Index - Cache is local disk (with LRU evictions) and Storage is S3.
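To make the composition concrete, here is a minimal sketch of what OpenSearchDirectory could look like, assuming Lucene's FilterDirectory as a base class. The class name matches the proposal, but everything else here (the constructor shape, the tier-routing logic) is an illustrative assumption, not a committed API:

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Illustrative sketch: compose an optional cache Directory with a storage
// Directory and hide the tier routing from the caller.
public class OpenSearchDirectory extends FilterDirectory {

    private final Directory cache;   // optional; null when no cache tier is configured
    private final Directory storage; // source of truth, e.g. FSDirectory or RemoteSegmentStoreDirectory

    public OpenSearchDirectory(Directory cache, Directory storage) {
        super(storage); // delegate everything not overridden here to storage
        this.cache = cache;
        this.storage = storage;
    }

    @Override
    public IndexInput openInput(String name, IOContext context) throws IOException {
        // Core never learns which tier served the file.
        if (cache != null && Arrays.asList(cache.listAll()).contains(name)) {
            return cache.openInput(name, context);
        }
        return storage.openInput(name, context);
    }
}
```

Under this shape, the three use cases above differ only in construction: a null cache for a cluster without remote store, a non-evicting local-disk cache over an S3-backed storage directory for remote-backed storage, and an LRU-evicting cache for warm indices.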

Related component

Storage:Remote

Describe alternatives you've considered

There were a few alternative approaches proposed, mostly to tackle the same problem:

  1. [Searchable Remote Index] Design the CompositeStore combining remote and local store in tandem with the HybridDirectory #7764
  2. [Remote Store] Inject remote store in IndexShard instead of RemoteStoreRefreshListener #3703 (comment)

Additional context

ToDo:

  1. Add more details on the existing segment upload/download flow and how it is tightly coupled with core
  2. Add details about the end-goal state

mch2 commented Apr 4, 2024

Thanks @sachinpkale, I think there is definitely some much-needed refactoring here.

How does a new Directory implementation solve the issue of multiple Stores in IndexShard? Are you planning to move the remote-store-specific functionality currently in IndexShard, e.g. syncSegmentsFromRemoteSegmentStore, to this new directory? Where would these fit in the Directory API?

From what I can tell, the only use of the remoteStore from within IndexShard is to fetch its remoteDirectory instance and do things with it. Could we maybe extend the existing (somewhat already bloated) public API in Store?

How does the OpenSearchDirectory fit with the plans for a CompositeDirectory? The injection of a cache and other directory implementations sounds very similar to the intent there.

sachinpkale (Member, Author) commented:

> How does a new Directory implementation solve the issue of multiple Stores in IndexShard?

With the new directory, we don't need multiple stores in IndexShard: one Store instance contains the OpenSearchDirectory instance.

> Are you planning to move the remote-store-specific functionality currently in IndexShard, e.g. syncSegmentsFromRemoteSegmentStore, to this new directory? Where would these fit in the Directory API?

Yes, syncSegmentsFromRemoteSegmentStore as well as the logic in RemoteStoreRefreshListener will move to the new directory. (Answering the second question last.)

> From what I can tell, the only use of the remoteStore from within IndexShard is to fetch its remoteDirectory instance and do things with it. Could we maybe extend the existing (somewhat already bloated) public API in Store?

We could do this, but the tight coupling of core and storage would remain. For example, in the replica promotion and recovery flows, we have if conditions at multiple places that check whether the index is remote-store enabled. With the new directory, we want to remove these checks as well, so that core's interaction with segment storage does not change with or without remote store. The sketch below shows the kind of branching we want to eliminate.
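An illustrative (not verbatim) example of the scattered checks being described; all names here are simplified stand-ins rather than real OpenSearch signatures:

```java
// Illustrative only: the kind of remote-store branching scattered through
// core today, which the proposed directory would absorb.
final class RecoveryFlowToday {

    private final boolean remoteStoreEnabled;

    RecoveryFlowToday(boolean remoteStoreEnabled) {
        this.remoteStoreEnabled = remoteStoreEnabled;
    }

    void recover() {
        if (remoteStoreEnabled) {
            syncSegmentsFromRemote(); // remote-backed: remote store is the source of truth
        } else {
            recoverFromLocalDisk();   // DocRep: local disk is authoritative
        }
    }

    private void syncSegmentsFromRemote() { /* download segments */ }

    private void recoverFromLocalDisk() { /* read local segments */ }
}
// With OpenSearchDirectory, the branch disappears: core always calls a single
// hook and a non-remote configuration makes it a no-op.
```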

> How does the OpenSearchDirectory fit with the plans for a CompositeDirectory? The injection of a cache and other directory implementations sounds very similar to the intent there.

Yes, it is similar in pattern, in that it contains two directories, but it doesn't handle the sync between cache and storage. Also, it does not provide any extension points for the cache, so only one type of cache can be present. But CompositeDirectory itself can evolve into OpenSearchDirectory by adding the right set of abstractions.

> Where would remote-store-specific functionality fit in the Directory API?

We don't have clear answers to this yet, but here is what I have thought about:

  • We need a way to tell OpenSearchDirectory to sync data from cache to storage and vice versa.
  • The cache-to-storage sync is the same as the segment upload flow.
  • The storage-to-cache sync is the same as the segment download flow.

Cache to Storage Sync options:

  • Upload on each write to the cache (write-through cache) - whenever a file is written to the cache, we also write it to storage. But we create various tmp files before refresh that are renamed later, so this could mean unnecessary writes to the remote store.
  • Upload only after refresh - this is what we have today. To achieve this through a Directory implementation, we either use one of the existing interface methods or introduce a new one (a sketch of the new-method option follows this list):
    • Using the existing interface: IMO, we can reuse Directory's sync method to trigger the segment upload, but sync is called only on flush, not on refresh. So we would need to change core to call Directory.sync() after each refresh. But sync also handles the fsync of the actual segment files, which we don't want to trigger on each refresh, so OpenSearchDirectory needs to be smart enough not to fsync when both cache and storage are configured (this logic still needs more thought).
    • Using a new interface: we introduce a new method on OpenSearchDirectory, called syncCacheToStorage (or something similar), which core calls after each refresh. We don't have to add if-else logic in core for this: irrespective of the use case (DocRep, remote-backed storage, writeable warm), the method is always called, and if no cache is defined it is a no-op.
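A minimal sketch of the new-method option. The name syncCacheToStorage comes from the proposal above; everything else (the missing-file check, the copy loop, the static-helper shape that keeps the example self-contained) is an assumption about how the sync could work:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;

// Sketch of the proposed post-refresh hook. In the real design this would be
// a method on OpenSearchDirectory.
final class SegmentSync {

    private SegmentSync() {}

    /** Called by core after every refresh, for every use case. */
    static void syncCacheToStorage(Directory cache, Directory storage) throws IOException {
        if (cache == null) {
            return; // no cache tier configured (e.g. DocRep): the hook is a no-op
        }
        List<String> alreadyInStorage = Arrays.asList(storage.listAll());
        for (String file : cache.listAll()) {
            if (!alreadyInStorage.contains(file)) {
                // Directory.copyFrom copies a single file between directories.
                storage.copyFrom(cache, file, file, IOContext.DEFAULT);
            }
        }
    }
}
```

A real implementation would also need to skip pre-refresh tmp files and verify checksums; the sketch shows only the control flow and the no-op path.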

Storage to Cache Sync:

  • A new method could be introduced on OpenSearchDirectory, say init(), which would be called whenever we need to pull data from the source of truth. Examples are the replication, failover, and recovery flows (sketched below).
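The storage-to-cache direction would be the mirror image of the helper above; the name init() comes from the bullet, and the rest is again an assumption:

```java
// Mirror of syncCacheToStorage: pull files missing from the cache out of the
// source of truth (storage), e.g. during replication, failover, or recovery.
static void initCacheFromStorage(Directory cache, Directory storage) throws IOException {
    if (cache == null) {
        return; // no cache tier: core reads directly from storage
    }
    List<String> alreadyCached = Arrays.asList(cache.listAll());
    for (String file : storage.listAll()) {
        if (!alreadyCached.contains(file)) {
            cache.copyFrom(storage, file, file, IOContext.DEFAULT);
        }
    }
}
```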

sachinpkale (Member, Author) commented:

With OpenSearchDirectory, we are exploring the feasibility of OpenSearch core interacting with only one interface, OpenSearchDirectory, with all existing and future use cases around segment storage encapsulated within it. Even though this issue talks about remote-store-based use cases, OpenSearchDirectory can also be used where a remote store is not involved.

ankitkala (Member) commented:

Thanks Sachin for the write-up. I definitely agree on keeping a single composite store object with a single directory.

> How does the OpenSearchDirectory fit with the plans for a CompositeDirectory? The injection of a cache and other directory implementations sounds very similar to the intent there.
>
> Yes, it is similar in pattern, in that it contains two directories, but it doesn't handle the sync between cache and storage. Also, it does not provide any extension points for the cache, so only one type of cache can be present. But CompositeDirectory itself can evolve into OpenSearchDirectory by adding the right set of abstractions.

Yes, that was the intent behind the current design of CompositeDirectory. Though it doesn't handle the cache and storage sync today, it can definitely be evolved in this direction.

> Storage to Cache Sync:
>
> • A new method could be introduced on OpenSearchDirectory, say init(), which would be called whenever we need to pull data from the source of truth. Examples are the replication, failover, and recovery flows.

Just to add, I think OpenSearchDirectory should also be able to operate in read-only vs. write mode (or master/slave) for primaries and replicas. The directionality of the sync can depend on how the directory is configured, and we should be able to flip the behavior at runtime.

sachinpkale (Member, Author) commented:

Thanks @ankitkala for reviewing.

> I think OpenSearchDirectory should also be able to operate in read-only vs. write mode (or master/slave) for primaries and replicas. The directionality of the sync can depend on how the directory is configured, and we should be able to flip the behavior at runtime.

I am not sure I understand this completely. The directory should not know whether it is part of a primary or a replica, right?

ankitkala (Member) commented:

> I am not sure I understand this completely. The directory should not know whether it is part of a primary or a replica, right?

Correct. I just meant that the directory should still be able to distinguish whether the remote is writable or not (i.e. a replica's directory should not be able to write). The sketch below shows one way to model that.
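One way the writable/read-only mode could be modeled, with the runtime flipping ankitkala suggests. Everything here is an assumption for illustration; the proposal has not settled on an API:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;

// Hypothetical guard around the storage tier: writes are rejected unless the
// directory has been marked writable (e.g. on promotion to primary).
public class GuardedStorageDirectory extends FilterDirectory {

    private final AtomicBoolean writable;

    public GuardedStorageDirectory(Directory storage, boolean writableInitially) {
        super(storage);
        this.writable = new AtomicBoolean(writableInitially);
    }

    /** Flip at runtime, e.g. when a replica is promoted to primary. */
    public void setWritable(boolean value) {
        writable.set(value);
    }

    @Override
    public IndexOutput createOutput(String name, IOContext context) throws IOException {
        if (!writable.get()) {
            throw new IOException("directory is read-only (this shard is not the writer)");
        }
        return super.createOutput(name, context);
    }
}
```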
