[Feature Request] Stronger directory abstraction for segment storage #13075
Comments
Thanks @sachinpkale, I think there is definitely some much-needed refactoring here. How does a new Directory implementation solve the issue of multiple Stores in IndexShard? Are you planning to move the remote-store-specific functionality currently in IndexShard to this new directory? For example, from what I can tell, the only use of the remoteStore from within IndexShard is to fetch its remoteDirectory instance and do things with it. We could maybe extend the existing (somewhat already bloated) public API in store? How does the OpenSearchDirectory fit with the plans for a CompositeDirectory? The injection of a cache and other directory implementations sounds very similar to the intent there. |
With new directory, we don't need multiple stores in IndexShard. One store instance that contains
Yes,
We can do this, but the tight coupling of core and storage remains. For example, in replica promotion or recovery flows, we have if conditions added in multiple places to check whether the index is remote store enabled. With the new directory, we want to remove these checks as well, so that core's interaction with segment storage is the same with or without remote store.
Yes, it follows a similar pattern in that it contains 2 directories, but it doesn't handle sync between cache and storage. Also, it does not provide any extensions to cache, so only one type of cache can be present. But
We don't have clear answers to it but this is what I have thought about:
Cache to Storage Sync options:
Storage to Cache Sync:
|
With OpenSearchDirectory, we are exploring the feasibility of OpenSearch core interacting with only one interface: |
Thanks Sachin for the writeup. I definitely agree on keeping a single composite store object with a single directory.
Yes, that was the intent behind the current design for CompositeDirectory. Though it doesn't handle cache and storage sync, it can definitely be evolved in this direction.
Just to add, I think |
Thanks @ankitkala for reviewing.
I am not sure I understand it completely. The directory should not know whether it is part of a primary or a replica, right? |
Correct. I just meant that directory should still be able to distinguish whether the remote is writable or not (i.e. replica's directory shouldn't be able to write). |
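One way to picture the point made in the last two comments is a small Java sketch in which the directory never learns whether it belongs to a primary or a replica; whoever constructs it simply hands it a writable or a read-only view of the remote. Every name here (RemoteSegmentStore, InMemoryRemoteStore, ReadOnlyRemoteStore) is hypothetical and invented for illustration; none of it is OpenSearch or Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified view of a remote segment store (not a real OpenSearch type).
interface RemoteSegmentStore {
    void upload(String name, byte[] data);
    byte[] download(String name); // returns null when the file is absent
}

// In-memory stand-in for a real remote store, used only to make the sketch runnable.
final class InMemoryRemoteStore implements RemoteSegmentStore {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void upload(String name, byte[] data) {
        files.put(name, data.clone());
    }

    @Override
    public byte[] download(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }
}

// Wrapper that rejects writes. The wrapper has no notion of primary or replica;
// the caller decides which view of the remote a directory receives.
final class ReadOnlyRemoteStore implements RemoteSegmentStore {
    private final RemoteSegmentStore delegate;

    ReadOnlyRemoteStore(RemoteSegmentStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public void upload(String name, byte[] data) {
        throw new UnsupportedOperationException("remote store is read-only for this directory");
    }

    @Override
    public byte[] download(String name) {
        return delegate.download(name);
    }
}

public class WritableRemoteSketch {
    public static void main(String[] args) {
        InMemoryRemoteStore remote = new InMemoryRemoteStore();
        remote.upload("_0.si", new byte[] {7});          // writable view, e.g. handed to a primary

        RemoteSegmentStore readOnly = new ReadOnlyRemoteStore(remote); // e.g. handed to a replica
        System.out.println(readOnly.download("_0.si").length);
        try {
            readOnly.upload("_1.si", new byte[] {8});
        } catch (UnsupportedOperationException e) {
            System.out.println("write rejected: " + e.getMessage());
        }
    }
}
```

With this shape, promoting a replica would amount to swapping the read-only view for the writable one, without the directory itself branching on shard role. Whether the real design would use a wrapper, a flag, or something else is left open in the discussion.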
Is your feature request related to a problem? Please describe
RemoteDirectory abstraction was created to upload and download segments to and from the configured remote store. This abstraction was created to make it consistent with existing Directory interface that Lucene uses for segment operations (create, read, delete). But the RemoteDirectory abstraction is incomplete as it has no knowledge of FsDirectory implementation that handles segment operations for local store.
IndexShard contains two instances of Store: store, which contains FsDirectory instance for local disk, and remoteStore, which contains RemoteDirectory instance for remote store. Except the common IndexShard parent, store and remoteStore do not know anything about each other. Sync between these two stores is scattered across various code flows. This makes the entire abstraction leaky and error-prone. As we plan to add more features on top of remote backed storage (1. Searchable Remote Index 2. Writeable Warm), we need to come up with stronger directory abstraction to avoid non-maintainable code.
Describe the solution you'd like
Ideally, Store should encapsulate all the segment storage related constructs and the corresponding syncs between these constructs. For other operations like indexing or search, storage should be seen as a black box that is accessed through the provided interface. This also aligns with the broader modularity vision, with the next step being to abstract out storage as a separate module. For this RFC, we limit the discussion to segment storage abstraction only.
We propose to provide all the segment storage abstractions in the form of Directory. We call it OpenSearchDirectory.
OpenSearchDirectory will have two components:
On top of the existing Directory interface, the OpenSearchDirectory implementation will provide further abstractions such as stats around remote store interaction and whether a segment file is present in cache, in storage, or in both. The actual sync between cache and storage would be hidden from the core. The OpenSearch core should not be aware of whether a given segment is being served from cache or from storage, as long as the right directory implementations for cache and storage are used.
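To make the cache/storage split more concrete, here is a minimal Java sketch of the idea, under heavy assumptions. SegmentDirectory is a hypothetical, stripped-down stand-in for Lucene's Directory (the real interface works with IndexInput and IndexOutput, not byte arrays), and SketchOpenSearchDirectory, FileLocation, locationOf and storageReads are names invented for illustration. The write-through and populate-on-miss policies shown are only one possible choice; the RFC does not fix a sync policy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for Lucene's Directory; not the real API.
interface SegmentDirectory {
    void write(String name, byte[] data);
    byte[] read(String name);          // returns null when the file is absent
    boolean contains(String name);
    Set<String> listAll();
}

// In-memory implementation used for both roles so the sketch is runnable.
final class InMemoryDirectory implements SegmentDirectory {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void write(String name, byte[] data) { files.put(name, data.clone()); }

    @Override
    public byte[] read(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }

    @Override
    public boolean contains(String name) { return files.containsKey(name); }

    @Override
    public Set<String> listAll() { return new TreeSet<>(files.keySet()); }
}

enum FileLocation { CACHE, STORAGE, BOTH, NONE }

// Assumed shape of the proposed directory: one object, two pluggable components.
final class SketchOpenSearchDirectory implements SegmentDirectory {
    private final SegmentDirectory cache;
    private final SegmentDirectory storage;
    private long storageReads = 0; // example of a stat the directory could expose

    SketchOpenSearchDirectory(SegmentDirectory cache, SegmentDirectory storage) {
        this.cache = cache;
        this.storage = storage;
    }

    @Override
    public void write(String name, byte[] data) {
        // Write-through: an assumed policy, chosen only to keep the sketch short.
        cache.write(name, data);
        storage.write(name, data);
    }

    @Override
    public byte[] read(String name) {
        byte[] data = cache.read(name);
        if (data != null) {
            return data;
        }
        data = storage.read(name);
        if (data != null) {
            storageReads++;
            cache.write(name, data); // populate the cache on a miss
        }
        return data;
    }

    @Override
    public boolean contains(String name) {
        return cache.contains(name) || storage.contains(name);
    }

    @Override
    public Set<String> listAll() {
        Set<String> all = new TreeSet<>(cache.listAll());
        all.addAll(storage.listAll());
        return all;
    }

    FileLocation locationOf(String name) {
        boolean inCache = cache.contains(name);
        boolean inStorage = storage.contains(name);
        if (inCache && inStorage) return FileLocation.BOTH;
        if (inCache) return FileLocation.CACHE;
        if (inStorage) return FileLocation.STORAGE;
        return FileLocation.NONE;
    }

    long storageReads() { return storageReads; }
}

public class OpenSearchDirectorySketch {
    public static void main(String[] args) {
        SegmentDirectory cache = new InMemoryDirectory();
        SegmentDirectory storage = new InMemoryDirectory();
        storage.write("_0.cfs", new byte[] {1, 2, 3}); // file exists only in storage

        SketchOpenSearchDirectory dir = new SketchOpenSearchDirectory(cache, storage);
        System.out.println(dir.locationOf("_0.cfs"));
        dir.read("_0.cfs");                            // caller does not care where it came from
        System.out.println(dir.locationOf("_0.cfs"));
        System.out.println(dir.storageReads());
    }
}
```

The caller only ever sees read and write; where a file lives and how many reads fell through to storage are extra queries on the same object, which is the kind of separation the paragraph above describes.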
Based on the use case, the role of cache and storage can be changed. For some known use cases, we can define cache and storage components as given below:
Related component
Storage:Remote
Describe alternatives you've considered
A few alternative approaches were proposed, mostly to tackle the same problem:
Additional context
ToDo: