-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Searchable Remote Index #2900
Comments
So the tl;dr is that storage would be backed by remote storage with a cache. This makes sense. The part that I wanted to comment on is: Should we have separate API for creating remote indices or should we extend restore API for it or provide support in both the places? I feel like the end game is a tiered storage API that lets me create/update options on an index that say things like "store this on disk" or "store this in S3" or "store this on disk and replicate to S3 as a backup", .. The other way to look at this could be "configure snapshots" (exists today) and "mounting snapshots as read-only indexes" (aka "create an index backed by snapshots"). In either case, I would work backwards from that end state in terms of actual APIs and try to not introduce anything that we will knowingly want to deprecate. It would be good to read actual API proposals and be able to compare. |
Nice proposal. Few questions/suggestions:
Given the remote storage proposal, would it make sense to support search on that data instead of expecting a snapshot format. May be support both? Also related suggestion: Since the remote storage will also support writes in remote storage, should we avoid the assumption on read-only index from the beginning on read side? Might be OK to start with read only index, but would be great if design is extensible to cover the read-write index also.
If there is continuous sync to the remote storage, we may not need an explicit api to create remote index. However, if we place a restriction that it has to be 'read only' before we read, we might need a way for customer to mark it read only and optionally an api to delete it from hot cluster? I did not understand the second part of the above question "should we extend restore API for it or provide support in both the places" |
The goal is definitely to support both. We don't plan to design and implement this with the assumption that the remote indexes will always be read-only. The ability to search the snapshot format seems like a good incremental feature to deliver on the way to supporting reading from the writable remote storage to be delivered by #1968.
Can you elaborate a bit more on what you mean by this? |
The end goal is to make the search on remote backed index agnostic on underlying data format. OS engine communicates with the data via a Directory layer which can have different implementation based on the underlying storage format to support different formats. Here the plan was to focus on snapshot based format to start with and build other format supports once they are available. But yes the plan is to have extensible design.
Related to read-only index, this is my thought. Currently OpenSearch doesn't support mix of write and read shard for an index. #2578 talks about supporting that in long run where an index can have write only shards and read only shards, then appropriate requests can be routed to those shards. Once we have that capability then the mechanism developed here can be used by read-only shards for serving the search requests for remote indices. Until then we can use this mechanism to power read capability for an index backed by snapshot data. Also once the writable remote storage is available #1968, we can add capability to refresh the newly available data in remote store on reader side.
I may have confused by saying "creating remote indices", what I meant here was a mechanism to resurrect an index backed by snapshot for performing searches. Indices in snapshot will not be part of cluster state and not available for search so we need some mechanism to bring it in cluster state. The mechanism for that can be via a separate API like
Clarified above |
Great to see we are planing to support writable storage also in future. Thanks for the clarifications on "creating remote indices". For getting the indices back to cluster state, there are few considerations while deciding on using restore vs attach.
|
Azure Blob Storage, please. |
Very excited to see this proposal @sohami! I had a few questions to make sure I am understanding it.
Just trying to make sure I understand. The search request wouldn't be performed directly against the remote store? Is this mirroring the remote store to a local read only index? Is the actual index in the remote store also read-only?
On this +1 to what @dblock mentioned. But as a follow up, what are some reasons we would reuse the restore API versus a new one? |
Yes for this use case we don't plan to download all the shard data from remote storage while resurrecting the index to cluster state. However, that can also be extended to provide the support for full download in future if need be for any use case (like for faster search performance) which can be controlled by some index settings. The way I was thinking about restore is from user perspective that it prepares an index for serving the requests. For current hot local storage based index it involves restoring metadata and actual data to local storage. But for the remote index we can have the semantic to restore metadata only and leave data to remote store and prep shards with information to access that data.
This part is not covered in this RFC purposefully as I was expecting that too be handled as part of #1968 which is introducing remote index concept. Here we are focussing on the search capability on indices with data in remote store. Currently for snapshot based remote store backed indices, it is expected that snapshot will already exist and management of that snapshot will be done by the user. For the index, user will just perform attach and then delete to manipulate the existence of index from cluster state perspective. Delete from remote index perspective will be deleting the cluster state + any local cached data for that index. Probably we can document the behavior of delete for local store index and remote index Or we introduce a new API just for remote based index to remove it from cluster state. The full management of remote store index like delete and all will come into picture once we have full multi tier solution built out.
Yes. This is the first step to provide the search functionality on indices backed by remote store. Once we have the support for #1968 completed then we can incrementally add the support for that use case as well. With remote writable storage as well, there may be a use case where user want to now mount that index for search only then the attach mechanism can be used for it.
Not currently, current focus is to add the base functionality and then build various optimization incrementally.
I was thinking that auto attaching model will have better user experience when done via asynchronous search since all the latency of attaching the indices can be consumed behind the asynchronous call. So if we build the capability in core to attach such indices than integration with plugins like async search can be easily done. +1 on the point about the remote index metadata stored not only in repository layer but in some external db or other storage layer and potentially in different format as well. Restore is currently tied to repository layer and to work with snapshot format, so it may make more sense to have a separate attach API for it which can be extended to support these other use cases. |
There won't be complete download of the index data to local store. However, there will be on-demand download of the data during search request and kept in local FileCache. For this RFC we are targeting remote indices backed by snapshot so inherently that will immutable but we do plan to extend it later to remote indices which is mutable as well.
My initial thought was similar to restore, the new API (lets call it attach for now) will also have to perform checks, validations and create index in cluster state. The difference w.r.t regular hot snapshot restore will be for remote indices actual data won't be downloaded. So to avoid maintaining another API probably we can extend the restore API itself. However, @muralikpbhat raise a good point in context of metadata for remote indices available in different external store and not only in repository layer. So it will be worth having separate API which can encapsulate all these different store use cases whereas restore is specific to repository and snapshot layer. Seems like that will be a better experience than overloading a restore API. |
@sohami @muralikpbhat Can we move the API discussion into #2922? I wrote up a couple more concrete examples and would love any feedback you have. @sachinpkale @Bukhtawar Have you thought about the API for creating indexes that will be replicated to remote storage? I'd appreciate your feedback as well. As @dblock suggested we really should work backwards from the end state of where we want to get to eventually in terms of the various remote storage-enabled index APIs. |
Thanks everyone for the feedback. We will keep this issue open for one more week (until 5/13/2022) and then close this discussion. We'll continue tracking the various design and implementation tasks in linked issues. |
@andrross We will be using RemoteDirectory interface to store indexed data to the remote store. Please note that: the Directory interface is not fully opened yet. We have implemented only those methods that are required to store/restore the segments. We will gradually add implementation of more methods. |
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : #6798 Searchable Remote Index : #2900 Implementations Concurrent segment search for aggregations: #7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : #7203 Query cancellation support : #986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : #6798 Searchable Remote Index : #2900 Implementations Concurrent segment search for aggregations: #7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : #7203 Query cancellation support : #986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> (cherry picked from commit 7769682) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : #6798 Searchable Remote Index : #2900 Implementations Concurrent segment search for aggregations: #7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : #7203 Query cancellation support : #986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. (cherry picked from commit 7769682) Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
I have nominated and maintainers have agreed to invite Sorabh Hamirwasia(@sohami) to be a co-maintainer. Sorabh has kindly accepted. Sorabh has led the design and implementation of multiple features like query cancellation, concurrent segment search for aggregations and snapshot inter-operability w/remote storage in OpenSearch. Some significant issues and PRs authored by Sorabh for this effort are as follows: Feature proposals Concurrent segment search for aggregations : opensearch-project#6798 Searchable Remote Index : opensearch-project#2900 Implementations Concurrent segment search for aggregations: opensearch-project#7514 Lucene changes to leaf slices for concurrent search: apache/lucene#12374 Moving concurrent search to core : opensearch-project#7203 Query cancellation support : opensearch-project#986 In total, Sorabh has authored 14 PRs going back to Aug 2021. He also frequently reviews contributions from others and has reviewed nearly 20 PRs in the same time frame. I believe Sorabh will be a valuable addition as a maintainer of OpenSearch and will continue to contribute to the success of the project going forward. Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Today, the data indexed in OpenSearch is stored locally on the storage attached to the instance which limits it to the available instance storage. To store more data, users have to scale their hot tier which is costly. To provide cost savings, users can opt to store data in remote store using the snapshot mechanism in the configured repository. To perform analysis on the data which is stored in the remote store user will need to restore entire index to the hot tier which will again require them to scale the hot tier based on index sizes that needs to be restored and can be costly. This RFC proposes a mechanism to directly create a read-only index backed by the data in remote store in snapshot format and allow performing search requests on it without downloading all the data from remote store. This mechanism will also serve as core building blocks of search path for other remote store project currently in-flight [1][2].
Proposed Solution
We propose to provide capability to search a remote index backed by snapshot in a configured repository of different cloud providers (AWS S3, GCS, etc) supported by OpenSearch. This will be built using the learnings from Ultrawarm technology available in AWS managed OpenSearch service. The index data will not be all completely downloaded to the local store instead will be fetched on demand based on the search request. To provide better performance of repeated search queries, there will be local disk based LRU cache mechanism which will be used to store already downloaded data which will reduce the overhead of round trip to remote store and download latency. Users will need to configure nodes with a pre-defined cache size which can be used for remote indices shard. Note: To provide flexibility a node can be used for placing both remote index shard as well as regular index shard.
We also plan to provide support for configuring a special node attribute which can be used to only allocate shards of these remote indices. This will help to provide resource isolation between hot tier workload and search workloads on remote indices as they will be executed on specific nodes based on shard allocation. It will also help to keep isolation between local storage used for hot indices and cached data for remote indices. To that front, we will need changes in Shard allocation and recovery mechanisms for remote indices shards. Some of the things to consider is that these shards being backed by remote store will not need to perform peer recovery from primary shards as it can be done directly from remote store. These shards data durability will be tied to remote store and not node loss on which shards are assigned.
Apart from functional aspects, there are few performance related items which will be useful to improve search latency on these indices a) Concurrent segment search. The support for this was done as part of [3] with exceptions for aggregations. If this is enabled for hot indices then it should work out of the box for remote indices too, b) Prefetching support for data from remote store that will be required for serving the search request. This optimization will be focused on optimizing aggregation based queries. During execution first the docIds are found and scored (if needed) and then docValues for matched documents are fetched to perform the aggregations at shard level. Fetching these docValues sequentially will be expensive for a cold query. We can built a mechanism to collect the corresponding docValues infos for the matched document ids which will be needed during aggregation phase. Then prefetch those docValues in parallel rather than sequentially (during aggregation phase) from remote store to boost up the performance of search query.
Feedback Requested
Next Steps
Based on feedback from the community we will start on the high level design to figure out the user experience, end to end flow to support remote indices backed by snapshot.
Things Not Covered
This proposal doesn’t intend to cover below items for now:
References
The text was updated successfully, but these errors were encountered: