RFC : Point In Time Search #1147
Comments
@rajkthakur Can you make this issue a little more... fleshed out? I'm just seeing the dummy text. |
Hi @rajkthakur, are you actively working on this issue? If so, could you please assign it to yourself and add a comment? |
@anasalkouz I am actively working on this issue. |
Can we mark this as a proposal? |
I have done a small POC to check the feasibility of the proposed APIs. We will not be able to provide a List All Points In Time API, as a point in time is not tied to any coordinator node. |
We will add new information to the Nodes Stats API, which will help keep track of point-in-time-related statistics. |
We will provide settings to:
|
similar to scroll, we will provide all points in time via the api by passing |
I am currently doing a POC to add an API for PIT disk utilization, i.e. the segments retained by PIT IDs. I am trying to expose stats similar to cat segments, but for Points In Time. |
This can get expensive. I see the objective is to optimize resource consumption ✔️ . I think this is a specialized use case for archival or time based analysis use cases and not the normal search use case so this should be configured through an Index Scoped setting. Segment replication will heavily change this design as it will push a lot of the PIT burden on the storage layer. |
@nknize (Among other things) PIT would be a replacement for scrolls and hence be the pagination solution for OpenSearch, which is also an important use case to keep in mind. We would limit the number of PIT contexts that can be opened via a setting and would also provide an API to report PITs' disk consumption. We will look into the how-to and benefits of configuring PIT through an index-scoped setting.
I think segments retained by PITs/scrolls will not be replicated. Can you please elaborate on the caveats you see? |
👍
A storage engine w/ versioned backups (e.g., S3 buckets w/ versioning) can be used to restore files from backups. I think this along w/ Lucene sequence IDs enables this feature without having to retain as many historic segments. A storage engine w/o versioning (e.g., local NFS or SMB) could possibly use the segment retention logic provided by this feature. |
Ack. |
@CEHENKLE Can you please create a feature branch |
@rramachand21 is this still aiming for 2.0? |
I'm reading this feature as primarily a replacement/improvement over the scroll API. It seems like the primary use case is for pagination (which is currently solved by the scroll API but does have limitations). This essentially generalizes it a bit and gives semantics similar to snapshot isolation in a traditional database where you can do multiple queries within a transaction and observe a consistent view of the data. @nknize do you have any major concerns moving forward with this feature? I do have a nitpick about the name, though, particularly the |
The inability to share point-in-time contexts is mentioned several times, but what is the use case for sharing these contexts? The pagination use case makes total sense but I don't think that generally requires sharing the context. The ability to restrict maximum keep-alive duration also makes a lot of sense to put a cap on worst-case resource consumption. However, that limit is likely to be in tension with the usefulness of sharing the contexts if they are short-lived, so I'm curious about the use cases that are motivating the share-ability requirement. |
I think the major use case for share-ability is the ability to execute different types of queries and derive better insights from the same consistent view of the data. Also, some queries might fail or time out for various reasons. Once a PIT is created, retries become simpler, as it allows the user to resume queries on the same view.
+1 on the thought, @andrross does |
@Bukhtawar I definitely like One last naming nitpick, I do prefer spelling it out as opposed to using the acronym "pit", but either way we should be consistent. If we stick with "point_in_time" then that should be used in the search request as well. |
@rajkthakur I'm getting this error |
@loretoparisi AWS OpenSearch will support point in time search in an upcoming release. It is not supported in AWS OpenSearch 1.3. Point-in-time search will be supported in the open-source OpenSearch 2.3.0 release. |
@rajkthakur: I see this issue is labeled for v2.3.0 release, which has code freeze today i.e. Sep 7. I see open backport PRs to 2.x. Can you please prioritize review/merge. |
Hi @rajkthakur, just checking in from the security team to see if there is anything you need finalized from us before the upcoming 2.4 freeze. |
@rajkthakur do you still track this for 2.4 release? code freeze on 11/3 |
@anasalkouz Yes we are tracking this for 2.4 release. Documentation is pending, we will be closing it before 11/3. |
Documentation PR - opensearch-project/documentation-website#1753 |
Opensearch does not currently appear to support `_shard_doc` as part of Point In Time search, so remove references to it from the documentation. Further details: - I don't see any reference to `_shard_doc` in the code on [Opensearch's](https://github.com/opensearch-project/OpenSearch) main branch at time of proposing the change. - ElasticSearch added `_shard_doc` in [7.12](elastic/elasticsearch-net#5337) and it looks like it was not added as part of Opensearch's [Point In Time work](opensearch-project/OpenSearch#1147).
@dhruv16dhr I've recently attempted again to use PIT using AWS OpenSearch / Kibana 2.5, but I'm getting
the currently installed version of AWS OS is
|
Is your feature request related to a problem? Please describe.
Today, in OpenSearch, if you run different queries on the same data set, chances are you will get different results, because the data is constantly changing. However, in real-world scenarios, when analyzing data or trying to provide a consistent experience to your end users, you may want the results of a query not to change while the context remains the same, and you may want to control when changes appear in the result set. You want to be able to query the same data set and paginate through it with consistent results. This is not possible with the options currently available in OpenSearch.
OpenSearch currently supports the following options for pagination, each with certain limitations:
Describe the solution you'd like
Point in Time allows users to run different queries against the same data set, fixed at a moment in time. A Point in Time only takes into account data up until the moment it is created. Hence, none of the resources required to return the data from the initial request are modified or deleted. Segments are retained even if they have already been merged away and are no longer needed for the live data set. In short, Point in Time Search allows users to maintain a state that can be reused by different queries in order to achieve consistent results.
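To make the consistency guarantee concrete, here is a minimal toy sketch in Python. It is not OpenSearch code: `ToyIndex` and `ToyPointInTime` are hypothetical names, and the sketch copies documents into a frozen tuple, whereas the proposal describes retaining the underlying segments instead.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """In-memory stand-in for a live index (illustrative only)."""
    docs: dict = field(default_factory=dict)

    def index(self, doc_id, value):
        self.docs[doc_id] = value

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)

    def open_point_in_time(self):
        # Freeze the current contents. A real engine would retain the
        # segments backing this view rather than copy the documents.
        return ToyPointInTime(tuple(sorted(self.docs.items())))


@dataclass(frozen=True)
class ToyPointInTime:
    frozen_docs: tuple

    def search(self, predicate):
        # Every search sees exactly the documents captured at creation.
        return [doc_id for doc_id, value in self.frozen_docs if predicate(value)]


live = ToyIndex()
live.index("a", 10)
live.index("b", 20)
pit = live.open_point_in_time()

# The live index keeps changing after the point in time is opened.
live.index("c", 30)
live.delete("a")

first = pit.search(lambda v: v >= 10)
second = pit.search(lambda v: v >= 10)
```

Both `first` and `second` are computed from the frozen view, so later writes and deletes on `live` do not affect them.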
Key goals:
APIs
Create Point In Time API
Unlike a Scroll, by creating a dedicated Point in Time, we decouple the context from a single query and make it re-usable across arbitrary search requests by passing the Point in Time Id. We can achieve this by using the Create Point in Time API.
Delete Point In Time API
Points in time are automatically closed once their keep_alive has elapsed. However, keeping a point in time open has a cost; hence, points in time should be closed as soon as they are no longer used in search requests. We can also delete a Point in Time and free its resources before its keep-alive expires by using the Delete Point in Time API.
DELETE /_point_in_time/<id>
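The lifecycle described above (automatic expiry after keep_alive, or an earlier explicit delete) can be sketched with a small bookkeeping model. This is an illustration under assumptions, not the server implementation: `ToyPitRegistry` and `delete_request` are hypothetical names, and only the `DELETE /_point_in_time/<id>` path shape is taken from the proposal.

```python
import time
from urllib.parse import quote


class ToyPitRegistry:
    """Hypothetical bookkeeping for open point-in-time contexts."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # pit_id -> absolute expiry time in seconds

    def create(self, pit_id, keep_alive_seconds):
        self._expiry[pit_id] = self._clock() + keep_alive_seconds

    def delete(self, pit_id):
        # Explicit delete frees the context before its keep-alive elapses.
        return self._expiry.pop(pit_id, None) is not None

    def reap_expired(self):
        # Close every context whose keep-alive has elapsed.
        now = self._clock()
        expired = [p for p, t in self._expiry.items() if t <= now]
        for p in expired:
            del self._expiry[p]
        return expired

    def active(self):
        return sorted(self._expiry)


def delete_request(pit_id):
    # Path shape from the proposal: DELETE /_point_in_time/<id>
    return ("DELETE", "/_point_in_time/" + quote(pit_id, safe=""))


# A fake clock keeps the example deterministic.
now = [0.0]
registry = ToyPitRegistry(clock=lambda: now[0])
registry.create("pit-1", keep_alive_seconds=60)
registry.create("pit-2", keep_alive_seconds=600)

deleted = registry.delete("pit-2")  # freed early by the caller
now[0] = 61.0
expired = registry.reap_expired()   # pit-1 timed out on its own
```

Injecting the clock is a design choice for the sketch only; it lets the expiry path be exercised without sleeping.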
List All Active Point In Time API
A useful admin API to have is to list all active Points in Time and their keep-alives.
Using a Point in Time in a search request:
In the search request, we pass the Point in Time ID and, optionally, a keep-alive to extend the Point in Time. (Passing a PIT ID in a search request is supported in OpenSearch.)
A search request with a PIT ID will not accept indices, preference, routing, or indices options, as these are already fixed when the Point in Time is created.
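As a rough client-side sketch of these rules, the helper below builds a search body that carries a PIT ID and rejects the parameters listed above. The function name `build_pit_search`, the body keys `pit`, `id` and `keep_alive`, and the parameter names in `DISALLOWED_WITH_PIT` are assumptions made for illustration; the thread was still discussing whether to spell the key `pit` or `point_in_time`.

```python
# Parameter names are assumed spellings of the options named in the proposal.
DISALLOWED_WITH_PIT = {"index", "preference", "routing", "indices_options"}


def build_pit_search(pit_id, query, keep_alive=None, **url_params):
    """Return a search body bound to an existing point in time."""
    rejected = DISALLOWED_WITH_PIT.intersection(url_params)
    if rejected:
        # These were fixed when the point in time was created.
        raise ValueError(f"not allowed with a point in time: {sorted(rejected)}")
    pit = {"id": pit_id}
    if keep_alive is not None:
        # Optional: extends the lifetime of the point in time.
        pit["keep_alive"] = keep_alive
    return {"query": query, "pit": pit}


body = build_pit_search("abc", {"match_all": {}}, keep_alive="1m")

try:
    build_pit_search("abc", {"match_all": {}}, routing="user-1")
    routing_rejected = False
except ValueError:
    routing_rejected = True
```

Failing fast on the client is optional; according to the text above, the server would not accept these parameters either way.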