, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (e.g. "an," "but," "this") from the query string.
-`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
+
+### Fuzzy query options
+
+Option | Valid values | Description
+:--- | :--- | :---
+`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
+`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
+`fuzzy_max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indexes. `fuzzy_max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
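+
+For example, the following `match` query (a sketch; the index and field names are hypothetical) uses `fuzziness` and `fuzzy_transpositions` so that the typo `wnid` can still match documents containing `wind`:
+
+```json
+GET my-index/_search
+{
+  "query": {
+    "match": {
+      "title": {
+        "query": "wnid",
+        "fuzziness": "AUTO",
+        "fuzzy_transpositions": true
+      }
+    }
+  }
+}
+```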
+
+### Synonyms in a multiple terms search
+
+You can also use synonyms with the `terms` query type to search for multiple terms. Use the `auto_generate_synonyms_phrase_query` Boolean field, which is set to `true` by default and automatically generates phrase queries for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` when the option is `true` or `ba OR (batting AND average)` when the option is `false`.
+
+To learn more about the multiple terms query type, see [Terms]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/#terms). For more reference information about phrase queries, see the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html).
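+
+For example, assuming an index whose analyzer defines the synonym `"ba, batting average"` (the index and field names below are hypothetical), the following sketch disables the phrase expansion so that "ba" is matched as `ba OR (batting AND average)`:
+
+```json
+GET my-index/_search
+{
+  "query": {
+    "match": {
+      "text_entry": {
+        "query": "ba",
+        "auto_generate_synonyms_phrase_query": false
+      }
+    }
+  }
+}
+```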
+
+### Other advanced options
+
+You can also use the following optional query fields to filter your query results. An example that combines several of these options follows the table.
+
+Option | Valid values | Description
+:--- | :--- | :---
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. The default is 1.0.
-`cutoff_frequency` | Between `0.0` and `1.0` or a positive integer | This value lets you define high and low frequency terms based on number of occurrences in the index. Numbers between 0 and 1 are treated as a percentage. For example, 0.10 is 10%. This value means that if a word occurs within the search field in more than 10% of the documents on the shard, OpenSearch considers the word "high frequency" and deemphasizes it when calculating search score.
-Because this setting is *per shard*, testing its impact on search results can be challenging unless a cluster has many documents.
`enable_position_increments` | Boolean | When true, result queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. The default is true.
`fields` | String array | The list of fields to search (e.g. `"fields": ["title^4", "description"]`). If unspecified, defaults to the `index.query.default_field` setting, which defaults to `["*"]`.
-`flags` | String | A `|`-delimited string of [flags](#simple-query-string) to enable (e.g. `AND|OR|NOT`). The default is `ALL`.
-`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
-`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n").
-If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
+`flags` | String | A `|`-delimited string of [flags](#simple-query-string) to enable (e.g., `AND|OR|NOT`). The default is `ALL`. You can also explicitly set the `default_field`. For example, to search the `title` field by default, set `"default_field": "title"`.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
-`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common terms](#common-terms) queries and `operator` in this table.
+`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See also `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
-`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
-`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common terms](#common-terms) queries.
+`max_expansions` | Positive integer | `max_expansions` specifies the maximum number of terms to which the query can expand. The default is 50.
+`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
`phrase_slop` | `0` (default) or a positive integer | See `slop`.
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
@@ -431,6 +473,9 @@ Option | Valid values | Description
`rewrite` | `constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N` | Determines how OpenSearch rewrites and scores multi-term queries. The default is `constant_score`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`tie_breaker` | `0.0` (default) to `1.0` | Changes the way OpenSearch scores searches. For example, a `type` of `best_fields` typically uses the highest score from any one field. If you specify a `tie_breaker` value between 0.0 and 1.0, the score changes to highest score + `tie_breaker` * score for all other matching fields. If you specify a value of 1.0, OpenSearch adds together the scores for all matching fields (effectively defeating the purpose of `best_fields`).
-`time_zone` | UTC offset | The time zone to use (e.g. `-08:00`) if the query string contains a date range (e.g. `"query": "wind rises release_date[2012-01-01 TO 2014-01-01]"`). The default is `UTC`.
+`time_zone` | UTC offset | Specifies the time zone offset from `UTC` to apply when the query string contains a date range. For example, set `"time_zone": "-08:00"` for a query with a date range such as `"query": "wind rises release_date[2012-01-01 TO 2014-01-01]"`. The default is `UTC`.
`type` | `best_fields, most_fields, cross_fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`.
`zero_terms_query` | `none, all` | If the analyzer removes all terms from a query string, whether to match no documents (default) or all documents. For example, the `stop` analyzer removes all terms from the string "an but this."
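+
+The following `multi_match` sketch (the index and field names are hypothetical) combines the `fields`, `operator`, `minimum_should_match`, and `type` options in one query:
+
+```json
+GET my-index/_search
+{
+  "query": {
+    "multi_match": {
+      "query": "wind rises",
+      "fields": ["title^4", "description"],
+      "operator": "or",
+      "minimum_should_match": 2,
+      "type": "best_fields"
+    }
+  }
+}
+```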
+
+
diff --git a/_opensearch/query-dsl/full-text/query-string.md b/_query-dsl/query-dsl/full-text/query-string.md
similarity index 98%
rename from _opensearch/query-dsl/full-text/query-string.md
rename to _query-dsl/query-dsl/full-text/query-string.md
index 3688a2d239..258caa1416 100644
--- a/_opensearch/query-dsl/full-text/query-string.md
+++ b/_query-dsl/query-dsl/full-text/query-string.md
@@ -4,6 +4,9 @@ title: Query string queries
parent: Full-text queries
grand_parent: Query DSL
nav_order: 25
+permalink: /query-dsl/full-text/query-string/
+redirect_from:
+ - /opensearch/query-dsl/full-text/query-string/
---
# Query string queries
diff --git a/_opensearch/query-dsl/geo-and-xy/geo-bounding-box.md b/_query-dsl/query-dsl/geo-and-xy/geo-bounding-box.md
similarity index 98%
rename from _opensearch/query-dsl/geo-and-xy/geo-bounding-box.md
rename to _query-dsl/query-dsl/geo-and-xy/geo-bounding-box.md
index 7177334827..0dc63f3452 100644
--- a/_opensearch/query-dsl/geo-and-xy/geo-bounding-box.md
+++ b/_query-dsl/query-dsl/geo-and-xy/geo-bounding-box.md
@@ -4,6 +4,9 @@ title: Geo-bounding box queries
parent: Geographic and xy queries
grand_parent: Query DSL
nav_order: 10
+permalink: /query-dsl/geo-and-xy/geo-bounding-box/
+redirect_from:
+ - /opensearch/query-dsl/geo-and-xy/geo-bounding-box/
---
# Geo-bounding box queries
diff --git a/_opensearch/query-dsl/geo-and-xy/index.md b/_query-dsl/query-dsl/geo-and-xy/index.md
similarity index 96%
rename from _opensearch/query-dsl/geo-and-xy/index.md
rename to _query-dsl/query-dsl/geo-and-xy/index.md
index ba9f2b590e..7c2dadb4cb 100644
--- a/_opensearch/query-dsl/geo-and-xy/index.md
+++ b/_query-dsl/query-dsl/geo-and-xy/index.md
@@ -4,6 +4,9 @@ title: Geographic and xy queries
parent: Query DSL
has_children: true
nav_order: 50
+permalink: /query-dsl/geo-and-xy/
+redirect_from:
+ - /opensearch/query-dsl/geo-and-xy/index/
---
# Geographic and xy queries
diff --git a/_query-dsl/query-dsl/geo-and-xy/xy.md b/_query-dsl/query-dsl/geo-and-xy/xy.md
new file mode 100644
index 0000000000..6b29063bf6
--- /dev/null
+++ b/_query-dsl/query-dsl/geo-and-xy/xy.md
@@ -0,0 +1,438 @@
+---
+layout: default
+title: xy queries
+parent: Geographic and xy queries
+grand_parent: Query DSL
+nav_order: 50
+permalink: /query-dsl/geo-and-xy/xy/
+redirect_from:
+ - /opensearch/query-dsl/geo-and-xy/xy/
+---
+
+# xy queries
+
+To search for documents that contain [xy point]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-point) and [xy shape]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-shape) fields, use an xy query.
+
+## Spatial relations
+
+When you provide an xy shape in an xy query, the xy fields in your documents are matched against the provided shape using the following spatial relations.
+
+Relation | Description | Supporting xy Field Type
+:--- | :--- | :---
+`INTERSECTS` | (Default) Matches documents whose xy point or xy shape intersects the shape provided in the query. | `xy_point`, `xy_shape`
+`DISJOINT` | Matches documents whose xy shape does not intersect with the shape provided in the query. | `xy_shape`
+`WITHIN` | Matches documents whose xy shape is completely within the shape provided in the query. | `xy_shape`
+`CONTAINS` | Matches documents whose xy shape completely contains the shape provided in the query. | `xy_shape`
+
+The following examples illustrate searching for documents that contain xy shapes. To learn how to search for documents that contain xy points, see the [Querying xy points](#querying-xy-points) section.
+
+## Defining the shape in an xy query
+
+You can define the shape in an xy query either by providing a new shape definition at query time or by referencing the name of a shape pre-indexed in another index.
+
+### Using a new shape definition
+
+To provide a new shape to an xy query, define it in the `xy_shape` field.
+
+The following example illustrates searching for documents with xy shapes that match an xy shape defined at query time.
+
+First, create an index and map the `geometry` field as an `xy_shape`:
+
+```json
+PUT testindex
+{
+ "mappings": {
+ "properties": {
+ "geometry": {
+ "type": "xy_shape"
+ }
+ }
+ }
+}
+```
+
+Index a document with a point and a document with a polygon:
+
+```json
+PUT testindex/_doc/1
+{
+ "geometry": {
+ "type": "point",
+ "coordinates": [0.5, 3.0]
+ }
+}
+
+PUT testindex/_doc/2
+{
+ "geometry" : {
+ "type" : "polygon",
+ "coordinates" : [
+ [[2.5, 6.0],
+ [0.5, 4.5],
+ [1.5, 2.0],
+ [3.5, 3.5],
+ [2.5, 6.0]]
+ ]
+ }
+}
+```
+
+Define an [`envelope`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-shape#envelope)—a bounding rectangle in the `[[minX, maxY], [maxX, minY]]` format. Search for documents with xy points or shapes that lie within that envelope:
+
+```json
+GET testindex/_search
+{
+ "query": {
+ "xy_shape": {
+ "geometry": {
+ "shape": {
+ "type": "envelope",
+ "coordinates": [ [ 0.0, 6.0], [ 4.0, 2.0] ]
+ },
+ "relation": "WITHIN"
+ }
+ }
+ }
+}
+```
+
+The following image depicts the example. Both the point and the polygon are within the bounding envelope.
+
+
+
+
+The response contains both documents:
+
+```json
+{
+ "took" : 363,
+ "timed_out" : false,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ },
+ "hits" : {
+ "total" : {
+ "value" : 2,
+ "relation" : "eq"
+ },
+ "max_score" : 0.0,
+ "hits" : [
+ {
+ "_index" : "testindex",
+ "_id" : "1",
+ "_score" : 0.0,
+ "_source" : {
+ "geometry" : {
+ "type" : "point",
+ "coordinates" : [
+ 0.5,
+ 3.0
+ ]
+ }
+ }
+ },
+ {
+ "_index" : "testindex",
+ "_id" : "2",
+ "_score" : 0.0,
+ "_source" : {
+ "geometry" : {
+ "type" : "polygon",
+ "coordinates" : [
+ [
+ [
+ 2.5,
+ 6.0
+ ],
+ [
+ 0.5,
+ 4.5
+ ],
+ [
+ 1.5,
+ 2.0
+ ],
+ [
+ 3.5,
+ 3.5
+ ],
+ [
+ 2.5,
+ 6.0
+ ]
+ ]
+ ]
+ }
+ }
+ }
+ ]
+ }
+}
+```
+
+### Using a pre-indexed shape definition
+
+When constructing an xy query, you can also reference the name of a shape pre-indexed in another index. Using this method, you can define an xy shape at index time and refer to it by name, providing the following parameters in the `indexed_shape` object.
+
+Parameter | Description
+:--- | :---
+index | The name of the index that contains the pre-indexed shape.
+id | The document ID of the document that contains the pre-indexed shape.
+path | The path to the field that contains the pre-indexed shape.
+
+The following example illustrates referencing the name of a shape pre-indexed in another index. In this example, the index `pre-indexed-shapes` contains the shape that defines the boundaries, and the index `testindex` contains the shapes whose locations are checked against those boundaries.
+
+First, create an index `pre-indexed-shapes` and map the `geometry` field for this index as an `xy_shape`:
+
+```json
+PUT pre-indexed-shapes
+{
+ "mappings": {
+ "properties": {
+ "geometry": {
+ "type": "xy_shape"
+ }
+ }
+ }
+}
+```
+
+Index an envelope that specifies the boundaries and name it `rectangle`:
+
+```json
+PUT pre-indexed-shapes/_doc/rectangle
+{
+ "geometry": {
+ "type": "envelope",
+ "coordinates" : [ [ 0.0, 6.0], [ 4.0, 2.0] ]
+ }
+}
+```
+
+Index a document with a point and a document with a polygon into the index `testindex`:
+
+```json
+PUT testindex/_doc/1
+{
+ "geometry": {
+ "type": "point",
+ "coordinates": [0.5, 3.0]
+ }
+}
+
+PUT testindex/_doc/2
+{
+ "geometry" : {
+ "type" : "polygon",
+ "coordinates" : [
+ [[2.5, 6.0],
+ [0.5, 4.5],
+ [1.5, 2.0],
+ [3.5, 3.5],
+ [2.5, 6.0]]
+ ]
+ }
+}
+```
+
+Search for documents with shapes that intersect `rectangle` in the index `testindex` using a filter:
+
+```json
+GET testindex/_search
+{
+ "query": {
+ "bool": {
+ "filter": {
+ "xy_shape": {
+ "geometry": {
+ "indexed_shape": {
+ "index": "pre-indexed-shapes",
+ "id": "rectangle",
+ "path": "geometry"
+ }
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+The preceding query uses the default spatial relation `INTERSECTS` and returns both the point and the polygon:
+
+```json
+{
+ "took" : 26,
+ "timed_out" : false,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ },
+ "hits" : {
+ "total" : {
+ "value" : 2,
+ "relation" : "eq"
+ },
+ "max_score" : 0.0,
+ "hits" : [
+ {
+ "_index" : "testindex",
+ "_id" : "1",
+ "_score" : 0.0,
+ "_source" : {
+ "geometry" : {
+ "type" : "point",
+ "coordinates" : [
+ 0.5,
+ 3.0
+ ]
+ }
+ }
+ },
+ {
+ "_index" : "testindex",
+ "_id" : "2",
+ "_score" : 0.0,
+ "_source" : {
+ "geometry" : {
+ "type" : "polygon",
+ "coordinates" : [
+ [
+ [
+ 2.5,
+ 6.0
+ ],
+ [
+ 0.5,
+ 4.5
+ ],
+ [
+ 1.5,
+ 2.0
+ ],
+ [
+ 3.5,
+ 3.5
+ ],
+ [
+ 2.5,
+ 6.0
+ ]
+ ]
+ ]
+ }
+ }
+ }
+ ]
+ }
+}
+```
+
+## Querying xy points
+
+You can also use an xy query to search for documents that contain xy points.
+
+Create a mapping with `point` as `xy_point`:
+
+```json
+PUT testindex1
+{
+ "mappings": {
+ "properties": {
+ "point": {
+ "type": "xy_point"
+ }
+ }
+ }
+}
+```
+
+Index three points:
+
+```json
+PUT testindex1/_doc/1
+{
+ "point": "1.0, 1.0"
+}
+
+PUT testindex1/_doc/2
+{
+ "point": "2.0, 0.0"
+}
+
+PUT testindex1/_doc/3
+{
+ "point": "-2.0, 2.0"
+}
+```
+
+Search for points that lie within the circle with the center at (0, 0) and a radius of 2:
+
+```json
+GET testindex1/_search
+{
+ "query": {
+ "xy_shape": {
+ "point": {
+ "shape": {
+ "type": "circle",
+ "coordinates": [0.0, 0.0],
+ "radius": 2
+ }
+ }
+ }
+ }
+}
+```
+
+xy point only supports the default `INTERSECTS` spatial relation, so you don't need to provide the `relation` parameter.
+{: .note}
+
+The following image depicts the example. Points 1 and 2 are within the circle, and point 3 is outside the circle.
+
+
+
+The response returns documents 1 and 2:
+
+```json
+{
+ "took" : 575,
+ "timed_out" : false,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ },
+ "hits" : {
+ "total" : {
+ "value" : 2,
+ "relation" : "eq"
+ },
+ "max_score" : 0.0,
+ "hits" : [
+ {
+ "_index" : "testindex1",
+ "_id" : "1",
+ "_score" : 0.0,
+ "_source" : {
+ "point" : "1.0, 1.0"
+ }
+ },
+ {
+ "_index" : "testindex1",
+ "_id" : "2",
+ "_score" : 0.0,
+ "_source" : {
+ "point" : "2.0, 0.0"
+ }
+ }
+ ]
+ }
+}
+```
\ No newline at end of file
diff --git a/_opensearch/query-dsl/index.md b/_query-dsl/query-dsl/index.md
similarity index 98%
rename from _opensearch/query-dsl/index.md
rename to _query-dsl/query-dsl/index.md
index 6f7c277b24..520e2bd737 100644
--- a/_opensearch/query-dsl/index.md
+++ b/_query-dsl/query-dsl/index.md
@@ -1,10 +1,12 @@
---
layout: default
title: Query DSL
-nav_order: 27
+nav_order: 2
has_children: true
+permalink: /query-dsl/
redirect_from:
- /opensearch/query-dsl/
+ - /opensearch/query-dsl/index/
- /docs/opensearch/query-dsl/
---
diff --git a/_opensearch/query-dsl/query-filter-context.md b/_query-dsl/query-dsl/query-filter-context.md
similarity index 98%
rename from _opensearch/query-dsl/query-filter-context.md
rename to _query-dsl/query-dsl/query-filter-context.md
index 53f716c234..05996bfd8c 100644
--- a/_opensearch/query-dsl/query-filter-context.md
+++ b/_query-dsl/query-dsl/query-filter-context.md
@@ -2,6 +2,7 @@
layout: default
title: Query and filter context
parent: Query DSL
+permalink: /query-dsl/query-filter-context/
nav_order: 5
---
diff --git a/_opensearch/query-dsl/span-query.md b/_query-dsl/query-dsl/span-query.md
similarity index 94%
rename from _opensearch/query-dsl/span-query.md
rename to _query-dsl/query-dsl/span-query.md
index 6ed2842991..912505843b 100644
--- a/_opensearch/query-dsl/span-query.md
+++ b/_query-dsl/query-dsl/span-query.md
@@ -3,6 +3,9 @@ layout: default
title: Span queries
parent: Query DSL
nav_order: 60
+permalink: /query-dsl/span-query/
+redirect_from:
+ - /opensearch/query-dsl/span-query/
---
# Span queries
diff --git a/_opensearch/query-dsl/term-vs-full-text.md b/_query-dsl/query-dsl/term-vs-full-text.md
similarity index 99%
rename from _opensearch/query-dsl/term-vs-full-text.md
rename to _query-dsl/query-dsl/term-vs-full-text.md
index c35fa77bd0..68a912b541 100644
--- a/_opensearch/query-dsl/term-vs-full-text.md
+++ b/_query-dsl/query-dsl/term-vs-full-text.md
@@ -2,6 +2,7 @@
layout: default
title: Term-level and full-text queries compared
parent: Query DSL
+permalink: /query-dsl/term-vs-full-text/
nav_order: 10
---
diff --git a/_opensearch/query-dsl/term.md b/_query-dsl/query-dsl/term.md
similarity index 95%
rename from _opensearch/query-dsl/term.md
rename to _query-dsl/query-dsl/term.md
index ffe33cd3cd..38a43f9709 100644
--- a/_opensearch/query-dsl/term.md
+++ b/_query-dsl/query-dsl/term.md
@@ -3,6 +3,9 @@ layout: default
title: Term-level queries
parent: Query DSL
nav_order: 20
+permalink: /query-dsl/term/
+redirect_from:
+ - /opensearch/query-dsl/term/
---
# Term-level queries
@@ -226,7 +229,7 @@ GET shakespeare/_search
## Range
-Use the `range` query to search for a range of values in a field.
+You can search for a range of values in a field with the `range` query.
To search for documents where the `line_id` value is >= 10 and <= 20:
@@ -252,6 +255,9 @@ Parameter | Behavior
`lte` | Less than or equal to.
`lt` | Less than.
+In addition to the range query parameters, you can provide a date `format` or a `relation` operator such as `contains` or `within`. To see the supported field types for range queries, see [Range query optional parameters]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/#range-query). To see all date formats, see [Formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/#formats).
+{: .tip }
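+
+As a brief sketch of the `relation` operator (the index name and `age_range` field are hypothetical and assume a range field type such as `integer_range`), the following query matches documents whose stored range lies within the queried bounds:
+
+```json
+GET my-index/_search
+{
+  "query": {
+    "range": {
+      "age_range": {
+        "gte": 2,
+        "lte": 8,
+        "relation": "within"
+      }
+    }
+  }
+}
+```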
+
Assume that you have a `products` index and you want to find all the products that were added in the year 2019:
```json
diff --git a/_search-plugins/async/security.md b/_search-plugins/async/security.md
index 198d9c9ab2..c7cd058cbe 100644
--- a/_search-plugins/async/security.md
+++ b/_search-plugins/async/security.md
@@ -10,7 +10,7 @@ has_children: false
You can use the security plugin with asynchronous searches to limit non-admin users to specific actions. For example, you might want some users to only be able to submit or delete asynchronous searches, while you might want others to only view the results.
-All asynchronous search indices are protected as system indices. Only a super admin user or an admin user with a Transport Layer Security (TLS) certificate can access system indices. For more information, see [System indexes]({{site.url}}{{site.baseurl}}/security/configuration/system-indexes/).
+All asynchronous search indices are protected as system indices. Only a super admin user or an admin user with a Transport Layer Security (TLS) certificate can access system indices. For more information, see [System indices]({{site.url}}{{site.baseurl}}/security/configuration/system-indices/).
## Basic permissions
diff --git a/_search-plugins/knn/api.md b/_search-plugins/knn/api.md
index ff667c480a..46c20d3b04 100644
--- a/_search-plugins/knn/api.md
+++ b/_search-plugins/knn/api.md
@@ -1,7 +1,7 @@
---
layout: default
title: API
-nav_order: 5
+nav_order: 30
parent: k-NN
has_children: false
---
@@ -331,7 +331,7 @@ POST /_plugins/_knn/models/{model_id}/_train?preference={node_id}
"engine":"faiss",
"space_type": "l2",
"parameters":{
- "nlists":128,
+ "nlist":128,
"encoder":{
"name":"pq",
"parameters":{
@@ -361,7 +361,7 @@ POST /_plugins/_knn/models/_train?preference={node_id}
"engine":"faiss",
"space_type": "l2",
"parameters":{
- "nlists":128,
+ "nlist":128,
"encoder":{
"name":"pq",
"parameters":{
diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md
index 722e74a5a2..913d7a956d 100644
--- a/_search-plugins/knn/approximate-knn.md
+++ b/_search-plugins/knn/approximate-knn.md
@@ -1,7 +1,7 @@
---
layout: default
title: Approximate search
-nav_order: 2
+nav_order: 10
parent: k-NN
has_children: false
has_math: true
@@ -9,23 +9,34 @@ has_math: true
# Approximate k-NN search
-The approximate k-NN search method uses nearest neighbor algorithms from *nmslib* and *faiss* to power
-k-NN search. To see the algorithms that the plugin currently supports, check out the [k-NN Index documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions).
-In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three search methods the plugin provides, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is preferred.
+Standard k-NN search methods compute similarity using a brute-force approach that measures the distance between a query vector and each indexed point, which produces exact results. This works well in many applications. However, for extremely large datasets with high dimensionality, this creates a scaling problem that reduces the efficiency of the search. Approximate k-NN search methods can overcome this by employing tools that restructure indexes more efficiently and reduce the dimensionality of searchable vectors. This approach sacrifices some accuracy but appreciably increases search processing speed.
-The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description).
-These native library indices are loaded into native memory during search and managed by a cache. To learn more about
-pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
+The approximate k-NN search methods leveraged by OpenSearch use approximate nearest neighbor (ANN) algorithms from the [nmslib](https://github.com/nmslib/nmslib), [faiss](https://github.com/facebookresearch/faiss), and [Lucene](https://lucene.apache.org/) libraries to power k-NN search. These search methods employ ANN to improve search latency for large datasets. Of the three search methods the k-NN plugin provides, this method offers the best search scalability for large datasets. This approach is preferred when a dataset reaches hundreds of thousands of vectors.
-Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index
+For details on the algorithms the plugin currently supports, see [k-NN Index documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions).
+{: .note}
+
+The k-NN plugin builds a native library index of the vectors for each knn-vector field/Lucene segment pair during indexing, which can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These native library indexes are loaded into native memory during search and managed by a cache. To learn more about preloading native library indexes into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see which native library indexes are already loaded in memory. To learn more about this, see the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
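+
+For example (a sketch; the index name is hypothetical), you can preload the native library indexes for one or more indexes with the warmup API:
+
+```json
+GET /_plugins/_knn/warmup/my-knn-index
+```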
+
+Because the native library indexes are constructed during indexing, it is not possible to apply a filter on an index
and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.
+### Recommendations for engines and cluster node sizing
+
+Each of the three engines used for approximate k-NN search has its own attributes that make one more sensible to use than the others in a given situation. You can follow the general information below to help determine which engine will best meet your requirements.
+
+* The faiss engine performs exceptionally well (by orders of magnitude) with hardware that includes a GPU. When cost is not the first concern, this is the recommended engine.
+* When only a CPU is available, nmslib is a good choice. In general, it outperforms both faiss and Lucene.
+* For relatively smaller datasets (up to a few million vectors), the Lucene engine demonstrates better latencies and recall. At the same time, the size of the index is smallest compared to the other engines, which allows it to use smaller AWS instances for data nodes.
Also, the Lucene engine uses a pure Java implementation and does not share any of the limitations that engines using platform-native code experience. However, one exception to this is that the maximum number of vector dimensions for the Lucene engine is 1024, compared with 10000 for the other engines. Refer to the sample mapping parameters in the following section to see where this is configured.
+
+When considering cluster node sizing, a general approach is to first establish an even distribution of the index across the cluster. However, there are other considerations. To help make these choices, you can refer to the OpenSearch managed service guidance in the section [Sizing domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html).
+
## Get started with approximate k-NN
-To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with setting `index.knn` to `true`. This setting tells the plugin to create native library indices for the index.
+To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with `index.knn` set to `true`. This setting tells the plugin to create native library indexes for the index.
Next, you must add one or more fields of the `knn_vector` data type. This example creates an index with two
-`knn_vector`'s, one using *faiss*, the other using *nmslib*, fields:
+`knn_vector` fields, one using `faiss` and the other using `nmslib`:
```json
PUT my-knn-index-1
@@ -69,12 +80,11 @@ PUT my-knn-index-1
}
```
-In the example above, both `knn_vector`s are configured from method definitions. Additionally, `knn_vector`s can also be configured from models. Learn more about it [here]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#knn_vector-data-type)!
+In the example above, both `knn_vector` fields are configured from method definitions. Additionally, `knn_vector` fields can also be configured from models. You can learn more about this in the [knn_vector data type]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#knn_vector-data-type) section.
-The `knn_vector` data type supports a vector of floats that can have a dimension of up to 10,000, as set by the
-dimension mapping parameter.
+The `knn_vector` data type supports a vector of floats that can have a dimension of up to 10000 for the nmslib and faiss engines, as set by the dimension mapping parameter. The maximum dimension for the Lucene library is 1024.
-In OpenSearch, codecs handle the storage and retrieval of indices. The k-NN plugin uses a custom codec to write vector data to native library indices so that the underlying k-NN search library can read it.
+In OpenSearch, codecs handle the storage and retrieval of indexes. The k-NN plugin uses a custom codec to write vector data to native library indexes so that the underlying k-NN search library can read it.
{: .tip }
After you create the index, you can add some data to it:
@@ -133,24 +143,24 @@ any `knn_vector` field that has a dimension matching the dimension of the model
```json
PUT /train-index
{
- "settings" : {
- "number_of_shards" : 3,
- "number_of_replicas" : 0
+ "settings": {
+ "number_of_shards": 3,
+ "number_of_replicas": 0
},
"mappings": {
- "properties": {
- "train-field": {
- "type": "knn_vector",
- "dimension": 4
+ "properties": {
+ "train-field": {
+ "type": "knn_vector",
+ "dimension": 4
}
- }
+ }
}
}
```
-Notice that `index.knn` is not set in the index settings. This ensures that we do not create native library indices for this index.
+Notice that `index.knn` is not set in the index settings. This ensures that you do not create native library indexes for this index.
-Next, let's add some data to it:
+You can now add some data to the index:
```json
POST _bulk
@@ -176,17 +186,17 @@ POST /_plugins/_knn/models/my-model/_train
"description": "My models description",
"search_size": 500,
"method": {
- "name":"hnsw",
- "engine":"faiss",
- "parameters":{
- "encoder":{
- "name":"pq",
- "parameters":{
- "code_size": 8,
- "m": 8
- }
+ "name": "hnsw",
+ "engine": "faiss",
+ "parameters": {
+ "encoder": {
+ "name": "pq",
+ "parameters": {
+ "code_size": 8,
+ "m": 8
}
}
+ }
}
}
```
@@ -200,24 +210,24 @@ GET /_plugins/_knn/models/my-model?filter_path=state&pretty
}
```
-Once the model enters the "created" state, we can create an index that will use this model to initialize it's native
-library indices:
+Once the model enters the "created" state, you can create an index that will use this model to initialize its native
+library indexes:
```json
PUT /target-index
{
- "settings" : {
- "number_of_shards" : 3,
- "number_of_replicas" : 1,
+ "settings": {
+ "number_of_shards": 3,
+ "number_of_replicas": 1,
"index.knn": true
},
"mappings": {
- "properties": {
- "target-field": {
- "type": "knn_vector",
- "model_id": "my-model"
+ "properties": {
+ "target-field": {
+ "type": "knn_vector",
+ "model_id": "my-model"
}
- }
+ }
}
}
```
@@ -295,11 +305,11 @@ A space corresponds to the function used to measure the distance between two poi
cosinesimil |
\[ d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} · \mathbf{y} \over \|\mathbf{x}\| · \|\mathbf{y}\|}\]\[ = 1 -
{\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} · \sqrt{\sum_{i=1}^n y_i^2}}\]
- where \(\|\mathbf{x}\|\) and \(\|\mathbf{y}\|\) represent normalized vectors. |
- \[ score = {1 \over 1 + d } \] |
+ where \(\|\mathbf{x}\|\) and \(\|\mathbf{y}\|\) represent the norms of vectors x and y respectively.
+ nmslib and faiss:\[ score = {1 \over 1 + d } \] Lucene:\[ score = {1 + d \over 2}\] |
- innerproduct |
+ innerproduct (not supported for Lucene) |
\[ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} · \mathbf{y}} = - \sum_{i=1}^n x_i y_i \] |
\[ \text{If } d \ge 0, \] \[score = {1 \over 1 + d }\] \[\text{If } d < 0, score = −d + 1\]
diff --git a/_search-plugins/knn/filter-search-knn.md b/_search-plugins/knn/filter-search-knn.md
new file mode 100644
index 0000000000..6e02b610f1
--- /dev/null
+++ b/_search-plugins/knn/filter-search-knn.md
@@ -0,0 +1,649 @@
+---
+layout: default
+title: Search with k-NN filters
+nav_order: 15
+parent: k-NN
+has_children: false
+has_math: true
+---
+
+# Search with k-NN filters
+Introduced 2.4
+{: .label .label-purple }
+
+You can create custom filters using Query domain-specific language (DSL) search options to refine your k-NN searches. You define the filter criteria within the `knn_vector` field's `filter` subsection in your query. You can use any of the OpenSearch query DSL query types as a filter. This includes the common query types: `term`, `range`, `regexp`, and `wildcard`, as well as custom query types. To include or exclude results, use Boolean query clauses. You can also specify a query point with the `knn_vector` type and search for nearest neighbors that match your filter criteria.
+To run k-NN queries with a filter, the Lucene search engine and Hierarchical Navigable Small World (HNSW) method are required.
+
+To learn more about how to use query DSL Boolean query clauses, see [Boolean queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/bool). For more details about the `knn_vector` data type definition, see [k-NN Index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/).
+{: .note }
+
+## How does a k-NN filter work?
+
+The OpenSearch k-NN plugin version 2.2 introduced support for the Lucene engine to process k-NN searches. The Lucene engine provides a search based on the HNSW algorithm, which represents vectors in a multilayered graph. The OpenSearch k-NN plugin version 2.4 adds support for filters in searches based on Lucene 9.4.
+
+After a filter is applied to a set of documents to be searched, the algorithm decides whether to perform pre-filtering for an exact k-NN search or modified post-filtering for an approximate search. The approximate search with filtering ensures that the required number of closest vectors appears in the results.
+
+Lucene also provides the capability to operate its `KnnVectorQuery` across a subset of documents. To learn more about this capability, see the [Apache Lucene Documentation](https://issues.apache.org/jira/browse/LUCENE-10382).
+
+To learn more about all available k-NN search approaches, including approximate k-NN, exact k-NN with script score, and pre-filtering with painless extensions, see [k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/).
+
+### Filtered search performance
+
+Filtering that is tightly integrated with the Lucene HNSW algorithm implementation allows you to apply k-NN searches more efficiently, both in terms of relevancy of search results and performance. Consider, for example, an exact search using post-filtering on a large dataset that returns results slowly and does not ensure the required number of results specified by `k`.
+With this new capability, you can create an approximate k-NN search, apply filters, and get the number of results that you need. To learn more about approximate searches, see [Approximate k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/).
+
+The HNSW algorithm decides which type of filtering to apply to a search based on the volume of documents and number of `k` points in the index that you search with a filter.
+
+![How the algorithm evaluates a doc set]({{site.url}}{{site.baseurl}}/images/hsnw-algorithm.png)
+
+Variable | Description
+-- | --
+N | The number of documents in the index.
+P | The number of documents in the search set after the filter is applied, where P <= N.
+q | The search vector.
+k | The maximum number of vectors to return in the response.
+
+To learn more about k-NN performance tuning, see [Performance tuning]({{site.url}}{{site.baseurl}}/search-plugins/knn/performance-tuning/).
+
+## Filter approaches by use case
+
+Depending on the dataset that you are searching, you might choose a different approach to minimize recall or latency. You can create filters that are:
+
+* Very restrictive: Returns the lowest number of documents (for example, 2.5%).
+* Somewhat restrictive: Returns some documents (for example, 38%).
+* Not very restrictive: Returns the highest number of documents (for example, 80%).
+
+The restrictive percentage indicates the number of documents the filter returns for any given document set in an index.
+
+Number of Vectors | Filter Restrictive Percentage | k | Best Approach for Recall | Best Approach for Latency
+-- | -- | -- | -- | --
+10M | 2.5 | 100 | Scoring script | Scoring script
+10M | 38 | 100 | Lucene filter | Boolean filter
+10M | 80 | 100 | Scoring script | Lucene filter
+1M | 2.5 | 100 | Lucene filter | Scoring script
+1M | 38 | 100 | Lucene filter | lucene_filtering / Scoring script
+1M | 80 | 100 | Boolean filter | lucene_filtering
+
+In this context, *Scoring script* is essentially a brute force search, whereas a Boolean filter is an approximate k-NN search with post-filtering.
+
+To learn more about the dynamic searches you can perform with the score script plugin, see [Exact k-NN with scoring script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/).
+
+### Boolean filter with approximate k-NN search
+
+In a Boolean query that uses post-filtering, you can join a k-NN query with a filter using a `bool` `must` query clause.
+
+#### Example request
+
+The following k-NN query uses a Boolean query clause to filter results:
+
+```json
+POST /hotels-index/_search
+{
+ "size": 3,
+ "query": {
+ "bool": {
+ "filter": {
+ "bool": {
+ "must": [
+ {
+ "range": {
+ "rating": {
+ "gte": 8,
+ "lte": 10
+ }
+ }
+ },
+ {
+ "term": {
+ "parking": "true"
+ }
+ }
+ ]
+ }
+ },
+ "must": [
+ {
+ "knn": {
+ "location": {
+ "vector": [
+ 5.0,
+ 4.0
+ ],
+ "k": 20
+ }
+ }
+ }
+ ]
+ }
+ }
+}
+```
+#### Example response
+
+The Boolean query filter returns the following results in the response:
+
+```json
+{
+ "took" : 95,
+ "timed_out" : false,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ },
+ "hits" : {
+ "total" : {
+ "value" : 5,
+ "relation" : "eq"
+ },
+ "max_score" : 0.72992706,
+ "hits" : [
+ {
+ "_index" : "hotels-index",
+ "_id" : "3",
+ "_score" : 0.72992706,
+ "_source" : {
+ "location" : [
+ 4.9,
+ 3.4
+ ],
+ "parking" : "true",
+ "rating" : 9
+ }
+ },
+ {
+ "_index" : "hotels-index",
+ "_id" : "6",
+ "_score" : 0.3012048,
+ "_source" : {
+ "location" : [
+ 6.4,
+ 3.4
+ ],
+ "parking" : "true",
+ "rating" : 9
+ }
+ },
+ {
+ "_index" : "hotels-index",
+ "_id" : "5",
+ "_score" : 0.24154587,
+ "_source" : {
+ "location" : [
+ 3.3,
+ 4.5
+ ],
+ "parking" : "true",
+ "rating" : 8
+ }
+ }
+ ]
+ }
+}
+```
+
+### Use case 1: Very restrictive 2.5% filter
+
+A very restrictive filter returns the lowest number of documents in your dataset. For example, the following filter criteria specifies hotels with feedback ratings less than or equal to 3. This 2.5% filter only returns 1 document:
+
+```json
+ "filter": {
+ "bool": {
+ "must": [
+ {
+ "range": {
+ "rating": {
+ "lte": 3
+ }
+ }
+ }
+ ]
+ }
+ }
+```
+
+### Use case 2: Somewhat restrictive 38% filter
+
+A somewhat restrictive filter returns 38% of the documents in the data set that you search. For example, the following filter criteria specifies hotels with parking and feedback ratings less than or equal to 8 and returns 5 documents:
+
+```json
+ "filter": {
+ "bool": {
+ "must": [
+ {
+ "range": {
+ "rating": {
+ "lte": 8
+ }
+ }
+ },
+ {
+ "term": {
+ "parking": "true"
+ }
+ }
+ ]
+ }
+ }
+```
+
+### Use case 3: Not very restrictive 80% filter
+
+A filter that is not very restrictive will return 80% of the documents that you search. For example, the following filter criteria specifies hotels with feedback ratings greater than or equal to 5 and returns 10 documents:
+
+```json
+ "filter": {
+ "bool": {
+ "must": [
+ {
+ "range": {
+ "rating": {
+ "gte": 5
+ }
+ }
+ }
+ ]
+ }
+ }
+```
+
+## Overview: How to use filters in a k-NN search
+
+You can search with a filter by following these three steps:
+1. Create an index and specify the Lucene engine and the HNSW method in the mapping.
+1. Add your data to the index.
+1. Search the index and specify these three items in your query:
+* One or more filters defined by query DSL
+* A vector reference point defined by the `vector` field
+* The number of matches you want returned with the `k` field
+
+You can use a range query to specify hotel feedback ratings and a term query to require that parking is available. The criteria are processed with Boolean clauses to indicate whether or not each document matches them.
+
+Consider a dataset that contains 12 documents, a search reference point, and documents that meet two filter criteria.
+
+![Graph of documents with filter criteria]({{site.url}}{{site.baseurl}}/images/knn-two-filters.png)
+
+## Step 1: Create a new index with a Lucene mapping
+
+Before you can run a k-NN search with a filter, you need to create an index, specify the Lucene engine in a mapping, and add data to the index.
+
+You need to add a `location` field to represent the location and specify it as the `knn_vector` type. The most basic vector can be two-dimensional. For example:
+
+```
+ "type": "knn_vector",
+ "dimension": 2,
+```
+
+### Requirement: Lucene engine with HNSW method
+
+Make sure to specify the `hnsw` method and the `lucene` engine in the `knn_vector` field description, as follows:
+
+```json
+"my_field": {
+ "type": "knn_vector",
+ "dimension": 2,
+ "method": {
+ "name": "hnsw",
+ "space_type": "l2",
+ "engine": "lucene"
+ }
+ }
+```
+
+#### Example request
+
+The following request creates a new index called "hotels-index":
+
+```json
+PUT /hotels-index
+{
+ "settings": {
+ "index": {
+ "knn": true,
+ "knn.algo_param.ef_search": 100,
+ "number_of_shards": 1,
+ "number_of_replicas": 0
+ }
+ },
+ "mappings": {
+ "properties": {
+ "location": {
+ "type": "knn_vector",
+ "dimension": 2,
+ "method": {
+ "name": "hnsw",
+ "space_type": "l2",
+ "engine": "lucene",
+ "parameters": {
+ "ef_construction": 100,
+ "m": 16
+ }
+ }
+ }
+ }
+ }
+}
+```
+#### Example response
+
+Upon success, you should receive a "200-OK" status with the following response:
+
+```json
+{
+ "acknowledged" : true,
+ "shards_acknowledged" : true,
+ "index" : "hotels-index"
+}
+```
+
+## Step 2: Add data to your index
+
+Next, add data to your index with a `POST` request to the bulk API. Make sure that the search criteria are defined in the body of the request.
+
+#### Example request
+
+The following request adds 12 hotel documents that contain criteria such as feedback ratings and whether or not parking is available:
+
+```json
+POST /_bulk
+{ "index": { "_index": "hotels-index", "_id": "1" } }
+{ "location": [5.2, 4.4], "parking" : "true", "rating" : 5 }
+{ "index": { "_index": "hotels-index", "_id": "2" } }
+{ "location": [5.2, 3.9], "parking" : "false", "rating" : 4 }
+{ "index": { "_index": "hotels-index", "_id": "3" } }
+{ "location": [4.9, 3.4], "parking" : "true", "rating" : 9 }
+{ "index": { "_index": "hotels-index", "_id": "4" } }
+{ "location": [4.2, 4.6], "parking" : "false", "rating" : 6}
+{ "index": { "_index": "hotels-index", "_id": "5" } }
+{ "location": [3.3, 4.5], "parking" : "true", "rating" : 8 }
+{ "index": { "_index": "hotels-index", "_id": "6" } }
+{ "location": [6.4, 3.4], "parking" : "true", "rating" : 9 }
+{ "index": { "_index": "hotels-index", "_id": "7" } }
+{ "location": [4.2, 6.2], "parking" : "true", "rating" : 5 }
+{ "index": { "_index": "hotels-index", "_id": "8" } }
+{ "location": [2.4, 4.0], "parking" : "true", "rating" : 8 }
+{ "index": { "_index": "hotels-index", "_id": "9" } }
+{ "location": [1.4, 3.2], "parking" : "false", "rating" : 5 }
+{ "index": { "_index": "hotels-index", "_id": "10" } }
+{ "location": [7.0, 9.9], "parking" : "true", "rating" : 9 }
+{ "index": { "_index": "hotels-index", "_id": "11" } }
+{ "location": [3.0, 2.3], "parking" : "false", "rating" : 6 }
+{ "index": { "_index": "hotels-index", "_id": "12" } }
+{ "location": [5.0, 1.0], "parking" : "true", "rating" : 3 }
+```
+
+#### Example response
+
+Upon success, you should receive a "200-OK" status with entries for each document ID added to the index. The following response is truncated to only show one document:
+
+```json
+{
+ "took" : 140,
+ "errors" : false,
+ "items" : [
+ {
+ "index" : {
+ "_index" : "hotels-index",
+ "_id" : "1",
+ "_version" : 2,
+ "result" : "updated",
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "failed" : 0
+ },
+ "_seq_no" : 12,
+ "_primary_term" : 3,
+ "status" : 200
+ }
+ }
+ ]
+}
+
+```
+
+## Step 3: Search your data with a filter
+
+Now you can create a k-NN search that specifies filters by using query DSL Boolean clauses. You need to include your reference point to search for nearest neighbors. Provide an x-y coordinate for the point within the `vector` field, such as `"vector": [ 5.0, 4.0]`.
+
+ To learn more about how to specify ranges with query DSL, see [Range query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/#range).
+{: .note }
+
+#### Example request
+
+The following request creates a k-NN query that only returns the top hotels rated between 8 and 10 and that provide parking. The filter criteria to indicate the range for the feedback ratings uses a `range` query and a `term` query clause to indicate "parking":
+
+```json
+POST /hotels-index/_search
+{
+ "size": 3,
+ "query": {
+ "knn": {
+ "location": {
+ "vector": [
+ 5.0,
+ 4.0
+ ],
+ "k": 3,
+ "filter": {
+ "bool": {
+ "must": [
+ {
+ "range": {
+ "rating": {
+ "gte": 8,
+ "lte": 10
+ }
+ }
+ },
+ {
+ "term": {
+ "parking": "true"
+ }
+ }
+ ]
+ }
+ }
+ }
+ }
+ }
+}
+```
+
+
+#### Example response
+
+The following response indicates that only three hotels met the filter criteria:
+
+
+```json
+{
+ "took" : 47,
+ "timed_out" : false,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ },
+ "hits" : {
+ "total" : {
+ "value" : 3,
+ "relation" : "eq"
+ },
+ "max_score" : 0.72992706,
+ "hits" : [
+ {
+ "_index" : "hotels-index",
+ "_id" : "3",
+ "_score" : 0.72992706,
+ "_source" : {
+ "location" : [
+ 4.9,
+ 3.4
+ ],
+ "parking" : "true",
+ "rating" : 9
+ }
+ },
+ {
+ "_index" : "hotels-index",
+ "_id" : "6",
+ "_score" : 0.3012048,
+ "_source" : {
+ "location" : [
+ 6.4,
+ 3.4
+ ],
+ "parking" : "true",
+ "rating" : 9
+ }
+ },
+ {
+ "_index" : "hotels-index",
+ "_id" : "5",
+ "_score" : 0.24154587,
+ "_source" : {
+ "location" : [
+ 3.3,
+ 4.5
+ ],
+ "parking" : "true",
+ "rating" : 8
+ }
+ }
+ ]
+ }
+}
+
+```
+
+## Additional complex filter query
+
+Depending on how restrictive you want your filter to be, you can add multiple query types to a single request, such as `term`, `wildcard`, `regexp`, or `range`. You can then filter out the search results with the Boolean clauses `must`, `should`, and `must_not`.
+
+#### Example request
+
+The following request returns hotels that provide parking. This request illustrates multiple alternative mechanisms to obtain the parking filter criteria. It uses a regular expression for the value `true`, a term query for the key-value pair `"parking":"true"`, a wildcard for the characters that spell "true", and the `must_not` clause to eliminate hotels with "parking" set to `false`:
+
+```json
+POST /hotels-index/_search
+{
+ "size": 3,
+ "query": {
+ "knn": {
+ "location": {
+ "vector": [
+ 5.0,
+ 4.0
+ ],
+ "k": 3,
+ "filter": {
+ "bool": {
+ "must": {
+ "range": {
+ "rating": {
+ "gte": 1,
+ "lte": 6
+ }
+ }
+ },
+ "should": [
+ {
+ "term": {
+ "parking": "true"
+ }
+ },
+ {
+ "wildcard": {
+ "parking": {
+ "value": "t*e"
+ }
+ }
+ },
+ {
+ "regexp": {
+ "parking": "[a-zA-Z]rue"
+ }
+ }
+ ],
+ "must_not": [
+ {
+ "term": {
+ "parking": "false"
+ }
+ }
+ ],
+ "minimum_should_match": 1
+ }
+ }
+ }
+ }
+ }
+}
+```
+#### Example response
+
+The following response indicates a few results for the search with filters:
+
+```json
+{
+ "took" : 94,
+ "timed_out" : false,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ },
+ "hits" : {
+ "total" : {
+ "value" : 3,
+ "relation" : "eq"
+ },
+ "max_score" : 0.8333333,
+ "hits" : [
+ {
+ "_index" : "hotels-index",
+ "_id" : "1",
+ "_score" : 0.8333333,
+ "_source" : {
+ "location" : [
+ 5.2,
+ 4.4
+ ],
+ "parking" : "true",
+ "rating" : 5
+ }
+ },
+ {
+ "_index" : "hotels-index",
+ "_id" : "7",
+ "_score" : 0.154321,
+ "_source" : {
+ "location" : [
+ 4.2,
+ 6.2
+ ],
+ "parking" : "true",
+ "rating" : 5
+ }
+ },
+ {
+ "_index" : "hotels-index",
+ "_id" : "12",
+ "_score" : 0.1,
+ "_source" : {
+ "location" : [
+ 5.0,
+ 1.0
+ ],
+ "parking" : "true",
+ "rating" : 3
+ }
+ }
+ ]
+ }
+}
+```
diff --git a/_search-plugins/knn/index.md b/_search-plugins/knn/index.md
index d8e5c1c3f9..d360507105 100644
--- a/_search-plugins/knn/index.md
+++ b/_search-plugins/knn/index.md
@@ -22,7 +22,7 @@ This plugin supports three different methods for obtaining the k-nearest neighbo
Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched. In this case, you should use either the script scoring method or painless extensions.
- For more details about this method, see [Approximate k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/).
+ For more details about this method, including recommendations for which engine to use, see [Approximate k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/).
2. **Script Score k-NN**
diff --git a/_search-plugins/knn/jni-libraries.md b/_search-plugins/knn/jni-libraries.md
index 052a789510..25d1556908 100644
--- a/_search-plugins/knn/jni-libraries.md
+++ b/_search-plugins/knn/jni-libraries.md
@@ -1,14 +1,17 @@
---
layout: default
title: JNI libraries
-nav_order: 6
+nav_order: 35
parent: k-NN
has_children: false
---
# JNI libraries
-To integrate [*nmslib*'s](https://github.com/nmslib/nmslib/) and [*faiss*'s](https://github.com/facebookresearch/faiss/) Approximate k-NN functionality (implemented in C++) into the k-NN plugin (implemented in Java), we created a Java Native Interface, which lets the k-NN plugin make calls to the native libraries. We create 3 libraries: `libopensearchknn_nmslib`, the JNI library that interfaces with nmslib, `libopensearchknn_faiss`, the JNI library that interfaces with faiss, and `libopensearchknn_common`, a library containing common shared functionality between native libraries.
+To integrate [nmslib](https://github.com/nmslib/nmslib/) and [faiss](https://github.com/facebookresearch/faiss/) approximate k-NN functionality (implemented in C++) into the k-NN plugin (implemented in Java), we created a Java Native Interface, which lets the k-NN plugin make calls to the native libraries. The interface includes three libraries: `libopensearchknn_nmslib`, the JNI library that interfaces with nmslib, `libopensearchknn_faiss`, the JNI library that interfaces with faiss, and `libopensearchknn_common`, a library containing common shared functionality between native libraries.
+
+The Lucene library is not implemented using a native library.
+{: .note}
The libraries `libopensearchknn_faiss` and `libopensearchknn_nmslib` are lazily loaded when they are first called in the plugin. This means that if you are only planning on using one of the libraries, the plugin never loads the other library.
diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md
index 59460d8347..90f08f415a 100644
--- a/_search-plugins/knn/knn-index.md
+++ b/_search-plugins/knn/knn-index.md
@@ -1,7 +1,7 @@
---
layout: default
title: k-NN Index
-nav_order: 1
+nav_order: 5
parent: k-NN
has_children: false
---
@@ -53,54 +53,56 @@ However, if you intend to just use painless scripting or a k-NN score script, yo
A method definition refers to the underlying configuration of the Approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
A method definition will always contain the name of the method, the space_type the method is built for, the engine
-(the native library) to use, and a map of parameters.
+(the library) to use, and a map of parameters.
Mapping Parameter | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
`name` | true | n/a | false | The identifier for the nearest neighbor method.
-`space_type` | false | "l2" | false | The vector space used to calculate the distance between vectors.
-`engine` | false | "nmslib" | false | The approximate k-NN library to use for indexing and search. Either "faiss" or "nmslib".
+`space_type` | false | l2 | false | The vector space used to calculate the distance between vectors.
+`engine` | false | nmslib | false | The approximate k-NN library to use for indexing and search. The available libraries are faiss, nmslib, and Lucene.
`parameters` | false | null | false | The parameters used for the nearest neighbor method.
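+
+For illustration, a method definition inside a `knn_vector` field mapping might look like the following sketch (the parameter values here are hypothetical):
+
+```json
+"method": {
+  "name": "hnsw",
+  "space_type": "l2",
+  "engine": "nmslib",
+  "parameters": {
+    "ef_construction": 128,
+    "m": 24
+  }
+}
+```
+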
### Supported nmslib methods
Method Name | Requires Training? | Supported Spaces | Description
:--- | :--- | :--- | :---
-`hnsw` | false | "l2", "innerproduct", "cosinesimil", "l1", "linf" | Hierarchical proximity graph approach to Approximate k-NN search. For more details on the algorithm, [checkout this paper](https://arxiv.org/abs/1603.09320)!
+`hnsw` | false | l2, innerproduct, cosinesimil, l1, linf | Hierarchical proximity graph approach to Approximate k-NN search. For more details on the algorithm, see this [abstract](https://arxiv.org/abs/1603.09320).
-#### HNSW Parameters
+#### HNSW parameters
-Paramater Name | Required | Default | Updatable | Description
+Parameter Name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
-`ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
-`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100.
+`ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed.
+`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
-**Note** --- For *nmslib*, *ef_search* is set in the [index settings](#index-settings).
+For nmslib, *ef_search* is set in the [index settings](#index-settings).
+{: .note}
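+
+For example, a minimal sketch of updating `ef_search` through the index settings (the index name is hypothetical):
+
+```json
+PUT /my-knn-index/_settings
+{
+  "index": {
+    "knn.algo_param.ef_search": 100
+  }
+}
+```
+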
### Supported faiss methods
Method Name | Requires Training? | Supported Spaces | Description
:--- | :--- | :--- | :---
-`hnsw` | false | "l2", "innerproduct"* | Hierarchical proximity graph approach to Approximate k-NN search.
-`ivf` | true | "l2", "innerproduct" | Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets are searched.
+`hnsw` | false | l2, innerproduct | Hierarchical proximity graph approach to Approximate k-NN search.
+`ivf` | true | l2, innerproduct | Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets is searched.
-**Note** --- For *hnsw*, "innerproduct" is not available when PQ is used.
+For hnsw, "innerproduct" is not available when PQ is used.
+{: .note}
-#### HNSW Parameters
+#### HNSW parameters
-Paramater Name | Required | Default | Updatable | Description
+Parameter Name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
`ef_search` | false | 512 | false | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches.
-`ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph, but slower indexing speed.
-`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2-100.
+`ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed.
+`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100.
`encoder` | false | flat | false | Encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index, at the expense of search accuracy.
-#### IVF Parameters
+#### IVF parameters
-Paramater Name | Required | Default | Updatable | Description
+Parameter Name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
-`nlists` | false | 4 | false | Number of buckets to partition vectors into. Higher values may lead to more accurate searches, at the expense of memory and training latency. For more information about choosing the right value, refer to [*faiss*'s documentation](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index).
-`nprobes` | false | 1 | false | Number of buckets to search over during query. Higher values lead to more accurate but slower searches.
+`nlist` | false | 4 | false | Number of buckets to partition vectors into. Higher values may lead to more accurate searches at the expense of memory and training latency. For more information about choosing the right value, refer to [Guidelines to choose an index](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index).
+`nprobes` | false | 1 | false | Number of buckets to search during query. Higher values lead to more accurate but slower searches.
`encoder` | false | flat | false | Encoder definition for encoding vectors. Encoders can reduce the memory footprint of your index, at the expense of search accuracy.
For more information about setting these parameters, please refer to [*faiss*'s documentation](https://github.com/facebookresearch/faiss/wiki/Faiss-indexes).
@@ -109,13 +111,45 @@ For more information about setting these parameters, please refer to [*faiss*'s
The IVF algorithm requires a training step. To create an index that uses IVF, you need to train a model with the
[Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model), passing the IVF method definition. IVF requires that, at a minimum, there should be `nlist` training
-data points, but it is [recommended to use more](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset).
-Training data can either the same data that is going to be ingested or a separate set of data.
+data points, but it is [recommended that you use more](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index#how-big-is-the-dataset).
+Training data can be composed of either the same data that is going to be ingested or a separate dataset.
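+
+As an illustration, training an IVF model with the Train API might look like the following sketch (the model ID, training index, field name, and parameter values are hypothetical):
+
+```json
+POST /_plugins/_knn/models/my-model/_train
+{
+  "training_index": "train-index",
+  "training_field": "train-field",
+  "dimension": 4,
+  "description": "Example IVF model",
+  "method": {
+    "name": "ivf",
+    "engine": "faiss",
+    "space_type": "l2",
+    "parameters": {
+      "nlist": 4,
+      "nprobes": 2
+    }
+  }
+}
+```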
+
+### Supported Lucene methods
+
+Method Name | Requires Training? | Supported Spaces | Description
+:--- | :--- | :--- | :---
+`hnsw` | false | l2, cosinesimil | Hierarchical proximity graph approach to Approximate k-NN search.
+
+#### HNSW parameters
+
+Parameter Name | Required | Default | Updatable | Description
+:--- | :--- | :--- | :--- | :---
+`ef_construction` | false | 512 | false | The size of the dynamic list used during k-NN graph creation. Higher values lead to a more accurate graph but slower indexing speed. The Lucene engine uses the term "beam_width" internally for this function, which corresponds directly to "ef_construction". To be consistent throughout the OpenSearch documentation, we retain the term "ef_construction" for this parameter.
+`m` | false | 16 | false | The number of bidirectional links that the plugin creates for each new element. Increasing and decreasing this value can have a large impact on memory consumption. Keep this value between 2 and 100. The Lucene engine uses the term "max_connections" internally for this function, which corresponds directly to "m". To be consistent throughout the OpenSearch documentation, we retain the term "m" for this parameter.
+
+The Lucene HNSW implementation ignores `ef_search` and dynamically sets it to the value of "k" in the search request. Therefore, you do not need to set `ef_search` when using the Lucene engine.
+{: .note}
+
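+The following example shows a `knn_vector` field mapping that uses the Lucene engine:
+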
+```json
+{
+  "type": "knn_vector",
+  "dimension": 100,
+  "method": {
+    "name": "hnsw",
+    "engine": "lucene",
+    "space_type": "l2",
+    "parameters": {
+      "m": 16,
+      "ef_construction": 245
+    }
+  }
+}
+```
### Supported faiss encoders
-You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. *faiss* has
-several encoder types, but currently, the plugin only supports *flat* and *pq* encoding.
+You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. faiss has
+several encoder types, but the plugin currently only supports *flat* and *pq* encoding.
An example method definition that specifies an encoder may look something like this:
@@ -140,7 +174,7 @@ Encoder Name | Requires Training? | Description
`flat` | false | Encode vectors as floating point arrays. This encoding does not reduce memory footprint.
`pq` | true | Short for product quantization, it is a lossy compression technique that encodes a vector into a fixed size of bytes using clustering, with the goal of minimizing the drop in k-NN search accuracy. From a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more details on product quantization, here is a [great blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388)!
-#### PQ Parameters
+#### PQ parameters
-Paramater Name | Required | Default | Updatable | Description
+Parameter Name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
@@ -160,7 +194,7 @@ If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index.
-### Memory Estimation
+### Memory estimation
In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates
-native library indices to a portion of the remaining RAM. This portion's size is determined by
+native library indexes to a portion of the remaining RAM. This portion's size is determined by
the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%.
Having a replica doubles the total number of vectors.
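+
+As an illustrative example, assuming the commonly cited HNSW estimate of roughly 1.1 * (4 * dimension + 8 * m) bytes per vector (the numbers here are hypothetical), 1 million 256-dimensional vectors with `m` set to 16 would require about 1.1 * (4 * 256 + 8 * 16) * 1,000,000 bytes, or approximately 1.27 GB, of native memory.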
@@ -196,7 +230,7 @@ At the moment, several parameters defined in the settings are in the deprecation
Setting | Default | Updateable | Description
:--- | :--- | :--- | :---
-`index.knn` | false | false | Whether the index should build native library indices for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but Approximate k-NN search functionality will be disabled.
+`index.knn` | false | false | Whether the index should build native library indexes for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but Approximate k-NN search functionality will be disabled.
-`index.knn.algo_param.ef_search` | 512 | true | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches. Only available for *nmslib*.
-`index.knn.algo_param.ef_construction` | 512 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Only available for *nmslib*. Refer to mapping definition.
-`index.knn.algo_param.m` | 16 | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Only available for *nmslib*. Refer to mapping definition.
-`index.knn.space_type` | "l2" | false | (Deprecated in 1.0.0. Use the mapping parameters to set this value instead.) Only available for *nmslib*. Refer to mapping definition.
+`index.knn.algo_param.ef_search` | 512 | true | The size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches. Only available for nmslib.
+`index.knn.algo_param.ef_construction` | 512 | false | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value instead.
+`index.knn.algo_param.m` | 16 | false | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value instead.
+`index.knn.space_type` | l2 | false | Deprecated in 1.0.0. Use the [mapping parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#method-definitions) to set this value instead.
diff --git a/_search-plugins/knn/knn-score-script.md b/_search-plugins/knn/knn-score-script.md
index 1f77d8ff4f..5a87cdf7f7 100644
--- a/_search-plugins/knn/knn-score-script.md
+++ b/_search-plugins/knn/knn-score-script.md
@@ -1,7 +1,7 @@
---
layout: default
title: Exact k-NN with scoring script
-nav_order: 3
+nav_order: 20
parent: k-NN
has_children: false
has_math: true
diff --git a/_search-plugins/knn/painless-functions.md b/_search-plugins/knn/painless-functions.md
index 593fddbf22..223c192eb7 100644
--- a/_search-plugins/knn/painless-functions.md
+++ b/_search-plugins/knn/painless-functions.md
@@ -1,7 +1,7 @@
---
layout: default
title: k-NN Painless extensions
-nav_order: 4
+nav_order: 25
parent: k-NN
has_children: false
has_math: true
diff --git a/_search-plugins/knn/performance-tuning.md b/_search-plugins/knn/performance-tuning.md
index f6e28165c2..d179d99685 100644
--- a/_search-plugins/knn/performance-tuning.md
+++ b/_search-plugins/knn/performance-tuning.md
@@ -2,7 +2,7 @@
layout: default
title: Performance tuning
parent: k-NN
-nav_order: 8
+nav_order: 45
---
# Performance tuning
diff --git a/_search-plugins/knn/settings.md b/_search-plugins/knn/settings.md
index bbcb37c6e9..cdd2e86dd6 100644
--- a/_search-plugins/knn/settings.md
+++ b/_search-plugins/knn/settings.md
@@ -2,7 +2,7 @@
layout: default
title: Settings
parent: k-NN
-nav_order: 7
+nav_order: 40
---
# k-NN settings
diff --git a/_search-plugins/point-in-time-api.md b/_search-plugins/point-in-time-api.md
new file mode 100644
index 0000000000..69824f1671
--- /dev/null
+++ b/_search-plugins/point-in-time-api.md
@@ -0,0 +1,272 @@
+---
+layout: default
+title: Point in Time API
+nav_order: 59
+has_children: false
+parent: Point in Time
+redirect_from:
+ - /opensearch/point-in-time-api/
+---
+
+# Point in Time API
+
+Use the [Point in Time (PIT)]({{site.url}}{{site.baseurl}}/opensearch/point-in-time/) API to manage PITs.
+
+---
+
+#### Table of contents
+- TOC
+{:toc}
+
+---
+
+## Create a PIT
+Introduced 2.4
+{: .label .label-purple }
+
+Creates a PIT. The `keep_alive` query parameter is required; it specifies how long to keep a PIT.
+
+### Path and HTTP methods
+
+```json
+POST /<target_indexes>/_search/point_in_time?keep_alive=1h&routing=&expand_wildcards=&preference=
+```
+
+### Path parameters
+
+Parameter | Data type | Description
+:--- | :--- | :---
+target_indexes | String | The name(s) of the target index(es) for the PIT. May contain a comma-separated list or a wildcard index pattern.
+
+### Query parameters
+
+Parameter | Data type | Description
+:--- | :--- | :---
+keep_alive | Time | The amount of time to keep the PIT. Every time you access a PIT by using the Search API, the PIT lifetime is extended by the amount of time equal to the `keep_alive` parameter. Required.
+preference | String | The node or the shard used to perform the search. Optional. Default is random.
+routing | String | Routes search requests to a specific shard. Optional. Default is the document's `_id`.
+expand_wildcards | String | The type of index that can match the wildcard pattern. Supports comma-separated values. Valid values are the following:<br>- `all`: Match any index or data stream, including hidden ones.<br>- `open`: Match open, non-hidden indexes or non-hidden data streams.<br>- `closed`: Match closed, non-hidden indexes or non-hidden data streams.<br>- `hidden`: Match hidden indexes or data streams. Must be combined with `open`, `closed`, or both.<br>- `none`: No wildcard patterns are accepted.<br>Optional. Default is `open`.
+allow_partial_pit_creation | Boolean | Specifies whether to create a PIT with partial failures. Optional. Default is `true`.
+
+#### Example request
+
+```json
+POST /my-index-1/_search/point_in_time?keep_alive=100m
+```
+
+#### Example response
+
+```json
+{
+ "pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAIWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
+ "_shards": {
+ "total": 1,
+ "successful": 1,
+ "skipped": 0,
+ "failed": 0
+ },
+ "creation_time": 1658146050064
+}
+```
+
+### Response fields
+
+Field | Data type | Description
+:--- | :--- | :---
+pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID.
+creation_time | long | The time the PIT was created, in milliseconds since the epoch.
+
+## Extend a PIT time
+
+You can extend a PIT time by providing a `keep_alive` parameter in the `pit` object when you perform a search:
+
+```json
+GET /_search
+{
+ "size": 10000,
+ "query": {
+ "match" : {
+ "user.id" : "elkbee"
+ }
+ },
+ "pit": {
+ "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
+ "keep_alive": "100m"
+ },
+ "sort": [
+ {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
+ {"_shard_doc": "desc"}
+ ],
+ "search_after": [
+ "2021-05-20T05:30:04.832Z"
+ ]
+}
+```
+
+The `keep_alive` parameter in a search request is optional. It specifies the amount by which to extend the time to keep a PIT.
+{: .note}
+
+## List all PITs
+Introduced 2.4
+{: .label .label-purple }
+
+Returns all PITs in the OpenSearch cluster.
+
+### Cross-cluster behavior
+
+The List All PITs API returns only local PITs or mixed PITs (PITs created in both local and remote clusters). It does not return fully remote PITs.
+
+#### Example request
+
+```json
+GET /_search/point_in_time/_all
+```
+
+#### Example response
+
+```json
+{
+ "pits": [
+ {
+ "pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAEWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
+ "creation_time": 1658146048666,
+ "keep_alive": 6000000
+ },
+ {
+ "pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFnNOWU43ckt3U3IyaFVpbGE1UWEtMncAFjFyeXBsRGJmVFM2RTB6eVg1aVVqQncAAAAAAAAAAAIWcDVrM3ZIX0pRNS1XejE5YXRPRFhzUQEWc05ZTjdyS3dTcjJoVWlsYTVRYS0ydwAA",
+ "creation_time": 1658146050064,
+ "keep_alive": 6000000
+ }
+ ]
+}
+```
+
+### Response fields
+
+Field | Data type | Description
+:--- | :--- | :---
+pits | Array of JSON objects | The list of all PITs.
+
+Each PIT object contains the following fields.
+
+Field | Data type | Description
+:--- | :--- | :---
+pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID.
+creation_time | long | The time the PIT was created, in milliseconds since the epoch.
+keep_alive | long | The amount of time to keep the PIT, in milliseconds.
+
+## Delete PITs
+Introduced 2.4
+{: .label .label-purple }
+
+Deletes one, several, or all PITs. PITs are automatically deleted when the `keep_alive` time period elapses. However, to deallocate resources, you can delete a PIT using the Delete PIT API. The Delete PIT API supports deleting a list of PITs by ID or deleting all PITs at once.
+
+### Cross-cluster behavior
+
+The Delete PITs by ID API fully supports deleting cross-cluster PITs.
+
+The Delete All PITs API deletes only local PITs or mixed PITs (PITs created in both local and remote clusters). It does not delete fully remote PITs.
+
+#### Example request: Delete all PITs
+
+```json
+DELETE /_search/point_in_time/_all
+```
+
+If you want to delete one or several PITs, specify their PIT IDs in the request body.
+
+### Request fields
+
+Field | Data type | Description
+:--- | :--- | :---
+pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) or an array of binaries | The PIT IDs of the PITs to be deleted. Required.
+
+#### Example request: Delete PITs by ID
+
+```json
+DELETE /_search/point_in_time
+
+{
+ "pit_id": [
+ "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA",
+ "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
+ ]
+}
+```
+
+#### Example response
+
+For each PIT, the response contains a JSON object with a PIT ID and a `successful` field that specifies whether the deletion was successful. Partial failures are treated as failures.
+
+```json
+{
+ "pits": [
+ {
+ "successful": true,
+ "pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
+ },
+ {
+ "successful": false,
+ "pit_id": "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
+ }
+ ]
+}
+```
+
+### Response fields
+
+Field | Data type | Description
+:--- | :--- | :---
+successful | Boolean | Whether the delete operation was successful.
+pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) | The PIT ID of the PIT to be deleted.
+
+## PIT segments
+Introduced 2.4
+{: .label .label-purple }
+
+Similarly to the [CAT Segments API]({{site.url}}{{site.baseurl}}/api-reference/cat/cat-segments), the PIT Segments API provides low-level information about the disk utilization of a PIT by describing its Lucene segments. The PIT Segments API supports listing segment information of a specific PIT by ID or of all PITs at once.
+
+#### Example request: PIT segments of all PITs
+
+```json
+GET /_cat/pit_segments/_all
+```
+
+If you want to list segments for one or several PITs, specify their PIT IDs in the request body.
+
+### Request fields
+
+Field | Data type | Description
+:--- | :--- | :---
+pit_id | [Base64 encoded binary]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/binary) or an array of binaries | The PIT IDs of the PITs whose segments are to be listed. Required.
+
+#### Example request: PIT segments of PITs by ID
+
+```json
+GET /_cat/pit_segments
+
+{
+ "pit_id": [
+ "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAEWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA",
+ "o463QQEPbXktaW5kZXgtMDAwMDAxFkhGN09fMVlPUkVPLXh6MUExZ1hpaEEAFjBGbmVEZHdGU1EtaFhhUFc4ZkR5cWcAAAAAAAAAAAIWaXBPNVJtZEhTZDZXTWFFR05waXdWZwEWSEY3T18xWU9SRU8teHoxQTFnWGloQQAA"
+ ]
+}
+```
+
+#### Example response
+
+```
+index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
+index1 0 r 10.212.36.190 _0 0 4 0 3.8kb 1364 false true 8.8.2 true
+index1 1 p 10.212.36.190 _0 0 3 0 3.7kb 1364 false true 8.8.2 true
+index1 2 r 10.212.74.139 _0 0 2 0 3.6kb 1364 false true 8.8.2 true
+```
+
+## PIT settings
+
+You can specify the following settings for a PIT.
+
+Setting | Description | Default
+:--- | :--- | :---
+point_in_time.max_keep_alive | A cluster-level setting that specifies the maximum value for the `keep_alive` parameter. | 24h
+search.max_open_pit_context | A node-level setting that specifies the maximum number of open PIT contexts for the node. | 300
\ No newline at end of file
diff --git a/_search-plugins/point-in-time.md b/_search-plugins/point-in-time.md
new file mode 100644
index 0000000000..4453dde3dd
--- /dev/null
+++ b/_search-plugins/point-in-time.md
@@ -0,0 +1,159 @@
+---
+layout: default
+title: Point in Time
+nav_order: 58
+has_children: true
+has_toc: false
+redirect_from:
+ - /opensearch/point-in-time/
+---
+
+# Point in Time
+
+Point in Time (PIT) lets you run different queries against a dataset that is fixed in time.
+
+Normally, if you run a query on an index multiple times, the same query may return different results because documents are continually indexed, updated, and deleted. If you need to run a query against the same data, you can preserve that data's state by creating a PIT. The main use of the PIT feature is to couple it with the `search_after` functionality for deep pagination of search results.
+
+## Paginating search results
+
+Besides the PIT functionality, there are three ways to [paginate search results]({{site.url}}{{site.baseurl}}/opensearch/search/paginate) in OpenSearch: using the Scroll API, specifying `from` and `size` parameters for your search, and using the `search_after` functionality. However, all three have limitations:
+
+- The Scroll API's search results are frozen at the moment of the request, but they are bound to a particular query. Additionally, scroll can only move forward in the search, so if a request for a page fails, the subsequent request skips that page and returns the following one.
+- If you specify the `from` and `size` parameters for your search, the search results are not frozen in time, so they may be inconsistent because of documents being indexed or deleted. The `from` and `size` feature is not recommended for deep pagination because every page request requires processing of all results and filtering them for the requested page.
+- The `search_after` search results are not frozen in time, so they may be inconsistent because of concurrent document indexing or deletion.
+
+The PIT functionality does not have the limitations of other pagination methods, because PIT search is not bound to a query, and it supports consistent pagination going forward and backward. If you have looked at page one of your results and are now on page two, you will see the same page one if you go back.
+
+## PIT search
+
+PIT search has the same capabilities as regular search, except PIT search acts on an older dataset, while a regular search acts on a live dataset. PIT search is not bound to a query, so you can run different queries on the same dataset, which is frozen in time.
+
+You can use the [Create PIT API]({{site.url}}{{site.baseurl}}/opensearch/point-in-time-api#create-a-pit) to create a PIT. When you create a PIT for a set of indexes, OpenSearch locks a set of segments for those indexes, freezing them in time. On a lower level, none of the resources required for this PIT are modified or deleted. If the segments that are part of a PIT are merged, OpenSearch retains a copy of those segments for the period of time specified at PIT creation by the `keep_alive` parameter.
+
+The create PIT operation returns a PIT ID, which you can use to run multiple queries on the frozen dataset. Even though the indexes continue to ingest data and modify or delete documents, the PIT references the data that has not changed since the PIT creation. When your query contains a PIT ID, you don't need to specify indexes in the search request because the PIT already references them. A search with a PIT ID produces exactly the same results when you run it multiple times.
+
+In case of a cluster or node failure, all PIT data is lost.
+{: .note}
+
+## Pagination with PIT and search_after
+
+When you run a query with a PIT ID, you can use the `search_after` parameter to retrieve the next page of results. This gives you control over the order of documents in the pages of results.
+
+Run a search query with a PIT ID:
+
+```json
+GET /_search
+{
+ "size": 10000,
+ "query": {
+ "match" : {
+ "user.id" : "elkbee"
+ }
+ },
+ "pit": {
+ "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
+ "keep_alive": "100m"
+ },
+ "sort": [
+ {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
+ {"_shard_doc": "desc"}
+ ]
+}
+```
+
+The response contains the first 10,000 documents that match the query. To get the next set of documents, run the same query with the last document's sort values as the `search_after` parameter, keeping the same `sort` and `pit.id`. You can use the optional `keep_alive` parameter to extend the PIT time:
+
+```json
+GET /_search
+{
+ "size": 10000,
+ "query": {
+ "match" : {
+ "user.id" : "elkbee"
+ }
+ },
+ "pit": {
+ "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
+ "keep_alive": "100m"
+ },
+ "sort": [
+ {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos"}},
+ {"_shard_doc": "desc"}
+ ],
+ "search_after": [
+ "2021-05-20T05:30:04.832Z"
+ ]
+}
+```
+
+## Search slicing
+
+Using `search_after` with PIT for pagination gives you control over ordering of the results. If you don't need results in any specific order, or if you want the ability to jump from a page to a non-consecutive page, you can use search slicing. Search slicing splits a PIT search into multiple slices that can be consumed independently by a client application.
+
+For example, if you have a PIT search query that has 1,000,000 results and you want to return 50,000 results at a time, your client application has to make 20 consecutive calls to receive each batch of results. If you use search slicing, you can parallelize these 20 calls. In your multithreaded client application, you can use five slices for each PIT. As a result, you will have five slices of 10,000 results each, which can be consumed by five different threads in your client instead of a single thread consuming all 50,000 results.
+
+To use search slicing, you have to specify two parameters:
+- `slice.id` is the slice ID you are requesting.
+- `slice.max` is the number of slices to break the search response into.
+
+The following PIT search query illustrates search slicing:
+
+```json
+GET /_search
+{
+  "slice": {
+    "id": 0,
+    "max": 2
+  },
+  "query": {
+    "match": {
+      "message": "foo"
+    }
+  },
+  "pit": {
+    "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=="
+  }
+}
+```
+
+Each request can query only one slice, so the next request will be identical to the previous one, except that its `slice.id` will be `1`.
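+
+For example, the `slice` object in the second request might look like the following (a sketch continuing the query above):
+
+```json
+"slice": {
+  "id": 1,
+  "max": 2
+}
+```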
+
+## Security model
+
+This section describes the permissions needed to use PIT API operations if you are running OpenSearch with the security plugin enabled.
+
+Users can access all PIT API operations using the `point_in_time_full_access` role. If this role doesn't meet your needs, mix and match individual PIT permissions to suit your use case. Each action corresponds to an operation in the REST API. For example, the `indices:data/read/point_in_time/create` permission lets you create a PIT. The following are the possible permissions:
+
+- `indices:data/read/point_in_time/create` – Create API
+- `indices:data/read/point_in_time/delete` – Delete API
+- `indices:data/read/point_in_time/readall` – List All PITs API
+- `indices:data/read/search` – Search API
+- `indices:monitor/point_in_time/segments` – PIT Segments API
+
+For `all` API operations, such as list all and delete all, the user needs the all indexes (`*`) permission. For API operations on specific PITs, such as search, create PIT, or delete by PIT ID, the user needs only individual index permissions.
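+
+As an illustration, a custom role that grants PIT creation and search on a single index might look like the following sketch (the role name and index pattern are hypothetical):
+
+```json
+PUT /_plugins/_security/api/roles/pit_access
+{
+  "index_permissions": [
+    {
+      "index_patterns": ["my-index-1"],
+      "allowed_actions": [
+        "indices:data/read/point_in_time/create",
+        "indices:data/read/search"
+      ]
+    }
+  ]
+}
+```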
+
+The PIT IDs always contain the underlying (resolved) indexes when saved. The following sections describe the required permissions for aliases and data streams.
+
+### Alias permissions
+
+For aliases, users must have either index **or** alias permissions for any PIT operation.
+
+### Data stream permissions
+
+For data streams, users must have both the data stream **and** the data stream's backing index permissions for any PIT operation. For example, the user must have permissions for the `data-stream-11` data stream and for its backing index `.ds-my-data-stream11-000001`.
+
+If users have the data stream permissions only, they will be able to create a PIT, but they will not be able to use the PIT ID for other operations, such as search, without the backing index permissions.
+
+## API
+
+The following table lists all [Point in Time API]({{site.url}}{{site.baseurl}}/opensearch/point-in-time-api) functions.
+
+Function | API | Description
+:--- | :--- | :---
+[Create PIT]({{site.url}}{{site.baseurl}}/opensearch/point-in-time-api#create-a-pit) | `POST /<target_indexes>/_search/point_in_time?keep_alive=1h` | Creates a PIT.
+[List PIT]({{site.url}}{{site.baseurl}}/opensearch/point-in-time-api#list-all-pits) | `GET /_search/point_in_time/_all` | Lists all PITs.
+[Delete PIT]({{site.url}}{{site.baseurl}}/opensearch/point-in-time-api#delete-pits) | `DELETE /_search/point_in_time`<br>`DELETE /_search/point_in_time/_all` | Deletes a PIT or all PITs.
+[PIT segments]({{site.url}}{{site.baseurl}}/opensearch/point-in-time-api#pit-segments) | `GET /_cat/pit_segments/_all` | Provides information about the disk utilization of a PIT by describing its Lucene segments.
+
+For information about the relevant cluster and node settings, see [PIT Settings]({{site.url}}{{site.baseurl}}/opensearch/point-in-time-api#pit-settings).
diff --git a/_search-plugins/querqy/index.md b/_search-plugins/querqy/index.md
index 7f97fada8f..3abd12dcbd 100644
--- a/_search-plugins/querqy/index.md
+++ b/_search-plugins/querqy/index.md
@@ -4,7 +4,7 @@ title: Querqy
has_children: false
redirect_from:
- /search-plugins/querqy/
-nav_order: 10
+nav_order: 210
---
# Querqy
@@ -13,51 +13,34 @@ Querqy is a community plugin for query rewriting that helps to solve relevance i
## Querqy plugin installation
-Querqy is currently only compatible with OpenSearch 1.3.1
-{: .note }
+The Querqy plugin is now available for OpenSearch 2.3.0. Run the following command to install the Querqy plugin.
-1. The Querqy plugin code is located here: [querqy-opensearch](https://github.com/querqy/querqy-opensearch). To download the plugin code ZIP file, select the green "Code" button, then select "Download ZIP"
+````bash
+./bin/opensearch-plugin install \
+ "https://repo1.maven.org/maven2/org/querqy/opensearch-querqy/1.0.os2.3.0/opensearch-querqy-1.0.os2.3.0.zip"
+````
-1. Install JDK 11. On Amazon Linux 2, install JDK11 with the following command:
+Answer `yes` to the security prompts during the installation, as Querqy requires additional permissions to load query rewriters.
- ```bash
- sudo yum install java-11-amazon-corretto
- ```
+After installing the Querqy plugin, you can find comprehensive documentation on the Querqy.org site: [Querqy](https://docs.querqy.org/querqy/index.html).
-1. Uncompress the ZIP file:
+## Path and HTTP methods
- ```bash
- unzip querqy-opensearch-main.zip
- ```
+```
+POST /myindex/_search
+```
-1. Change to the uncompressed Querqy directory:
+## Example query
- ```bash
- cd querqy-opensearch-main
- ```
-
-1. Compile the plugin:
-
- ```bash
- ./gradlew build
- ```
-
-1. The compiled plugin is stored in this directory:
-
- ```bash
- /path/to/file/querqy-opensearch-main/build/distributions/opensearch-querqy-1.3.1.0.zip`
- ```
-
-1. The compiled Querqy plugin is installed the same as [any OpenSearch plugin](https://opensearch.org/docs/latest/opensearch/install/plugins/#install-a-plugin):
-
- ```bash
- /path/to/opensearch/bin/opensearch-plugin install file:///path/to/file/opensearch-querqy-1.3.1.0.zip
- ```
-
-1. Reboot the OpenSearch node:
-
- ```bash
- sudo reboot
- ```
-
-After installing the Querqy plugin you can find comprehensive documentation on the Querqy.org site: [Querqy](https://docs.querqy.org/querqy/index.html)
\ No newline at end of file
+````json
+{
+ "query": {
+ "querqy": {
+ "matching_query": {
+ "query": "books"
+ },
+ "query_fields": [ "title^3.0", "words^2.1", "shortSummary"]
+ }
+ }
+}
+````
\ No newline at end of file
diff --git a/_opensearch/search-template.md b/_search-plugins/search-template.md
similarity index 97%
rename from _opensearch/search-template.md
rename to _search-plugins/search-template.md
index 476e804932..3b9bc7cc7b 100644
--- a/_opensearch/search-template.md
+++ b/_search-plugins/search-template.md
@@ -2,6 +2,8 @@
layout: default
title: Search templates
nav_order: 50
+redirect_from:
+ - /opensearch/search-template/
---
# Search templates
@@ -205,6 +207,15 @@ POST _render/template
}
```
+The following render operations are supported:
+
+```json
+GET /_render/template
+POST /_render/template
+GET /_render/template/<id>
+POST /_render/template/<id>
+```
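+
+For example, rendering a stored template by ID might look like the following sketch (the template ID and parameters are hypothetical):
+
+```json
+POST /_render/template/my_template
+{
+  "params": {
+    "from": 0,
+    "size": 10
+  }
+}
+```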
+
## Advanced parameter conversion with search templates
You have a lot of different syntax options in Mustache to transpose the input parameters into a query.
diff --git a/_opensearch/search/autocomplete.md b/_search-plugins/searching-data/autocomplete.md
similarity index 99%
rename from _opensearch/search/autocomplete.md
rename to _search-plugins/searching-data/autocomplete.md
index 36276ba477..ce867ed415 100644
--- a/_opensearch/search/autocomplete.md
+++ b/_search-plugins/searching-data/autocomplete.md
@@ -3,6 +3,8 @@ layout: default
title: Autocomplete
parent: Searching data
nav_order: 24
+redirect_from:
+ - /opensearch/search/autocomplete/
---
# Autocomplete functionality
diff --git a/_opensearch/search/did-you-mean.md b/_search-plugins/searching-data/did-you-mean.md
similarity index 100%
rename from _opensearch/search/did-you-mean.md
rename to _search-plugins/searching-data/did-you-mean.md
diff --git a/_opensearch/search/highlight.md b/_search-plugins/searching-data/highlight.md
similarity index 99%
rename from _opensearch/search/highlight.md
rename to _search-plugins/searching-data/highlight.md
index 52db512cbb..7b312e563e 100644
--- a/_opensearch/search/highlight.md
+++ b/_search-plugins/searching-data/highlight.md
@@ -3,6 +3,8 @@ layout: default
title: Highlight query matches
parent: Searching data
nav_order: 23
+redirect_from:
+ - /opensearch/search/highlight/
---
# Highlight query matches
diff --git a/_opensearch/search/index.md b/_search-plugins/searching-data/index.md
similarity index 98%
rename from _opensearch/search/index.md
rename to _search-plugins/searching-data/index.md
index 35c6671cd6..7e1c5a7eea 100644
--- a/_opensearch/search/index.md
+++ b/_search-plugins/searching-data/index.md
@@ -1,7 +1,7 @@
---
layout: default
title: Searching data
-nav_order: 20
+nav_order: 5
has_children: true
has_toc: false
redirect_from: /opensearch/ux/
diff --git a/_opensearch/search/paginate.md b/_search-plugins/searching-data/paginate.md
similarity index 99%
rename from _opensearch/search/paginate.md
rename to _search-plugins/searching-data/paginate.md
index 660a99f2a5..a43cfac782 100644
--- a/_opensearch/search/paginate.md
+++ b/_search-plugins/searching-data/paginate.md
@@ -3,6 +3,8 @@ layout: default
title: Paginate results
parent: Searching data
nav_order: 21
+redirect_from:
+ - /opensearch/search/paginate/
---
## Paginate results
diff --git a/_opensearch/search/sort.md b/_search-plugins/searching-data/sort.md
similarity index 99%
rename from _opensearch/search/sort.md
rename to _search-plugins/searching-data/sort.md
index dac96d175a..fa4875d32f 100644
--- a/_opensearch/search/sort.md
+++ b/_search-plugins/searching-data/sort.md
@@ -3,6 +3,8 @@ layout: default
title: Sort results
parent: Searching data
nav_order: 22
+redirect_from:
+ - /opensearch/search/sort/
---
## Sort results
diff --git a/_search-plugins/sql/full-text.md b/_search-plugins/sql/full-text.md
index 9c60692801..ce72cc149c 100644
--- a/_search-plugins/sql/full-text.md
+++ b/_search-plugins/sql/full-text.md
@@ -148,7 +148,7 @@ You can specify the following options for `MULTI_MATCH` in any order:
- `zero_terms_query`
- `boost`
-Please, refer to `multi_match` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#multi-match) for parameter description and supported values.
+Please, refer to `multi_match` query [documentation](#multi-match) for parameter description and supported values.
### For example, REST API search for `Dale` in either the `firstname` or `lastname` fields:
diff --git a/_clients/cli.md b/_tools/cli.md
similarity index 98%
rename from _clients/cli.md
rename to _tools/cli.md
index 01a16593b5..04371d67cd 100644
--- a/_clients/cli.md
+++ b/_tools/cli.md
@@ -1,8 +1,10 @@
---
layout: default
title: OpenSearch CLI
-nav_order: 52
+nav_order: 70
has_children: false
+redirect_from:
+ - /clients/cli/
---
# OpenSearch CLI
diff --git a/_clients/grafana.md b/_tools/grafana.md
similarity index 95%
rename from _clients/grafana.md
rename to _tools/grafana.md
index 97e35de40e..16a899d82e 100644
--- a/_clients/grafana.md
+++ b/_tools/grafana.md
@@ -1,7 +1,7 @@
---
layout: default
title: Grafana
-nav_order: 150
+nav_order: 200
has_children: false
---
diff --git a/_clients/agents-and-ingestion-tools/index.md b/_tools/index.md
similarity index 55%
rename from _clients/agents-and-ingestion-tools/index.md
rename to _tools/index.md
index 4eab146acf..b669d45d0b 100644
--- a/_clients/agents-and-ingestion-tools/index.md
+++ b/_tools/index.md
@@ -1,25 +1,25 @@
---
layout: default
-title: Agents and ingestion tools
-nav_order: 140
+title: Tools
+nav_order: 50
has_children: false
-has_toc: false
redirect_from:
- /clients/agents-and-ingestion-tools/
---
-# Agents and ingestion tools
+# OpenSearch tools
-Historically, many multiple popular agents and ingestion tools have worked with Elasticsearch OSS, such as Beats, Logstash, Fluentd, FluentBit, and OpenTelemetry. OpenSearch aims to continue to support a broad set of agents and ingestion tools, but not all have been tested or have explicitly added OpenSearch compatibility.
+This section provides documentation for OpenSearch-supported tools, including:
-Previously, an intermediate compatibility solution was available. OpenSearch had a setting that instructed the cluster to return version 7.10.2 rather than its actual version.
+- [Agents and ingestion tools](#agents-and-ingestion-tools)
+- [OpenSearch CLI](#opensearch-cli)
+- [OpenSearch Kubernetes operator](#opensearch-kubernetes-operator)
-The override main response setting `compatibility.override_main_response_version` is deprecated from OpenSearch version 1.x and removed from OpenSearch 2.0.0. This setting is no longer supported for compatibility with legacy clients.
-{: .note}
+## Agents and ingestion tools
-
+
Logstash OSS 8.0 introduces a breaking change where all plugins run in ECS compatibility mode by default. If you use a compatible [OSS client](#compatibility-matrices) you must override the default value to maintain legacy behavior:
```yml
ecs_compatibility => disabled
```
-## Downloads
+### Downloads
You can download the OpenSearch output plugin for Logstash from [OpenSearch downloads](https://opensearch.org/downloads.html). The Logstash output plugin is compatible with OpenSearch and Elasticsearch OSS (7.10.2 or lower).
@@ -70,30 +70,41 @@ Some users report compatibility issues with ingest pipelines on these versions o
### Compatibility Matrix for Logstash
-| | Logstash OSS 7.x to 7.11.x | Logstash OSS 7.12.x\* | Logstash 7.13.x-7.16.x without OpenSearch output plugin | Logstash 7.13.x-7.16.x with OpenSearch output plugin | Logstash 8.x+ with OpenSearch output plugin
+| | Logstash OSS 7.0.0 to 7.11.x | Logstash OSS 7.12.x\* | Logstash 7.13.x-7.16.x without OpenSearch output plugin | Logstash 7.13.x-7.16.x with OpenSearch output plugin | Logstash 8.x+ with OpenSearch output plugin
| :---| :--- | :--- | :--- | :--- | :--- |
-| Elasticsearch OSS 7.x to 7.9.x | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
+| Elasticsearch OSS 7.0.0 to 7.9.x | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
| Elasticsearch OSS 7.10.2 | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
-| ODFE 1.x to 1.12 | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
+| ODFE 1.0 to 1.12 | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
| ODFE 1.13 | *Yes* | *Yes* | *No* | *Yes* | *Yes* |
-| OpenSearch 1.x | Yes via version setting | Yes via version setting | *No* | *Yes* | Yes, with Elastic Common Schema Setting |
+| OpenSearch 1.x to 2.x | Yes via version setting | Yes via version setting | *No* | *Yes* | Yes, with Elastic Common Schema Setting |
\* Most current compatible version with Elasticsearch OSS.
### Compatibility Matrix for Beats
-| | Beats OSS 7.x to 7.11.x\*\* | Beats OSS 7.12.x\* | Beats 7.13.x |
+| | Beats OSS 7.0.0 to 7.11.x\*\* | Beats OSS 7.12.x\* | Beats 7.13.x |
| :--- | :--- | :--- | :--- |
-| Elasticsearch OSS 7.x to 7.9.x | *Yes* | *Yes* | No |
+| Elasticsearch OSS 7.0.0 to 7.9.x | *Yes* | *Yes* | No |
| Elasticsearch OSS 7.10.2 | *Yes* | *Yes* | No |
-| ODFE 1.x to 1.12 | *Yes* | *Yes* | No |
+| ODFE 1.0 to 1.12 | *Yes* | *Yes* | No |
| ODFE 1.13 | *Yes* | *Yes* | No |
-| OpenSearch 1.x | Yes via version setting | Yes via version setting | No |
-| Logstash OSS 7.x to 7.11.x | *Yes* | *Yes* | *Yes* |
+| OpenSearch 1.x to 2.x | Yes via version setting | Yes via version setting | No |
+| Logstash OSS 7.0.0 to 7.11.x | *Yes* | *Yes* | *Yes* |
| Logstash OSS 7.12.x\* | *Yes* | *Yes* | *Yes* |
| Logstash 7.13.x with OpenSearch output plugin | *Yes* | *Yes* | *Yes* |
\* Most current compatible version with Elasticsearch OSS.
\*\* Beats OSS includes all Apache 2.0 Beats agents (i.e. Filebeat, Metricbeat, Auditbeat, Heartbeat, Winlogbeat, Packetbeat).
+
+Beats versions newer than 7.12.x are not supported by OpenSearch. If you must update the Beats agents in your environment to a newer version, you can work around the incompatibility by directing traffic from Beats to Logstash and using the Logstash OpenSearch output plugin to ingest the data into OpenSearch.
+{: .warning }
+
+## OpenSearch CLI
+
+The OpenSearch command line interface (opensearch-cli) lets you manage your OpenSearch cluster from the command line and automate tasks. For more information about opensearch-cli, see [OpenSearch CLI]({{site.url}}{{site.baseurl}}/tools/cli/).
+
+## OpenSearch Kubernetes operator
+
+The OpenSearch Kubernetes (K8s) Operator is an open-source Kubernetes operator that helps automate the deployment and provisioning of OpenSearch and OpenSearch Dashboards in a containerized environment. For information about how to use the K8s operator, see [OpenSearch Kubernetes operator]({{site.url}}{{site.baseurl}}/tools/k8s-operator/).
\ No newline at end of file
diff --git a/_tools/k8s-operator.md b/_tools/k8s-operator.md
new file mode 100644
index 0000000000..3f9f8512f7
--- /dev/null
+++ b/_tools/k8s-operator.md
@@ -0,0 +1,147 @@
+---
+layout: default
+title: OpenSearch Kubernetes Operator
+nav_order: 80
+has_children: false
+---
+
+The OpenSearch Kubernetes Operator is an open-source Kubernetes operator that helps automate the deployment and provisioning of OpenSearch and OpenSearch Dashboards in a containerized environment. The operator can manage multiple OpenSearch clusters that can be scaled up and down depending on your needs.
+
+
+## Installation
+
+There are two ways to get started with the operator:
+
+- [Use a Helm chart](#use-a-helm-chart).
+- [Use a local installation](#use-a-local-installation).
+
+### Use a Helm chart
+
+If you use Helm to manage your Kubernetes cluster, you can use the OpenSearch Kubernetes Operator's Cloud Native Computing Foundation (CNCF) project stored in Artifact Hub, a web-based application for finding, installing, and publishing CNCF packages.
+
+To begin, log in to your Kubernetes cluster and add the Helm repository (repo) from [Artifact Hub](https://opster.github.io/opensearch-Kubernetes-operator/).
+
+```
+helm repo add opensearch-operator https://opster.github.io/opensearch-k8s-operator/
+```
+
+Make sure that the repo is included in your Kubernetes cluster.
+
+```
+helm repo list | grep opensearch
+```
+
+Both the `opensearch` and `opensearch-operator` repos appear in the list of repos.
+
+
+Install the manager that operates all of the OpenSearch Kubernetes Operator's actions.
+
+```
+helm install opensearch-operator opensearch-operator/opensearch-operator
+```
+
+After the installation completes, the operator returns information on the deployment with `STATUS: deployed`. Then you can configure and start your [OpenSearch cluster](#deploy-a-new-opensearch-cluster).
+
+### Use a local installation
+
+If you want to create a new Kubernetes cluster on your existing machine, use a local installation.
+
+If this is your first time running Kubernetes and you intend to run through these instructions on your laptop, make sure that you have the following installed:
+
+- [Kubernetes](https://kubernetes.io/docs/tasks/tools/)
+- [Docker](https://docs.docker.com/engine/install/)
+- [minikube](https://minikube.sigs.k8s.io/docs/start/)
+
+Before running through the installation steps, make sure that you have a Kubernetes environment running locally. When using minikube, open a new terminal window and enter `minikube start`. Kubernetes will now use a containerized minikube cluster with a namespace called `default`.
+
+Then install the OpenSearch Kubernetes Operator using the following steps:
+
+1. In your preferred directory, clone the [OpenSearch Kubernetes Operator repo](https://github.com/Opster/opensearch-k8s-operator). Navigate into the repo's directory using `cd`.
+2. Go to the `opensearch-operator` folder.
+3. Enter `make build manifests`.
+4. Start a Kubernetes cluster, if one is not already running (with minikube, enter `minikube start` in a new terminal window). Make sure that `~/.kube/config` points to the cluster:
+
+```yml
+apiVersion: v1
+clusters:
+- cluster:
+ certificate-authority: /Users/naarcha/.minikube/ca.crt
+ extensions:
+ - extension:
+ last-update: Mon, 29 Aug 2022 10:11:47 CDT
+ provider: minikube.sigs.k8s.io
+ version: v1.26.1
+ name: cluster_info
+ server: https://127.0.0.1:61661
+ name: minikube
+contexts:
+- context:
+ cluster: minikube
+ extensions:
+ - extension:
+ last-update: Mon, 29 Aug 2022 10:11:47 CDT
+ provider: minikube.sigs.k8s.io
+ version: v1.26.1
+ name: context_info
+ namespace: default
+ user: minikube
+ name: minikube
+current-context: minikube
+kind: Config
+preferences: {}
+users:
+- name: minikube
+ user:
+ client-certificate: /Users/naarcha/.minikube/profiles/minikube/client.crt
+ client-key: /Users/naarcha/.minikube/profiles/minikube/client.key
+```
+
+5. Enter `make install` to create the CustomResourceDefinition that runs in your Kubernetes cluster.
+6. Start the OpenSearch Kubernetes Operator. Enter `make run`.
+
+## Verify Kubernetes deployment
+
+To ensure that Kubernetes recognizes the OpenSearch Kubernetes Operator as a namespace, enter `kubectl get ns | grep opensearch`. Both `opensearch` and `opensearch-operator-system` should appear as `Active`.
+
+With the operator active, use `kubectl get pod -n opensearch-operator-system` to make sure that the operator's pods are running.
+
+```
+NAME READY STATUS RESTARTS AGE
+opensearch-operator-controller-manager- 2/2 Running 0 25m
+```
+
+With the Kubernetes cluster running, you can now run OpenSearch inside the cluster.
+
+## Deploy a new OpenSearch cluster
+
+From your cloned OpenSearch Kubernetes Operator repo, navigate to the `opensearch-operator/examples` directory. There you'll find the `opensearch-cluster.yaml` file, which can be customized to the needs of your cluster, including the `clusterName` that acts as the namespace in which your new OpenSearch cluster will reside.
+
+With your cluster configured, run the `kubectl apply` command.
+
+```
+kubectl apply -f opensearch-cluster.yaml
+```
+
+The operator creates several pods, including a bootstrap pod, three OpenSearch cluster pods, and one Dashboards pod. To connect to your cluster, use the `port-forward` command.
+
+```
+kubectl port-forward svc/my-cluster-dashboards 5601
+```
+
+Open http://localhost:5601 in your preferred browser and log in with the default demo credentials `admin / admin`. You can also run curl commands against the OpenSearch REST API by forwarding to port 9200.
+
+```
+kubectl port-forward svc/my-cluster 9200
+```
+
+To delete the OpenSearch cluster, delete the cluster resources. The following command deletes the cluster namespace and all of its resources.
+
+```
+kubectl delete -f opensearch-cluster.yaml
+```
+
+## Next steps
+
+To learn more about how to customize your Kubernetes OpenSearch cluster, including data persistence, authentication methods, and scaling, see the [OpenSearch Kubernetes Operator User Guide](https://github.com/Opster/opensearch-k8s-operator/blob/main/docs/userguide/main.md).
+
+If you want to contribute to the development of the OpenSearch Kubernetes Operator, see the repo [design documents](https://github.com/Opster/opensearch-k8s-operator/blob/main/docs/designs/high-level.md).
\ No newline at end of file
diff --git a/_clients/logstash/advanced-config.md b/_tools/logstash/advanced-config.md
similarity index 100%
rename from _clients/logstash/advanced-config.md
rename to _tools/logstash/advanced-config.md
diff --git a/_clients/logstash/common-filters.md b/_tools/logstash/common-filters.md
similarity index 100%
rename from _clients/logstash/common-filters.md
rename to _tools/logstash/common-filters.md
diff --git a/_clients/logstash/execution-model.md b/_tools/logstash/execution-model.md
similarity index 100%
rename from _clients/logstash/execution-model.md
rename to _tools/logstash/execution-model.md
diff --git a/_clients/logstash/index.md b/_tools/logstash/index.md
similarity index 99%
rename from _clients/logstash/index.md
rename to _tools/logstash/index.md
index 2947ff7340..deb447045b 100644
--- a/_clients/logstash/index.md
+++ b/_tools/logstash/index.md
@@ -1,7 +1,7 @@
---
layout: default
title: Logstash
-nav_order: 200
+nav_order: 150
has_children: true
has_toc: true
redirect_from:
diff --git a/_clients/logstash/read-from-opensearch.md b/_tools/logstash/read-from-opensearch.md
similarity index 100%
rename from _clients/logstash/read-from-opensearch.md
rename to _tools/logstash/read-from-opensearch.md
diff --git a/_clients/logstash/ship-to-opensearch.md b/_tools/logstash/ship-to-opensearch.md
similarity index 86%
rename from _clients/logstash/ship-to-opensearch.md
rename to _tools/logstash/ship-to-opensearch.md
index 050c8a4336..2728ee98dd 100644
--- a/_clients/logstash/ship-to-opensearch.md
+++ b/_tools/logstash/ship-to-opensearch.md
@@ -9,7 +9,7 @@ nav_order: 220
You can Ship Logstash events to an OpenSearch cluster and then visualize your events with OpenSearch Dashboards.
-Make sure you have [Logstash]({{site.url}}{{site.baseurl}}/tools/logstash/index#install-logstash), [OpenSearch]({{site.url}}{{site.baseurl}}/opensearch/install/index/), and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/install/index/).
+Make sure you have [Logstash]({{site.url}}{{site.baseurl}}/tools/logstash/index#install-logstash), [OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/), and [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/).
{: .note }
## OpenSearch output plugin
@@ -117,7 +117,8 @@ output {
type => 'aws_iam'
aws_access_key_id => 'ACCESS_KEY'
aws_secret_access_key => 'SECRET_KEY'
- region => 'us-west-2'
+ region => 'us-west-2'
+ service_name => 'es'
}
index => "logstash-logs-%{+YYYY.MM.dd}"
}
@@ -142,8 +143,11 @@ output {
- Environment variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)
- Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
- Instance profile credentials delivered through the Amazon EC2 metadata service
-- template (path) - You can set the path to your own template here, if you so desire. If not set, the included template will be used.
-- template_name (string, default => "logstash") - defines how the template is named inside Opensearch
+- template (path) - You can set the path to your own template here. If no template is specified, the plugin uses the default template.
+- template_name (string, default => "logstash") - Defines how the template is named inside OpenSearch.
+- service_name (string, default => "es") - Defines the service name to be used for `aws_iam` authentication.
+- legacy_template (boolean, default => true) - Selects the OpenSearch template API. When `true`, uses legacy templates via the _template API. When `false`, uses composable templates via the _index_template API.
+- default_server_major_version (number) - The OpenSearch server major version to use when it's not available from the OpenSearch root URL. If not set, the plugin throws an exception when the version can't be fetched.
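+
+For illustration, the following sketch (the host and index name are placeholders) shows how these settings might be combined in an output block:
+
+```
+output {
+  opensearch {
+    hosts => ["https://localhost:9200"]
+    index => "logstash-logs-%{+YYYY.MM.dd}"
+    template_name => "logstash"
+    legacy_template => false
+    default_server_major_version => 2
+  }
+}
+```
+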
## Data streams
diff --git a/_tuning-your-cluster/availability-and-recovery/segment-replication/configuration.md b/_tuning-your-cluster/availability-and-recovery/segment-replication/configuration.md
deleted file mode 100644
index b336df6985..0000000000
--- a/_tuning-your-cluster/availability-and-recovery/segment-replication/configuration.md
+++ /dev/null
@@ -1,84 +0,0 @@
----
-layout: default
-title: Segment replication configuration
-nav_order: 12
-parent: Segment replication
-grand_parent: Availability and Recovery
----
-
-# Segment replication configuration
-
-Segment replication is an experimental feature. Therefore, we do not recommend the use of segment replication in a production environment. For updates on the progress of segment replication or if you want to leave feedback that could help improve the feature, see the [Segment replication issue](https://github.com/opensearch-project/OpenSearch/issues/2194).
-{: .warning }
-
-To enable the segment replication type, reference the steps below.
-
-## Enabling the feature flag
-
-There are several methods for enabling segment replication, depending on the install type. You will also need to set the replication strategy to `SEGMENT` when creating the index.
-
-### Enable on a node using a tarball install
-
-The flag is toggled using a new jvm parameter that is set either in `OPENSEARCH_JAVA_OPTS` or in config/jvm.options.
-
-1. Option 1: Update config/jvm.options by adding the following line:
-
- ````json
- -Dopensearch.experimental.feature.replication_type.enabled=true
- ````
-
-1. Option 2: Use the `OPENSEARCH_JAVA_OPTS` environment variable:
-
- ````json
- export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true"
- ````
-1. Option 3: For developers using Gradle, update run.gradle by adding the following lines:
-
- ````json
- testClusters {
- runTask {
- testDistribution = 'archive'
- if (numZones > 1) numberOfZones = numZones
- if (numNodes > 1) numberOfNodes = numNodes
- systemProperty 'opensearch.experimental.feature.replication_type.enabled', 'true'
- }
- }
- ````
-
-### Enable with Docker containers
-
-If you're running Docker, add the following line to docker-compose.yml underneath the `opensearch-node` and `environment` section:
-
-````json
-OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.replication_type.enabled=true" # Enables segment replication
-````
-
-### Setting the replication strategy on the index
-
-To set the replication strategy to segment replication, create an index with replication.type set to `SEGMENT`:
-
-````json
-PUT /my-index1
-{
- "settings": {
- "index": {
- "replication.type": "SEGMENT"
- }
- }
-}
-````
-
-## Known limitations
-
-1. Enabling segment replication for an existing index requires [reindexing](https://github.com/opensearch-project/OpenSearch/issues/3685).
-1. Rolling upgrades are currently not supported. Full cluster restarts are required when upgrading indexes using segment replication. [Issue 3881](https://github.com/opensearch-project/OpenSearch/issues/3881).
-1. [Cross-cluster replication](https://github.com/opensearch-project/OpenSearch/issues/4090) does not currently use segment replication to copy between clusters.
-1. Increased network congestion on primary shards. [Issue - Optimize network bandwidth on primary shards](https://github.com/opensearch-project/OpenSearch/issues/4245).
-1. Shard allocation algorithms have not been updated to evenly spread primary shards across nodes.
-1. Integration with remote-backed storage as the source of replication is [currently unsupported](https://github.com/opensearch-project/OpenSearch/issues/4448).
-
-### Further resources regarding segment replication
-
-1. [Known issues](https://github.com/opensearch-project/OpenSearch/issues/2194).
-1. Steps for testing (link coming soon).
-1. Segment replication blog post (link coming soon).
\ No newline at end of file
diff --git a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md b/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md
deleted file mode 100644
index b7641f8192..0000000000
--- a/_tuning-your-cluster/availability-and-recovery/segment-replication/index.md
+++ /dev/null
@@ -1,27 +0,0 @@
----
-layout: default
-title: Segment replication
-nav_order: 70
-has_children: true
-parent: Availability and Recovery
-redirect_from:
- - /opensearch/segment-replication/
- - /opensearch/segment-replication/index/
----
-
-# Segment replication
-
-Segment replication is an experimental feature with OpenSearch 2.3. Therefore, we do not recommend the use of segment replication in a production environment. For updates on the progress of segment replication or if you want leave feedback that could help improve the feature, see the [Segment replication git issue](https://github.com/opensearch-project/OpenSearch/issues/2194).
-{: .warning}
-
-With segment replication, segment files are copied across shards instead of documents being indexed on each shard copy. This improves indexing throughput and lowers resource utilization at the expense of increased network utilization.
-
-As an experimental feature, segment replication will be behind a feature flag and must be enabled on **each node** of a cluster and pass a new setting during index creation.
-{: .note }
-
-### Potential use cases
-
-- Users who have high write loads but do not have high search requirements and are comfortable with longer refresh times.
-- Users with very high loads who want to add new nodes, as you do not need to index all nodes when adding a new node to the cluster.
-
-This is the first step in a series of features designed to decouple reads and writes in order to lower compute costs.
\ No newline at end of file
diff --git a/_tuning-your-cluster/availability-and-recovery/shard-indexing-backpressure.md b/_tuning-your-cluster/availability-and-recovery/shard-indexing-backpressure.md
deleted file mode 100644
index cde2f125cb..0000000000
--- a/_tuning-your-cluster/availability-and-recovery/shard-indexing-backpressure.md
+++ /dev/null
@@ -1,33 +0,0 @@
----
-layout: default
-title: Shard indexing backpressure
-nav_order: 62
-has_children: true
-parent: Availability and Recovery
-redirect_from:
- - /opensearch/shard-indexing-backpressure/
----
-
-# Shard indexing backpressure
-
-Shard indexing backpressure is a smart rejection mechanism at a per-shard level that dynamically rejects indexing requests when your cluster is under strain. It propagates a backpressure that transfers requests from an overwhelmed node or shard to other nodes or shards that are still healthy.
-
-With shard indexing backpressure, you can prevent nodes in your cluster from running into cascading failures due to performance degradation caused by slow nodes, stuck tasks, resource-intensive requests, traffic surges, skewed shard allocations, and so on.
-
-Shard indexing backpressure comes into effect only when one primary and one secondary parameter is breached.
-
-## Primary parameters
-
-Primary parameters are early indicators that a cluster is under strain:
-
-- Shard memory limit breach: If the memory usage of a shard exceeds 95% of its allocated memory, this limit is breached.
-- Node memory limit breach: If the memory usage of a node exceeds 70% of its allocated memory, this limit is breached.
-
-The breach of primary parameters doesn’t cause any actual request rejections, it just triggers an evaluation of the secondary parameters.
-
-## Secondary parameters
-
-Secondary parameters check the performance at the shard level to confirm that the cluster is under strain:
-
-- Throughput: If the throughput at the shard level decreases significantly in its historic view, this limit is breached.
-- Successful Request: If the number of pending requests increases significantly in its historic view, this limit is breached.
diff --git a/_tuning-your-cluster/availability-and-recovery/shard-indexing-settings.md b/_tuning-your-cluster/availability-and-recovery/shard-indexing-settings.md
deleted file mode 100644
index 88b0ea70b4..0000000000
--- a/_tuning-your-cluster/availability-and-recovery/shard-indexing-settings.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-layout: default
-title: Settings
-parent: Shard indexing backpressure
-nav_order: 50
-grand_parent: Availability and Recovery
-redirect_from:
- - /opensearch/shard-indexing-settings/
----
-
-# Settings
-
-Shard indexing backpressure adds several settings to the standard OpenSearch cluster settings. They are dynamic, so you can change the default behavior of this feature without restarting your cluster.
-
-## High-level controls
-
-The high-level controls allow you to turn the shard indexing backpressure feature on or off.
-
-Setting | Default | Description
-:--- | :--- | :---
-`shard_indexing_pressure.enabled` | False | Change to `true` to enable shard indexing backpressure.
-`shard_indexing_pressure.enforced` | False | Run shard indexing backpressure in shadow mode or enforced mode. In shadow mode (value set as `false`), shard indexing backpressure tracks all granular-level metrics, but it doesn't actually reject any indexing requests. In enforced mode (value set as `true`), shard indexing backpressure rejects any requests to the cluster that might cause a dip in its performance.
-
-## Node-level limits
-
-Node-level limits allow you to control memory usage on a node.
-
-Setting | Default | Description
-:--- | :--- | :---
-`shard_indexing_pressure.primary_parameter.node.soft_limit` | 70% | Define the percentage of the node-level memory threshold that acts as a soft indicator for strain on a node.
-
-## Shard-level limits
-
-Shard-level limits allow you to control memory usage on a shard.
-
-Setting | Default | Description
-:--- | :--- | :---
-`shard_indexing_pressure.primary_parameter.shard.min_limit` | 0.001d | Specify the minimum assigned quota for a new shard in any role (coordinator, primary, or replica). Shard indexing backpressure increases or decreases this allocated quota based on the inflow of traffic for the shard.
-`shard_indexing_pressure.operating_factor.lower` | 75% | Specify the lower occupancy limit of the allocated quota of memory for the shard. If the total memory usage of a shard is below this limit, shard indexing backpressure decreases the current allocated memory for that shard.
-`shard_indexing_pressure.operating_factor.optimal` | 85% | Specify the optimal occupancy of the allocated quota of memory for the shard. If the total memory usage of a shard is at this level, shard indexing backpressure doesn't change the current allocated memory for that shard.
-`shard_indexing_pressure.operating_factor.upper` | 95% | Specify the upper occupancy limit of the allocated quota of memory for the shard. If the total memory usage of a shard is above this limit, shard indexing backpressure increases the current allocated memory for that shard.
-
-## Performance degradation factors
-
-The performance degradation factors allow you to control the dynamic performance thresholds for a shard.
-
-Setting | Default | Description
-:--- | :--- | :---
-`shard_indexing_pressure.secondary_parameter.throughput.request_size_window` | 2,000 | The number of requests in the sampling window size on a shard. Shard indexing backpressure compares the overall performance of requests with the requests in the sample window to detect any performance degradation.
-`shard_indexing_pressure.secondary_parameter.throughput.degradation_factor` | 5x | The degradation factor per unit byte for a request. This parameter determines the threshold for any latency spikes. The default value is 5x, which implies that if the latency shoots up 5 times in the historic view, shard indexing backpressure marks it as a performance degradation.
-`shard_indexing_pressure.secondary_parameter.successful_request.elapsed_timeout` | 300000 ms | The amount of time a request is pending in a cluster. This parameter helps identify any stuck-request scenarios.
-`shard_indexing_pressure.secondary_parameter.successful_request.max_outstanding_requests` | 100 | The maximum number of pending requests in a cluster.
diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/index.md b/_tuning-your-cluster/availability-and-recovery/snapshots/index.md
new file mode 100644
index 0000000000..3fde2804b7
--- /dev/null
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/index.md
@@ -0,0 +1,30 @@
+---
+layout: default
+title: Snapshots
+nav_order: 5
+has_children: true
+parent: Availability and Recovery
+redirect_from:
+ - /opensearch/snapshots/
+ - /opensearch/snapshots/index/
+has_toc: false
+---
+
+# Snapshots
+
+Snapshots are backups of a cluster's indexes and state. State includes cluster settings, node information, index metadata (mappings, settings, templates, etc.), and shard allocation.
+
+Snapshots have two main uses:
+
+- **Recovering from failure**
+
+ For example, if cluster health goes red, you might restore the red indexes from a snapshot.
+
+- **Migrating from one cluster to another**
+
+ For example, if you're moving from a proof-of-concept to a production cluster, you might take a snapshot of the former and restore it on the latter.
+
+
+You can take and restore snapshots using the [snapshot API]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore).
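+
+For example, assuming you have already registered a snapshot repository named `my-repository`, a minimal request to take a snapshot looks like the following:
+
+```json
+PUT _snapshot/my-repository/my-first-snapshot
+```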
+
+If you need to automate taking snapshots, you can use the [snapshot management]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-management) feature.
diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
new file mode 100644
index 0000000000..f7ef3c981d
--- /dev/null
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/searchable_snapshot.md
@@ -0,0 +1,125 @@
+---
+layout: default
+title: Searchable snapshots
+parent: Snapshots
+nav_order: 40
+grand_parent: Availability and Recovery
+redirect_from:
+ - /opensearch/snapshots/searchable_snapshot/
+---
+
+# Searchable snapshots
+
+Searchable snapshots is an experimental feature released in OpenSearch 2.4. Therefore, we do not recommend the use of this feature in a production environment. For updates on progress, follow us on [GitHub](https://github.com/opensearch-project/OpenSearch/issues/3739). If you have any feedback please [submit a new issue](https://github.com/opensearch-project/OpenSearch/issues/new/choose).
+{: .warning }
+
+A searchable snapshot is an index where data is read from a [snapshot repository]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/#register-repository) on demand at search time rather than all index data being downloaded to cluster storage at restore time. Because the index data remains in the snapshot format in the repository, searchable snapshot indexes are inherently read-only. Any attempt to write to a searchable snapshot index will result in an error.
+
+To enable the searchable snapshots feature, follow these steps.
+
+## Enabling the feature flag
+
+There are several methods for enabling searchable snapshots, depending on the installation type.
+
+### Enable on a node using a tarball installation
+
+The feature flag is toggled using a JVM parameter that is set either in `OPENSEARCH_JAVA_OPTS` or in `config/jvm.options`:
+
+- Option 1: Update config/jvm.options by adding the following line:
+
+ ```json
+ -Dopensearch.experimental.feature.searchable_snapshot.enabled=true
+ ```
+
+- Option 2: Use the `OPENSEARCH_JAVA_OPTS` environment variable:
+
+ ```json
+ export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.searchable_snapshot.enabled=true"
+ ```
+- Option 3: For developers using Gradle, update run.gradle by adding the following lines:
+
+ ```json
+ testClusters {
+ runTask {
+ testDistribution = 'archive'
+ if (numZones > 1) numberOfZones = numZones
+ if (numNodes > 1) numberOfNodes = numNodes
+ systemProperty 'opensearch.experimental.feature.searchable_snapshot.enabled', 'true'
+ }
+ }
+ ```
+
+- Finally, in your opensearch.yml file, name the node and define its role as `search`:
+
+ ```bash
+ node.name: snapshots-node
+ node.roles: [ search ]
+ ```
+
+### Enable with Docker containers
+
+If you're running Docker, add the following line to the `environment` section of `opensearch-node` in your docker-compose.yml file:
+
+```bash
+OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.searchable_snapshot.enabled=true" # Enables searchable snapshot
+```
+
+To create a node with the `search` node role, add the line `- node.roles=search` to the `environment` section of your docker-compose.yml file:
+
+```yml
+version: '3'
+services:
+ opensearch-node1:
+ image: opensearchproject/opensearch:2.4.0
+ container_name: opensearch-node1
+ environment:
+ - cluster.name=opensearch-cluster
+ - node.name=opensearch-node1
+      - node.roles=search
+```
+
+## Create a searchable snapshot index
+
+A searchable snapshot index is created by specifying the `remote_snapshot` storage type using the [restore snapshots API]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore/#restore-snapshots).
+
+Request Field | Description
+:--- | :---
+`storage_type` | `local` indicates that all snapshot metadata and index data will be downloaded to local storage. <br><br> `remote_snapshot` indicates that snapshot metadata will be downloaded to the cluster, but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured with the `search` node role in order to restore a snapshot using the `remote_snapshot` type. <br><br> Defaults to `local`.
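+
+For example, a restore request that creates a searchable snapshot index might look like the following sketch (the repository, snapshot, and index names are placeholders):
+
+```json
+POST /_snapshot/my-repository/my-snapshot/_restore
+{
+  "storage_type": "remote_snapshot",
+  "indices": "my-index"
+}
+```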
+
+## Listing indexes
+
+To determine whether an index is a searchable snapshot index, look for a store type with the value of `remote_snapshot`:
+
+```json
+GET /my-index/_settings?pretty
+```
+
+```json
+{
+ "my-index": {
+ "settings": {
+ "index": {
+ "store": {
+ "type": "remote_snapshot"
+ }
+ }
+ }
+ }
+}
+```
+
+## Potential use cases
+
+The following are potential use cases for the searchable snapshots feature:
+
+- The ability to offload indexes from cluster-based storage but retain the ability to search them.
+- The ability to have a large number of searchable indexes in lower-cost media.
+
+## Known limitations
+
+The following are known limitations of the searchable snapshots feature:
+
+- Accessing data from a remote repository is slower than local disk reads, so higher latencies on search queries are expected.
+- Data is discarded immediately after being read, so data for repeated searches of the same data must be downloaded again. This will be addressed in the future by implementing a disk-based cache for storing frequently accessed data.
+- Many remote object stores charge on a per-request basis for retrieval, so users should closely monitor any costs incurred.
+- Searching remote data can impact the performance of other queries running on the same node. We recommend that users provision dedicated nodes with the `search` role for performance-critical applications.
\ No newline at end of file
diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md
new file mode 100644
index 0000000000..c664b39ad6
--- /dev/null
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/sm-api.md
@@ -0,0 +1,463 @@
+---
+layout: default
+title: Snapshot management API
+parent: Snapshots
+nav_order: 30
+has_children: false
+grand_parent: Availability and Recovery
+redirect_from:
+ - /opensearch/snapshots/sm-api/
+---
+
+# Snapshot Management API
+
+Use the snapshot management (SM) API to automate [taking snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#take-snapshots).
+
+---
+
+#### Table of contents
+- TOC
+{:toc}
+
+
+---
+
+## Create or update a policy
+Introduced 2.1
+{: .label .label-purple }
+
+Creates or updates an SM policy.
+
+#### Request
+
+Create:
+
+```json
+POST _plugins/_sm/policies/<policy_name>
+```
+
+Update:
+
+```json
+PUT _plugins/_sm/policies/<policy_name>?if_seq_no=0&if_primary_term=1
+```
+
+You must provide the `seq_no` and `primary_term` parameters for an update request.
+
+### Example
+
+```json
+POST _plugins/_sm/policies/daily-policy
+{
+ "description": "Daily snapshot policy",
+ "creation": {
+ "schedule": {
+ "cron": {
+ "expression": "0 8 * * *",
+ "timezone": "UTC"
+ }
+ },
+ "time_limit": "1h"
+ },
+ "deletion": {
+ "schedule": {
+ "cron": {
+ "expression": "0 1 * * *",
+ "timezone": "America/Los_Angeles"
+ }
+ },
+ "condition": {
+ "max_age": "7d",
+ "max_count": 21,
+ "min_count": 7
+ },
+ "time_limit": "1h"
+ },
+ "snapshot_config": {
+ "date_format": "yyyy-MM-dd-HH:mm",
+ "timezone": "America/Los_Angeles",
+ "indices": "*",
+ "repository": "s3-repo",
+ "ignore_unavailable": "true",
+ "include_global_state": "false",
+ "partial": "true",
+ "metadata": {
+ "any_key": "any_value"
+ }
+ },
+ "notification": {
+ "channel": {
+ "id": "NC3OpoEBzEoHMX183R3f"
+ },
+ "conditions": {
+ "creation": true,
+ "deletion": false,
+ "failure": false,
+ "time_limit_exceeded": false
+ }
+ }
+}
+```
+
+### Response
+
+```json
+{
+ "_id" : "daily-policy-sm-policy",
+ "_version" : 5,
+ "_seq_no" : 54983,
+ "_primary_term" : 21,
+ "sm_policy" : {
+ "name" : "daily-policy",
+ "description" : "Daily snapshot policy",
+ "schema_version" : 15,
+ "creation" : {
+ "schedule" : {
+ "cron" : {
+ "expression" : "0 8 * * *",
+ "timezone" : "UTC"
+ }
+ },
+ "time_limit" : "1h"
+ },
+ "deletion" : {
+ "schedule" : {
+ "cron" : {
+ "expression" : "0 1 * * *",
+ "timezone" : "America/Los_Angeles"
+ }
+ },
+ "condition" : {
+ "max_age" : "7d",
+ "min_count" : 7,
+ "max_count" : 21
+ },
+ "time_limit" : "1h"
+ },
+ "snapshot_config" : {
+ "indices" : "*",
+ "metadata" : {
+ "any_key" : "any_value"
+ },
+ "ignore_unavailable" : "true",
+ "timezone" : "America/Los_Angeles",
+ "include_global_state" : "false",
+ "date_format" : "yyyy-MM-dd-HH:mm",
+ "repository" : "s3-repo",
+ "partial" : "true"
+ },
+ "schedule" : {
+ "interval" : {
+ "start_time" : 1656425122909,
+ "period" : 1,
+ "unit" : "Minutes"
+ }
+ },
+ "enabled" : true,
+ "last_updated_time" : 1656425122909,
+ "enabled_time" : 1656425122909,
+ "notification" : {
+ "channel" : {
+ "id" : "NC3OpoEBzEoHMX183R3f"
+ },
+ "conditions" : {
+ "creation" : true,
+ "deletion" : false,
+ "failure" : false,
+ "time_limit_exceeded" : false
+ }
+ }
+ }
+}
+```
+
+### Parameters
+
+You can specify the following parameters to create/update an SM policy.
+
+Parameter | Type | Description
+:--- | :--- | :---
+`description` | String | The description of the SM policy. Optional.
+`enabled` | Boolean | Should this SM policy be enabled at creation? Optional.
+`snapshot_config` | Object | The configuration options for snapshot creation. Required.
+`snapshot_config.date_format` | String | Snapshot names have the format `<policy_name>-<date>-<random string>`. `date_format` specifies the format for the date in the snapshot name. Supports all date formats supported by OpenSearch. Optional. Default is "yyyy-MM-dd'T'HH:mm:ss".
+`snapshot_config.date_format_timezone` | String | Snapshot names have the format `<policy_name>-<date>-<random string>`. `date_format_timezone` specifies the time zone for the date in the snapshot name. Optional. Default is UTC.
+`snapshot_config.indices` | String | The names of the indexes in the snapshot. Multiple index names are separated by `,`. Supports wildcards (`*`). Optional. Default is `*` (all indexes).
+`snapshot_config.repository` | String | The repository in which to store snapshots. Required.
+`snapshot_config.ignore_unavailable` | Boolean | Do you want to ignore unavailable indexes? Optional. Default is `false`.
+`snapshot_config.include_global_state` | Boolean | Do you want to include cluster state? Optional. Default is `true` because of [Security plugin considerations]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore#security-considerations).
+`snapshot_config.partial` | Boolean | Do you want to allow partial snapshots? Optional. Default is `false`.
+`snapshot_config.metadata` | Object | Metadata in the form of key/value pairs. Optional.
+`creation` | Object | Configuration for snapshot creation. Required.
+`creation.schedule` | String | The cron schedule used to create snapshots. Required.
+`creation.time_limit` | String | Sets the maximum time to wait for snapshot creation to finish. If time_limit is longer than the scheduled time interval for taking snapshots, no scheduled snapshots are taken until time_limit elapses. For example, if time_limit is set to 35 minutes and snapshots are taken every 30 minutes starting at midnight, the snapshots at 00:00 and 01:00 are taken, but the snapshot at 00:30 is skipped. Optional.
+`deletion` | Object | Configuration for snapshot deletion. Optional. Default is to retain all snapshots.
+`deletion.schedule` | String | The cron schedule used to delete snapshots. Optional. Default is to use `creation.schedule`, which is required.
+`deletion.time_limit` | String | Sets the maximum time to wait for snapshot deletion to finish. Optional.
+`deletion.delete_condition` | Object | Conditions for snapshot deletion. Optional.
+`deletion.delete_condition.max_count` | Integer | The maximum number of snapshots to be retained. Optional.
+`deletion.delete_condition.max_age` | String | The maximum time a snapshot is retained. Optional.
+`deletion.delete_condition.min_count` | Integer | The minimum number of snapshots to be retained. Optional. Default is one.
+`notification` | Object | Defines notifications for SM events. Optional.
+`notification.channel` | Object | Defines a channel for notifications. Required.
+`notification.channel.id` | String | The channel ID of the channel used for notifications. To get the channel IDs of all created channels, use `GET _plugins/_notifications/configs`. Required.
+`notification.conditions` | Object | SM events you want to be notified about. Set the ones you are interested in to `true`.
+`notification.conditions.creation` | Boolean | Do you want notifications about snapshot creation? Optional. Default is `true`.
+`notification.conditions.deletion` | Boolean | Do you want notifications about snapshot deletion? Optional. Default is `false`.
+`notification.conditions.failure` | Boolean | Do you want notifications about creation or deletion failure? Optional. Default is `false`.
+`notification.conditions.time_limit_exceeded` | Boolean | Do you want notifications when snapshot operations take longer than time_limit? Optional. Default is `false`.
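+
+To make the required and optional fields concrete, the following is a minimal policy sketch that sets only the required fields (the policy name and the repository `my-repo` are placeholders, and the repository must already be registered):
+
+```json
+POST _plugins/_sm/policies/minimal-policy
+{
+  "creation": {
+    "schedule": {
+      "cron": {
+        "expression": "0 0 * * *",
+        "timezone": "UTC"
+      }
+    }
+  },
+  "snapshot_config": {
+    "repository": "my-repo"
+  }
+}
+```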
+
+## Get policies
+Introduced 2.1
+{: .label .label-purple }
+
+Gets SM policies.
+
+#### Request
+
+Get all SM policies:
+
+```json
+GET _plugins/_sm/policies
+```
+You can use a [query string]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#query-string) and specify pagination, the field to be sorted by, and sort order:
+
+```json
+GET _plugins/_sm/policies?from=0&size=20&sortField=sm_policy.name&sortOrder=desc&queryString=*
+```
+
+Get a specific SM policy:
+
+```
+GET _plugins/_sm/policies/<policy_name>
+```
+
+### Example
+
+```json
+GET _plugins/_sm/policies/daily-policy
+```
+
+### Response
+
+```json
+{
+ "_id" : "daily-policy-sm-policy",
+ "_version" : 6,
+ "_seq_no" : 44696,
+ "_primary_term" : 19,
+ "sm_policy" : {
+ "name" : "daily-policy",
+ "description" : "Daily snapshot policy",
+ "schema_version" : 15,
+ "creation" : {
+ "schedule" : {
+ "cron" : {
+ "expression" : "0 8 * * *",
+ "timezone" : "UTC"
+ }
+ },
+ "time_limit" : "1h"
+ },
+ "deletion" : {
+ "schedule" : {
+ "cron" : {
+ "expression" : "0 1 * * *",
+ "timezone" : "America/Los_Angeles"
+ }
+ },
+ "condition" : {
+ "max_age" : "7d",
+ "min_count" : 7,
+ "max_count" : 21
+ },
+ "time_limit" : "1h"
+ },
+ "snapshot_config" : {
+ "metadata" : {
+ "any_key" : "any_value"
+ },
+ "ignore_unavailable" : "true",
+ "include_global_state" : "false",
+ "date_format" : "yyyy-MM-dd-HH:mm",
+ "repository" : "s3-repo",
+ "partial" : "true"
+ },
+ "schedule" : {
+ "interval" : {
+ "start_time" : 1656341042874,
+ "period" : 1,
+ "unit" : "Minutes"
+ }
+ },
+ "enabled" : true,
+ "last_updated_time" : 1656341042874,
+ "enabled_time" : 1656341042874
+ }
+}
+```
+
+## Explain
+Introduced 2.1
+{: .label .label-purple }
+
+Provides the enabled/disabled status and the metadata for all policies specified. Multiple policy names are separated with `,`. You can also specify desired policies with a wildcard pattern.
+
+
+SM uses a state machine for snapshot creation and deletion. One execution period of the creation workflow runs from the CREATION_START state to the CREATION_FINISHED state. The deletion workflow follows the same pattern as the creation workflow.
+
+The creation workflow starts in the CREATION_START state and continuously checks if the conditions in the creation cron schedule are met. After the conditions are met, the creation workflow switches to the CREATION_CONDITION_MET state and continues to the CREATING state. The CREATING state calls the create snapshot API asynchronously and then waits for snapshot creation to end in the CREATION_FINISHED state. Once snapshot creation ends, the creation workflow goes back to the CREATION_START state, and the cycle continues. The `current_state` field of `metadata.creation` and `metadata.deletion` returns the current state of the state machine.
+
+#### Request
+
+```json
+GET _plugins/_sm/policies/<policy_names>/_explain
+```
+
+### Example
+
+```json
+GET _plugins/_sm/policies/daily*/_explain
+```
+
+### Response
+
+```json
+{
+ "policies" : [
+ {
+ "name" : "daily-policy",
+ "creation" : {
+ "current_state" : "CREATION_START",
+ "trigger" : {
+ "time" : 1656403200000
+ }
+ },
+ "deletion" : {
+ "current_state" : "DELETION_START",
+ "trigger" : {
+ "time" : 1656403200000
+ }
+ },
+ "policy_seq_no" : 44696,
+ "policy_primary_term" : 19,
+ "enabled" : true
+ }
+ ]
+}
+```
+
+The following table lists all fields for each policy in the response.
+
+Field | Description
+:--- |:---
+`name` | The name of the SM policy.
+`creation` | Information about the latest creation operation. See subfields below.
+`deletion` | Information about the latest deletion operation. See subfields below.
+`policy_seq_no` <br> `policy_primary_term` | The version of the SM policy.
+`enabled` | Is the policy running?
+
+The following table lists all fields in the `creation` and `deletion` objects of each policy.
+
+Field | Description
+:--- |:---
+`current_state` | The current state of the state machine that runs snapshot creation/deletion as described above.
+`trigger.time` | The next creation/deletion execution time in milliseconds since the epoch.
+`latest_execution` | Describes the latest creation/deletion execution.
+`latest_execution.status` | The execution status of the latest creation/deletion. Possible values are: <br><br> `IN_PROGRESS`: Snapshot creation/deletion has started. <br> `SUCCESS`: Snapshot creation/deletion has finished successfully. <br> `RETRYING`: The creation/deletion attempt has failed. It will be retried three times. <br> `FAILED`: The creation/deletion attempt failed after three retries. End the current execution period and go to the next execution period. <br> `TIME_LIMIT_EXCEEDED`: The creation/deletion time exceeded the `time_limit` set in the policy. End the current execution period and go to the next execution period.
+`latest_execution.start_time` | The start time of the latest execution in milliseconds since the epoch.
+`latest_execution.end_time` | The end time of the latest execution in milliseconds since the epoch.
+`latest_execution.info.message` | A user-friendly message describing the status of the latest execution.
+`latest_execution.info.cause` | Contains the failure reason if the latest execution fails.
+`retry.count` | The number of remaining execution retry attempts.
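+
+For illustration only, a `creation` object for a successfully completed execution might look like the following sketch (all values are hypothetical):
+
+```json
+"creation" : {
+  "current_state" : "CREATION_FINISHED",
+  "trigger" : {
+    "time" : 1656489600000
+  },
+  "latest_execution" : {
+    "status" : "SUCCESS",
+    "start_time" : 1656403201000,
+    "end_time" : 1656403203000,
+    "info" : {
+      "message" : "Snapshot creation has finished successfully."
+    }
+  }
+}
+```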
+
+
+## Start a policy
+Introduced 2.1
+{: .label .label-purple }
+
+Starts the policy by setting its `enabled` flag to `true`.
+
+#### Request
+
+```json
+POST _plugins/_sm/policies/<policy_name>/_start
+```
+
+### Example
+
+```json
+POST _plugins/_sm/policies/daily-policy/_start
+```
+
+### Response
+
+```json
+{
+ "acknowledged" : true
+}
+```
+
+## Stop a policy
+Introduced 2.1
+{: .label .label-purple }
+
+Sets the `enabled` flag to `false` for an SM policy. The policy will not run until you [start](#start-a-policy) it.
+
+#### Request
+
+```json
+POST _plugins/_sm/policies/<policy_name>/_stop
+```
+
+### Example
+
+```json
+POST _plugins/_sm/policies/daily-policy/_stop
+```
+
+### Response
+
+```json
+{
+ "acknowledged" : true
+}
+```
+
+## Delete a policy
+Introduced 2.1
+{: .label .label-purple }
+
+Deletes the specified SM policy.
+
+#### Request
+
+```json
+DELETE _plugins/_sm/policies/<policy_name>
+```
+
+### Example
+
+```json
+DELETE _plugins/_sm/policies/daily-policy
+```
+
+### Response
+
+```json
+{
+ "_index" : ".opendistro-ism-config",
+ "_id" : "daily-policy-sm-policy",
+ "_version" : 8,
+ "result" : "deleted",
+ "forced_refresh" : true,
+ "_shards" : {
+ "total" : 2,
+ "successful" : 2,
+ "failed" : 0
+ },
+ "_seq_no" : 45366,
+ "_primary_term" : 20
+}
+```
\ No newline at end of file
diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md
new file mode 100644
index 0000000000..9a25b28683
--- /dev/null
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-management.md
@@ -0,0 +1,81 @@
+---
+layout: default
+title: Snapshot management
+parent: Snapshots
+nav_order: 20
+has_children: false
+grand_parent: Availability and Recovery
+redirect_from:
+ - /opensearch/snapshots/snapshot-management/
+---
+
+# Snapshot management
+
+Snapshot management (SM) lets you automate [taking snapshots]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#take-snapshots). To use this feature, you need to install the [Index Management (IM) Plugin]({{site.url}}{{site.baseurl}}/im-plugin). Snapshots store only incremental changes since the last snapshot. Thus, while taking an initial snapshot may be a heavy operation, subsequent snapshots have minimal overhead. To set up automatic snapshots, you have to create an SM policy with a desired SM schedule and configuration.
+
+When you create an SM policy, its document ID is given the name `<policy_name>-sm-policy`. Because of this, SM policies have to obey the following rules:
+
+- SM policies must have unique names.
+
+- You cannot update the policy name after its creation.
+
+SM-created snapshots have names in the format `<policy_name>-<date>-<random string>`. Two snapshots created by different policies at the same time always have different names because of the `<policy_name>` prefix. To avoid name collisions within the same policy, each snapshot's name contains a random string suffix.
+
+Each policy has associated metadata that stores the policy status. Snapshot management saves SM policies and their metadata in a system index and reads them from that index, so SM depends on the OpenSearch cluster's indexing and searching functions. The policy's metadata keeps information about only the latest creation and deletion. The metadata is read before every scheduled job runs so that SM can continue execution from the previous job's state. You can view the metadata using the [explain API]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#explain).
+
+An SM schedule is a custom [cron]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/cron) expression. It consists of two parts: a creation schedule and a deletion schedule. You must set up a creation schedule that specifies the frequency and timing of snapshot creation. Optionally, you can set up a separate schedule for deleting snapshots.
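+
+For example, the `creation` and `deletion` sections of a policy might pair the two schedules as in the following sketch (the cron expressions are placeholders):
+
+```json
+"creation": {
+  "schedule": {
+    "cron": {
+      "expression": "0 8 * * *",
+      "timezone": "UTC"
+    }
+  }
+},
+"deletion": {
+  "schedule": {
+    "cron": {
+      "expression": "0 1 * * *",
+      "timezone": "UTC"
+    }
+  }
+}
+```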
+
+An SM configuration includes the indexes and repository for the snapshots and supports all parameters you can define when [creating a snapshot]({{site.url}}{{site.baseurl}}/opensearch/snapshots/snapshot-restore#take-snapshots) using the API. Additionally, you can specify the format and time zone for the date used in the snapshot's name.
+
+
+## Performance
+
+One snapshot can contain as many indexes as there are in the cluster. We expect at most dozens of SM policies in one cluster, but a snapshot repository can safely scale to thousands of snapshots. However, to manage its metadata, a large repository requires more memory on the cluster manager node.
+
+Snapshot management depends on the Job Scheduler plugin to run a job periodically. Each SM policy corresponds to one scheduled SM job. The scheduled job itself is lightweight, so the load SM places on a cluster depends on the snapshot creation frequency and on the cost of the snapshot operations themselves.
+
+## Concurrency
+
+An SM policy does not support concurrent snapshot operations, since too many such operations may degrade the cluster. Snapshot operations (creation or deletion) are performed asynchronously. SM does not start a new operation until the previous asynchronous operation finishes.
+
+We don't recommend creating several SM policies with the same schedule and overlapping indexes in one cluster because it leads to concurrent snapshot creation on the same indexes and hinders performance.
+{: .warning }
+
+
+We don't recommend using the same repository for multiple SM policies with the same schedule in different clusters because doing so may cause a sudden spike in load on that repository.
+{: .warning }
+
+## Failure management
+
+If a snapshot operation fails, it is retried a maximum of three times. The failure message is saved in `metadata.latest_execution` and is overwritten when a subsequent snapshot operation starts. You can view the failure message using the [explain API]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#explain). When using OpenSearch Dashboards, you can view the failure message on the [policy details page]({{site.url}}{{site.baseurl}}/dashboards/admin-ui-index/sm-dashboards#view-edit-or-delete-an-sm-policy). Possible reasons for failure include red index status and shard reallocation.
+
+## Security
+
+The Security plugin has two built-in roles for Snapshot Management actions: `snapshot_management_full_access` and `snapshot_management_read_access`. For descriptions of each, see [Predefined roles]({{site.url}}{{site.baseurl}}/security/access-control/users-roles#predefined-roles).
+
+The following table lists the required permissions for each Snapshot Management API.
+
+Function | API | Permission
+:--- | :--- | :---
+Get policy | GET _plugins/_sm/policies <br> GET _plugins/_sm/policies/`policy_name` | cluster:admin/opensearch/snapshot_management/policy/get <br> cluster:admin/opensearch/snapshot_management/policy/search
+Create/update policy | POST _plugins/_sm/policies/`policy_name` <br> PUT _plugins/_sm/policies/`policy_name`?if_seq_no=1&if_primary_term=1 | cluster:admin/opensearch/snapshot_management/policy/write
+Delete policy | DELETE _plugins/_sm/policies/`policy_name` | cluster:admin/opensearch/snapshot_management/policy/delete
+Explain | GET _plugins/_sm/policies/`policy_names`/_explain | cluster:admin/opensearch/snapshot_management/policy/explain
+Start | POST _plugins/_sm/policies/`policy_name`/_start | cluster:admin/opensearch/snapshot_management/policy/start
+Stop | POST _plugins/_sm/policies/`policy_name`/_stop | cluster:admin/opensearch/snapshot_management/policy/stop
+
+
+## API
+
+The following table lists all [Snapshot Management API]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api) functions.
+
+Function | API | Description
+:--- | :--- | :---
+[Create policy]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#create-or-update-a-policy) | POST _plugins/_sm/policies/`policy_name` | Creates an SM policy.
+[Update policy]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#create-or-update-a-policy) | PUT _plugins/_sm/policies/`policy_name`?if_seq_no=`sequence_number`&if_primary_term=`primary_term` | Modifies the `policy_name` policy.
+[Get all policies]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#get-policies) | GET _plugins/_sm/policies | Returns all SM policies.
+[Get the policy `policy_name`]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#get-policies) | GET _plugins/_sm/policies/`policy_name` | Returns the `policy_name` SM policy.
+[Delete policy]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#delete-a-policy) | DELETE _plugins/_sm/policies/`policy_name` | Deletes the `policy_name` policy.
+[Explain]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#explain) | GET _plugins/_sm/policies/`policy_names`/_explain | Provides the enabled/disabled status and the metadata for all policies specified by `policy_names`.
+[Start policy]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#start-a-policy) | POST _plugins/_sm/policies/`policy_name`/_start | Starts the `policy_name` policy.
+[Stop policy]({{site.url}}{{site.baseurl}}/opensearch/snapshots/sm-api#stop-a-policy)| POST _plugins/_sm/policies/`policy_name`/_stop | Stops the `policy_name` policy.
\ No newline at end of file
diff --git a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md
index c51a581029..238cbac18a 100644
--- a/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md
+++ b/_tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore.md
@@ -7,6 +7,7 @@ has_children: false
grand_parent: Availability and Recovery
redirect_from:
- /opensearch/snapshots/snapshot-restore/
+ - /availability-and-recovery/snapshots/snapshot-restore/
---
# Take and restore snapshots
diff --git a/_tuning-your-cluster/availability-and-recovery/stats-api.md b/_tuning-your-cluster/availability-and-recovery/stats-api.md
index a0de57ca16..539703367e 100644
--- a/_tuning-your-cluster/availability-and-recovery/stats-api.md
+++ b/_tuning-your-cluster/availability-and-recovery/stats-api.md
@@ -47,7 +47,7 @@ If `enforced` is `true`:
"roles": [
"data",
"ingest",
- "master",
+ "cluster_manager",
"remote_cluster_client"
],
"attributes": {
@@ -154,7 +154,7 @@ If `enforced` is `false`:
"roles": [
"data",
"ingest",
- "master",
+ "cluster_manager",
"remote_cluster_client"
],
"attributes": {
@@ -267,7 +267,7 @@ GET _nodes/_local/stats/shard_indexing_pressure?include_all
"roles": [
"data",
"ingest",
- "master",
+ "cluster_manager",
"remote_cluster_client"
],
"attributes": {
@@ -382,7 +382,7 @@ If `enforced` is `true`:
"roles": [
"data",
"ingest",
- "master",
+ "cluster_manager",
"remote_cluster_client"
],
"attributes": {
@@ -425,7 +425,7 @@ If `enforced` is `false`:
"roles": [
"data",
"ingest",
- "master",
+ "cluster_manager",
"remote_cluster_client"
],
"attributes": {
@@ -474,7 +474,7 @@ GET _nodes/stats/shard_indexing_pressure
"roles": [
"data",
"ingest",
- "master",
+ "cluster_manager",
"remote_cluster_client"
],
"attributes": {
diff --git a/_tuning-your-cluster/cluster.md b/_tuning-your-cluster/cluster.md
index e4ef23d7fa..99d489a3d3 100644
--- a/_tuning-your-cluster/cluster.md
+++ b/_tuning-your-cluster/cluster.md
@@ -32,6 +32,7 @@ Cluster manager eligible | Elects one node among them as the cluster manager nod
Data | Stores and searches data. Performs all data-related operations (indexing, searching, aggregating) on local shards. These are the worker nodes of your cluster and need more disk space than any other node type. | As you add data nodes, keep them balanced between zones. For example, if you have three zones, add data nodes in multiples of three, one for each zone. We recommend using storage and RAM-heavy nodes.
Ingest | Pre-processes data before storing it in the cluster. Runs an ingest pipeline that transforms your data before adding it to an index. | If you plan to ingest a lot of data and run complex ingest pipelines, we recommend you use dedicated ingest nodes. You can also optionally offload your indexing from the data nodes so that your data nodes are used exclusively for searching and aggregating.
Coordinating | Delegates client requests to the shards on the data nodes, collects and aggregates the results into one final result, and sends this result back to the client. | A couple of dedicated coordinating-only nodes is appropriate to prevent bottlenecks for search-heavy workloads. We recommend using CPUs with as many cores as you can.
+Dynamic | Dedicates a specific node to custom work, such as machine learning (ML) tasks, so that those tasks don't consume resources on data nodes or affect other OpenSearch functionality.
By default, each node is a cluster-manager-eligible, data, ingest, and coordinating node. Deciding on the number of nodes, assigning node types, and choosing the hardware for each node type depends on your use case. You must take into account factors like the amount of time you want to hold on to your data, the average size of your documents, your typical workload (indexing, searches, aggregations), your expected price-performance ratio, your risk tolerance, and so on.
@@ -72,13 +73,13 @@ After you name the cluster, set node attributes for each node in your cluster.
Give your cluster manager node a name. If you don't specify a name, OpenSearch assigns a machine-generated name that makes the node difficult to monitor and troubleshoot.
```yml
-node.name: opensearch-master
+node.name: opensearch-cluster_manager
```
-You can also explicitly specify that this node is a cluster manager node, even though it is already set to true by default. Set the node role to `master` to make it easier to identify the cluster manager node.
+You can also explicitly specify that this node is a cluster manager node, even though it is already set to true by default. Set the node role to `cluster_manager` to make it easier to identify the cluster manager node.
```yml
-node.roles: [ master ]
+node.roles: [ cluster_manager ]
```
#### Data nodes
@@ -139,7 +140,7 @@ Zen Discovery is the built-in, default mechanism that uses [unicast](https://en.
You can generally just add all of your cluster-manager-eligible nodes to the `discovery.seed_hosts` array. When a node starts up, it finds the other cluster-manager-eligible nodes, determines which one is the cluster manager, and asks to join the cluster.
-For example, for `opensearch-master` the line looks something like this:
+For example, for `opensearch-cluster_manager` the line looks something like this:
```yml
discovery.seed_hosts: ["<private IP of node 1>", "<private IP of node 2>", "<private IP of node 3>"]
curl -XGET https://<IP address>:9200/_cat/nodes?v -u 'admin:admin' --insecure
```
```
-ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
-x.x.x.x 13 61 0 0.02 0.04 0.05 mi * opensearch-master
+ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role cluster_manager name
+x.x.x.x 13 61 0 0.02 0.04 0.05 mi * opensearch-cluster_manager
x.x.x.x 16 60 0 0.06 0.05 0.05 md - opensearch-d1
x.x.x.x 34 38 0 0.12 0.07 0.06 md - opensearch-d2
x.x.x.x 23 38 0 0.12 0.07 0.06 md - opensearch-c1
@@ -180,6 +181,8 @@ To better understand and monitor your cluster, use the [CAT API]({{site.url}}{{s
## (Advanced) Step 6: Configure shard allocation awareness or forced awareness
+### Shard allocation awareness
+
If your nodes are spread across several geographical zones, you can configure shard allocation awareness to allocate all replica shards to a zone that’s different from their primary shard.
With shard allocation awareness, if the nodes in one of your zones fail, you can be assured that your replica shards are spread across your other zones. It adds a layer of fault tolerance to ensure your data survives a zone failure beyond just individual node failures.
@@ -209,6 +212,8 @@ You can either use `persistent` or `transient` settings. We recommend the `persi
Shard allocation awareness attempts to separate primary and replica shards across multiple zones. However, if only one zone is available (such as after a zone failure), OpenSearch allocates replica shards to the only remaining zone.
+### Forced awareness
+
Another option is to require that primary and replica shards are never allocated to the same zone. This is called forced awareness.
To configure forced awareness, specify all the possible values for your zone attributes:
@@ -230,6 +235,28 @@ If that is not the case, and `opensearch-d1` and `opensearch-d2` do not have the
Choosing allocation awareness or forced awareness depends on how much space you might need in each zone to balance your primary and replica shards.
+### Replica count enforcement
+
+To enforce an even distribution of shards across all zones and avoid hotspots, you can set the `routing.allocation.awareness.balance` attribute to `true`. This setting can be configured in the opensearch.yml file and dynamically updated using the cluster update settings API:
+
+```json
+PUT _cluster/settings
+{
+ "persistent": {
+ "cluster": {
+ "routing.allocation.awareness.balance": "true"
+ }
+ }
+}
+```
+
+The `routing.allocation.awareness.balance` setting is false by default. When it is set to `true`, the total number of shards for the index must be a multiple of the highest count for any awareness attribute. For example, consider a configuration with two awareness attributes—zones and rack IDs. Let's say there are two zones and three rack IDs. The highest count of either the number of zones or the number of rack IDs is three. Therefore, the number of shards must be a multiple of three. If it is not, OpenSearch throws a validation exception.
+
+`routing.allocation.awareness.balance` takes effect only if `cluster.routing.allocation.awareness.attributes` and `cluster.routing.allocation.awareness.force.zone.values` are set.
+{: .note}
+
+`routing.allocation.awareness.balance` applies to all operations that create or update indexes. For example, let's say you're running a cluster with three nodes and three zones in a zone-aware setting. If you try to create an index with one replica or update an index's settings to one replica, the attempt will fail with a validation exception because the number of shards must be a multiple of three. Similarly, if you try to create an index template with one shard and no replicas, the attempt will fail for the same reason. However, in all of those operations, if you set the number of shards to one and the number of replicas to two, the total number of shards is three and the attempt will succeed.
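+
+Because the balance setting takes effect only alongside the awareness attributes, a complete sketch combines all three settings (the zone values are placeholders):
+
+```json
+PUT _cluster/settings
+{
+  "persistent": {
+    "cluster.routing.allocation.awareness.attributes": "zone",
+    "cluster.routing.allocation.awareness.force.zone.values": "zoneA,zoneB",
+    "cluster.routing.allocation.awareness.balance": "true"
+  }
+}
+```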
+
## (Advanced) Step 7: Set up a hot-warm architecture
You can design a hot-warm architecture where you first index your data to hot nodes---fast and expensive---and after a certain period of time move them to warm nodes---slow and cheap.
diff --git a/_tuning-your-cluster/replication-plugin/api.md b/_tuning-your-cluster/replication-plugin/api.md
new file mode 100644
index 0000000000..bec1721d16
--- /dev/null
+++ b/_tuning-your-cluster/replication-plugin/api.md
@@ -0,0 +1,394 @@
+---
+layout: default
+title: API
+nav_order: 50
+parent: Cross-cluster replication
+redirect_from:
+ - /replication-plugin/api/
+---
+
+# Cross-cluster replication API
+
+Use these replication operations to programmatically manage cross-cluster replication.
+
+#### Table of contents
+- TOC
+{:toc}
+
+## Start replication
+Introduced 1.1
+{: .label .label-purple }
+
+Initiate replication of an index from the leader cluster to the follower cluster. Send this request to the follower cluster.
+
+
+#### Request
+
+```json
+PUT /_plugins/_replication/<follower-index>/_start
+{
+   "leader_alias":"<connection-alias-name>",
+   "leader_index":"<index-name>",
+   "use_roles":{
+      "leader_cluster_role":"<role-name>",
+      "follower_cluster_role":"<role-name>"
+ }
+}
+```
+
+Specify the following options:
+
+Options | Description | Type | Required
+:--- | :--- |:--- |:--- |
+`leader_alias` | The name of the cross-cluster connection. You define this alias when you [set up a cross-cluster connection]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/#set-up-a-cross-cluster-connection). | `string` | Yes
+`leader_index` | The index on the leader cluster that you want to replicate. | `string` | Yes
+`use_roles` | The roles to use for all subsequent backend replication tasks between the indexes. Specify a `leader_cluster_role` and `follower_cluster_role`. See [Map the leader and follower cluster roles]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/#map-the-leader-and-follower-cluster-roles). | `string` | If the Security plugin is enabled
+
+#### Example response
+
+```json
+{
+ "acknowledged": true
+}
+```
+
+## Stop replication
+Introduced 1.1
+{: .label .label-purple }
+
+Terminates replication and converts the follower index to a standard index. Send this request to the follower cluster.
+
+#### Request
+
+```json
+POST /_plugins/_replication/<follower-index>/_stop
+{}
+```
+
+#### Example response
+
+```json
+{
+ "acknowledged": true
+}
+```
+
+## Pause replication
+Introduced 1.1
+{: .label .label-purple }
+
+Pauses replication of the leader index. Send this request to the follower cluster.
+
+#### Request
+
+```json
+POST /_plugins/_replication/<follower-index>/_pause
+{}
+```
+
+You can't resume replication after it's been paused for more than 12 hours. You must [stop replication]({{site.url}}{{site.baseurl}}/replication-plugin/api/#stop-replication), delete the follower index, and restart replication of the leader.
+
+#### Example response
+
+```json
+{
+ "acknowledged": true
+}
+```
+
+## Resume replication
+Introduced 1.1
+{: .label .label-purple }
+
+Resumes replication of the leader index. Send this request to the follower cluster.
+
+#### Request
+
+```json
+POST /_plugins/_replication/<follower-index>/_resume
+{}
+```
+
+#### Example response
+
+```json
+{
+ "acknowledged": true
+}
+```
+
+## Get replication status
+Introduced 1.1
+{: .label .label-purple }
+
+Gets the status of index replication. Possible statuses are `SYNCING`, `BOOTSTRAPPING`, `PAUSED`, and `REPLICATION NOT IN PROGRESS`. Use the syncing details to measure replication lag. Send this request to the follower cluster.
+
+#### Request
+
+```json
+GET /_plugins/_replication/<follower-index>/_status
+```
+
+#### Example response
+
+```json
+{
+ "status" : "SYNCING",
+ "reason" : "User initiated",
+ "leader_alias" : "my-connection-name",
+ "leader_index" : "leader-01",
+ "follower_index" : "follower-01",
+ "syncing_details" : {
+ "leader_checkpoint" : 19,
+ "follower_checkpoint" : 19,
+ "seq_no" : 0
+ }
+}
+```
+To include shard replication details in the response, add the `&verbose=true` parameter.
+
+The leader and follower checkpoint values begin as negative integers and reflect the shard count (-1 for one shard, -5 for five shards, and so on). The values increment toward positive integers with each change that you make. For example, when you make a change on the leader index, the `leader_checkpoint` becomes `0`. The `follower_checkpoint` is initially still `-1` until the follower index pulls the change from the leader, at which point it increments to `0`. If the values are the same, it means the indexes are fully synced.
+
+## Get leader cluster stats
+Introduced 1.1
+{: .label .label-purple }
+
+Gets information about replicated leader indexes on a specified cluster.
+
+#### Request
+
+```json
+GET /_plugins/_replication/leader_stats
+```
+
+#### Example response
+
+```json
+{
+ "num_replicated_indices": 2,
+ "operations_read": 15,
+ "translog_size_bytes": 1355,
+ "operations_read_lucene": 0,
+ "operations_read_translog": 15,
+ "total_read_time_lucene_millis": 0,
+ "total_read_time_translog_millis": 659,
+ "bytes_read": 1000,
+ "index_stats":{
+ "leader-index-1":{
+ "operations_read": 7,
+ "translog_size_bytes": 639,
+ "operations_read_lucene": 0,
+ "operations_read_translog": 7,
+ "total_read_time_lucene_millis": 0,
+ "total_read_time_translog_millis": 353,
+ "bytes_read":466
+ },
+ "leader-index-2":{
+ "operations_read": 8,
+ "translog_size_bytes": 716,
+ "operations_read_lucene": 0,
+ "operations_read_translog": 8,
+ "total_read_time_lucene_millis": 0,
+ "total_read_time_translog_millis": 306,
+ "bytes_read": 534
+ }
+ }
+}
+```
+
+## Get follower cluster stats
+Introduced 1.1
+{: .label .label-purple }
+
+Gets information about follower (syncing) indexes on a specified cluster.
+
+#### Request
+
+```json
+GET /_plugins/_replication/follower_stats
+```
+
+#### Example response
+
+```json
+{
+ "num_syncing_indices": 2,
+ "num_bootstrapping_indices": 0,
+ "num_paused_indices": 0,
+ "num_failed_indices": 0,
+ "num_shard_tasks": 2,
+ "num_index_tasks": 2,
+ "operations_written": 3,
+ "operations_read": 3,
+ "failed_read_requests": 0,
+ "throttled_read_requests": 0,
+ "failed_write_requests": 0,
+ "throttled_write_requests": 0,
+ "follower_checkpoint": 1,
+ "leader_checkpoint": 1,
+ "total_write_time_millis": 2290,
+ "index_stats":{
+ "follower-index-1":{
+ "operations_written": 2,
+ "operations_read": 2,
+ "failed_read_requests": 0,
+ "throttled_read_requests": 0,
+ "failed_write_requests": 0,
+ "throttled_write_requests": 0,
+ "follower_checkpoint": 1,
+ "leader_checkpoint": 1,
+ "total_write_time_millis": 1355
+ },
+ "follower-index-2":{
+ "operations_written": 1,
+ "operations_read": 1,
+ "failed_read_requests": 0,
+ "throttled_read_requests": 0,
+ "failed_write_requests": 0,
+ "throttled_write_requests": 0,
+ "follower_checkpoint": 0,
+ "leader_checkpoint": 0,
+ "total_write_time_millis": 935
+ }
+ }
+}
+```
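+
+The per-index counters are useful for monitoring. The following minimal Python sketch (hypothetical endpoint and credentials) polls the follower stats and prints any follower index that reports failed or throttled requests:
+
+```python
+import requests
+
+FOLLOWER = "https://localhost:9200"  # hypothetical follower cluster endpoint
+
+resp = requests.get(
+    f"{FOLLOWER}/_plugins/_replication/follower_stats",
+    auth=("admin", "admin"),  # hypothetical demo credentials
+    verify=False,             # demo cluster with self-signed certificates only
+)
+resp.raise_for_status()
+stats = resp.json()
+
+# Surface any follower index with nonzero failed or throttled counters.
+for index, s in stats["index_stats"].items():
+    problems = {
+        field: count
+        for field, count in s.items()
+        if ("failed" in field or "throttled" in field) and count > 0
+    }
+    if problems:
+        print(f"{index}: {problems}")
+```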
+
+## Get auto-follow stats
+Introduced 1.1
+{: .label .label-purple }
+
+Gets information about auto-follow activity and any replication rules configured on the specified cluster.
+
+#### Request
+
+```json
+GET /_plugins/_replication/autofollow_stats
+```
+
+#### Example response
+
+```json
+{
+ "num_success_start_replication": 2,
+ "num_failed_start_replication": 0,
+ "num_failed_leader_calls": 0,
+ "failed_indices":[
+
+ ],
+ "autofollow_stats":[
+ {
+ "name":"my-replication-rule",
+ "pattern":"movies*",
+ "num_success_start_replication": 2,
+ "num_failed_start_replication": 0,
+ "num_failed_leader_calls": 0,
+ "failed_indices":[
+
+ ]
+ }
+ ]
+}
+```
+
+## Update settings
+Introduced 1.1
+{: .label .label-purple }
+
+Updates settings on the follower index.
+
+#### Request
+
+```json
+PUT /_plugins/_replication/<follower-index>/_update
+{
+  "settings": {
+    "index.number_of_shards": 4,
+    "index.number_of_replicas": 2
+  }
+}
+```
+
+#### Example response
+
+```json
+{
+ "acknowledged": true
+}
+```
+
+## Create replication rule
+Introduced 1.1
+{: .label .label-purple }
+
+Automatically starts replication on indexes matching a specified pattern. If a new index on the leader cluster matches the pattern, OpenSearch automatically creates a follower index and begins replication. You can also use this API to update existing replication rules.
+
+Send this request to the follower cluster.
+
+Make sure to note the names of all auto-follow patterns after you create them. The replication plugin currently does not include an API operation to retrieve a list of existing patterns, so keep your own record; one possible approach is shown in the sketch after the example response.
+{: .tip }
+
+#### Request
+
+```json
+POST /_plugins/_replication/_autofollow
+{
+ "leader_alias" : "",
+ "name": "",
+ "pattern": "",
+ "use_roles":{
+ "leader_cluster_role": "",
+ "follower_cluster_role": ""
+ }
+}
+```
+
+Specify the following options:
+
+Option | Description | Type | Required
+:--- | :--- | :--- | :---
+`leader_alias` | The name of the cross-cluster connection. You define this alias when you [set up a cross-cluster connection]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/#set-up-a-cross-cluster-connection). | `string` | Yes
+`name` | A name for the auto-follow pattern. | `string` | Yes
+`pattern` | The index pattern to match against indexes in the specified leader cluster. Supports wildcard characters, for example, `leader-*`. | `string` | Yes
+`use_roles` | The roles to use for all subsequent backend replication tasks between the indexes. Specify a `leader_cluster_role` and `follower_cluster_role`. See [Map the leader and follower cluster roles]({{site.url}}{{site.baseurl}}/replication-plugin/permissions/#map-the-leader-and-follower-cluster-roles). | `string` | If security plugin is enabled
+
+#### Example response
+
+```json
+{
+ "acknowledged": true
+}
+```
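+
+Because there is no API to list existing patterns, one low-tech option is to record each rule as you create it. A minimal Python sketch, with hypothetical endpoint, credentials, rule values, and file path:
+
+```python
+import requests
+
+FOLLOWER = "https://localhost:9200"  # hypothetical follower cluster endpoint
+
+rule = {
+    "leader_alias": "my-connection-name",  # hypothetical connection alias
+    "name": "my-replication-rule",         # hypothetical pattern name
+    "pattern": "movies*",
+}
+
+resp = requests.post(
+    f"{FOLLOWER}/_plugins/_replication/_autofollow",
+    json=rule,
+    auth=("admin", "admin"),  # hypothetical demo credentials
+    verify=False,             # demo cluster with self-signed certificates only
+)
+resp.raise_for_status()
+
+# Append the rule to a local record (hypothetical file path) for later lookup.
+with open("replication-rules.txt", "a") as f:
+    f.write(f"{rule['leader_alias']}\t{rule['name']}\t{rule['pattern']}\n")
+```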
+
+## Delete replication rule
+Introduced 1.1
+{: .label .label-purple }
+
+Deletes the specified replication rule. This operation prevents any new indexes from being replicated but does not stop existing replication that the rule has already initiated. Replicated indexes remain read-only until you stop replication.
+
+Send this request to the follower cluster.
+
+#### Request
+
+```json
+DELETE /_plugins/_replication/_autofollow
+{
+ "leader_alias" : "",
+ "name": "",
+}
+```
+
+Specify the following options:
+
+Option | Description | Type | Required
+:--- | :--- | :--- | :---
+`leader_alias` | The name of the cross-cluster connection. You define this alias when you [set up a cross-cluster connection]({{site.url}}{{site.baseurl}}/replication-plugin/get-started/#set-up-a-cross-cluster-connection). | `string` | Yes
+`name` | The name of the pattern. | `string` | Yes
+
+#### Example response
+
+```json
+{
+ "acknowledged": true
+}
+```
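+
+To fully retire a rule and the read-only indexes it created, you can pair this call with stop replication requests. A minimal Python sketch, with hypothetical endpoint, credentials, rule, and index names:
+
+```python
+import requests
+
+FOLLOWER = "https://localhost:9200"                 # hypothetical follower cluster endpoint
+KW = {"auth": ("admin", "admin"), "verify": False}  # hypothetical demo credentials
+
+# Delete the rule so that no new indexes are auto-followed.
+requests.delete(
+    f"{FOLLOWER}/_plugins/_replication/_autofollow",
+    json={"leader_alias": "my-connection-name", "name": "my-replication-rule"},
+    **KW,
+).raise_for_status()
+
+# Indexes the rule already created keep replicating and stay read-only,
+# so stop each one explicitly (hypothetical follower index names).
+for index in ["movies-2023", "movies-2024"]:
+    requests.post(f"{FOLLOWER}/_plugins/_replication/{index}/_stop", json={}, **KW).raise_for_status()
+```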