Updates remote-backed storage documentation (#2363)
* Update remote store documentation to reflect request level durability

Signed-off-by: Ashish Singh <ssashish@amazon.com>

* Update remote store documentation to reflect request level durability

Signed-off-by: Ashish Singh <ssashish@amazon.com>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Adds to the remote storage documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Incorporated tech review feedback

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Minor rewording

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Incorporated doc review feedback

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Tech review updates

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Ashish Singh <ssashish@amazon.com>
Co-authored-by: Ashish <ashking94@gmail.com>
3 people authored and vagimeli committed Jan 19, 2023
1 parent bd93752 commit c441bef
Showing 1 changed file with 101 additions and 4 deletions.
105 changes: 101 additions & 4 deletions _opensearch/remote.md
@@ -11,6 +11,54 @@ Remote-backed storage is an experimental feature. Therefore, we do not recommend

Remote-backed storage offers OpenSearch users a new way to protect against data loss by automatically creating backups of all index transactions and sending them to remote storage. In order to expose this feature, segment replication must also be enabled. See [Segment replication]({{site.url}}{{site.baseurl}}/opensearch/segment-replication/) for additional information.

## Translog

Any index changes, such as indexing or deleting documents, are written to disk during a Lucene commit. However, Lucene commits are expensive operations, so they cannot be performed after every change to the index. Instead, each shard records every indexing operation in a transaction log called the *translog*. When a document is indexed, it is added to the memory buffer and recorded in the translog. Frequent refresh operations write the documents in the memory buffer to a segment and then clear the memory buffer. Periodically, a flush performs a Lucene commit, which includes writing the segments to disk using fsync, purging the old translog, and starting a new translog. Thus, a translog contains all operations that have not yet been flushed.
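
The write path described above can be sketched as a toy model. This is a minimal illustration, not OpenSearch code: the `ToyShard` class and its attribute names are hypothetical, and having `flush` call `refresh` first is a simplifying assumption.

```python
class ToyShard:
    """Toy model of the indexing write path; all names are hypothetical."""

    def __init__(self):
        self.memory_buffer = []         # documents indexed since the last refresh
        self.uncommitted_segments = []  # segments written by refresh, not yet committed
        self.committed_segments = []    # segments made durable by a commit
        self.translog = []              # operations not yet covered by a commit

    def index(self, doc):
        # An indexed document is added to the memory buffer and recorded in the translog.
        self.memory_buffer.append(doc)
        self.translog.append(("index", doc))

    def refresh(self):
        # A refresh writes the buffered documents to a segment and clears the buffer.
        if self.memory_buffer:
            self.uncommitted_segments.append(list(self.memory_buffer))
            self.memory_buffer.clear()

    def flush(self):
        # A flush commits the segments (fsync in the real system),
        # purges the old translog, and starts a new one.
        self.refresh()  # assumption: buffered documents are written out first
        self.committed_segments.extend(self.uncommitted_segments)
        self.uncommitted_segments.clear()
        self.translog = []


shard = ToyShard()
shard.index({"id": 1})
shard.refresh()
shard.index({"id": 2})
operations_before_flush = len(shard.translog)  # 2: nothing has been flushed yet
shard.flush()
operations_after_flush = len(shard.translog)   # 0: the flush started a new translog
```

In this model, the translog always holds exactly the operations that a crash before the next flush would otherwise lose.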

## Segment replication and remote-backed storage

When neither segment replication nor remote-backed storage is enabled, OpenSearch uses document replication. In document replication, when a write request lands on the primary shard, the request is indexed to Lucene and stored in the translog. After this, the request is sent to the replicas, where, in turn, it is indexed to Lucene and stored in the translog for durability.

With segment replication, segments are created on the primary shard only and then copied to all replicas. The replicas do not index requests to Lucene, but they do create and maintain a translog.

With remote-backed storage, when a write request lands on the primary shard, the request is indexed to Lucene on the primary shard only. The corresponding translog is then uploaded to the remote store. OpenSearch does not send the write request to the replicas but instead performs a primary term validation to confirm that the shard that originated the request is still the primary shard. Primary term validation ensures that the acting primary shard fails if it becomes isolated and is unaware that the cluster manager has elected a new primary.
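
The idea behind primary term validation can be sketched as follows. This is a toy model under stated assumptions, not the actual OpenSearch protocol: the function name, the exception, and the representation of terms as plain integers are all hypothetical.

```python
class StalePrimaryError(Exception):
    """Raised when a shard acting as primary holds an outdated primary term."""


def validate_primary_term(sender_term, replica_known_terms):
    """Fail if any replica already knows a newer primary term than the sender's.

    Assumption: each replica tracks the latest primary term it has learned
    from the cluster manager, and terms only ever increase.
    """
    newest_known = max(replica_known_terms, default=sender_term)
    if newest_known > sender_term:
        raise StalePrimaryError(
            f"sender term {sender_term} is older than known term {newest_known}"
        )
    return True


# A primary whose term matches what the replicas know passes validation.
validate_primary_term(3, [3, 3])

# An isolated primary that missed the election of a new primary (term 4) fails.
try:
    validate_primary_term(3, [4, 4])
    isolated_primary_failed = False
except StalePrimaryError:
    isolated_primary_failed = True
```

The sketch only shows why a stale primary must stop accepting writes; it does not model how terms are distributed or how the failure is handled.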

## The `index.translog.durability` translog setting

Without remote-backed storage, indexing operations are only persisted to disk when the translog is fsynced. Therefore, any data that has not been written to disk can potentially be lost.

The `index.translog.durability` setting controls how frequently OpenSearch fsyncs the translog to disk:

- By default, `index.translog.durability` is set to `request`. This means that fsync happens after every request, and all acknowledged write requests persist in case of failure.

- If you set `index.translog.durability` to `async`, fsync happens periodically at the specified `sync_interval` (5 seconds by default). The fsync operation is asynchronous, so the acknowledgment is sent without waiting for fsync. Consequently, all acknowledged writes since the last commit are lost in case of failure.
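
The difference between the two values can be sketched with a toy model. The `ToyTranslog` class below is hypothetical and only mimics the behavior described above: with `request`, every write is fsynced before it is acknowledged, while with `async`, writes are acknowledged immediately and reach disk only at the next periodic fsync.

```python
class ToyTranslog:
    """Toy translog used to compare `request` and `async` durability."""

    def __init__(self, durability="request"):
        if durability not in ("request", "async"):
            raise ValueError("durability must be 'request' or 'async'")
        self.durability = durability
        self.pending = []   # written but not yet fsynced
        self.on_disk = []   # fsynced, survives a failure

    def fsync(self):
        # Persist everything written so far.
        self.on_disk.extend(self.pending)
        self.pending.clear()

    def write(self, op):
        self.pending.append(op)
        if self.durability == "request":
            self.fsync()  # fsync before the write is acknowledged
        return "acknowledged"

    def crash(self):
        # Anything that was not fsynced is lost.
        self.pending.clear()
        return list(self.on_disk)


sync_log = ToyTranslog("request")
sync_log.write("doc-1")
survivors_request = sync_log.crash()   # ["doc-1"]

async_log = ToyTranslog("async")
async_log.write("doc-1")               # acknowledged, but no fsync has run yet
survivors_async = async_log.crash()    # []
```

In the `async` case, a call to `fsync()` stands in for the periodic sync that would run every `sync_interval`.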

With remote-backed storage, the translog is uploaded to a remote store for durability.

`index.translog.durability` is a dynamic setting. To update it, use the following request:

```json
PUT my_index/_settings
{
  "index" : {
    "translog.durability" : "request"
  }
}
```

## Refresh-level and request-level durability

The remote store feature supports two levels of durability:

- Refresh-level durability: Segment files are uploaded to the remote store after every refresh. Set the `remote_store` flag to `true` to achieve refresh-level durability. Commit-level durability is inherent, and uploads are asynchronous.

If you need to refresh an index manually, you can use the `_refresh` API. For example, to refresh the `my_index` index, use the following request:

```json
POST my_index/_refresh
```

- Request-level durability: Translogs are uploaded before acknowledging the request. Set the `translog` flag to `true` to achieve request-level durability. In this scenario, we recommend batching as many requests as possible in a bulk request. Batching requests improves indexing throughput and latency compared to sending individual write requests.
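
As a sketch of the batching recommendation, the following helper assembles several documents into one newline-delimited body for the `_bulk` API instead of sending one write request per document. The helper name, the index name, and the sample documents are hypothetical; only the body format (an action line followed by a source line, with a trailing newline) comes from the bulk API.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited `_bulk` body from (doc_id, source) pairs."""
    if not docs:
        raise ValueError("at least one document is required")
    lines = []
    for doc_id, source in docs:
        # Action line: index this document into the given index.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "my_index",
    [("1", {"title": "first"}), ("2", {"title": "second"})],
)
```

The resulting string would be sent as the body of a single `POST _bulk` request, so with request-level durability the translog upload happens once for the whole batch rather than once per document.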

## Enable the feature flag

@@ -89,7 +137,7 @@ Now that your deployment is running with the feature flags enabled, the next ste
Remote-backed storage is enabled for an index when it is created. This feature cannot be enabled for indexes that already exist.

For refresh-level durability, include the `remote_store` property to enable the feature and specify a segment repository:

```bash
curl -X PUT "https://localhost:9200/my-index?pretty" -ku admin:admin -H 'Content-Type: application/json' -d'
@@ -103,15 +151,47 @@ curl -X PUT "https://localhost:9200/my-index?pretty" -ku admin:admin -H 'Content
      },
      "remote_store": {
        "enabled": true,
        "repository": "segment-repo"
      }
    }
  }
}
'
```

For request-level durability, in addition to the `remote_store` property and segment repository, include the `translog` property and specify a translog repository:

```bash
curl -X PUT "https://localhost:9200/my-index?pretty" -ku admin:admin -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "replication": {
        "type": "SEGMENT"
      },
      "remote_store": {
        "enabled": true,
        "repository": "segment-repo",
        "translog": {
          "enabled": true,
          "repository": "translog-repo"
        }
      }
    }
  }
}
'
```
You can have the same repository serve as both the segment repository and translog repository.
{: .note}

As data is added to the index, it is also continuously uploaded to remote storage in the form of segment and translog files as a result of refreshes, flushes, and translog fsyncs to disk. Along with the data, other metadata files are uploaded.

Setting `translog.enabled` to `true` is currently an irreversible operation.
{: .warning}
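
The two request bodies above differ only in the optional `translog` block. A small helper can make that relationship explicit. This is a sketch: the function name and its parameters are hypothetical, while the setting names mirror the requests shown in this section.

```python
import json


def remote_store_index_body(segment_repo, translog_repo=None, shards=1, replicas=1):
    """Build index-creation settings for refresh-level or request-level durability."""
    remote_store = {"enabled": True, "repository": segment_repo}
    if translog_repo is not None:
        # Request-level durability: translogs are uploaded as well.
        remote_store["translog"] = {"enabled": True, "repository": translog_repo}
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "replication": {"type": "SEGMENT"},
                "remote_store": remote_store,
            }
        }
    }


refresh_level = remote_store_index_body("segment-repo")
request_level = remote_store_index_body("segment-repo", "translog-repo")
payload = json.dumps(request_level, indent=2)  # body for the index-creation request
```

Passing the same repository name for both arguments corresponds to using one repository for segments and translogs.
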
### Restoring from a backup
@@ -132,4 +212,21 @@ curl -X POST "https://localhost:9200/_remotestore/_restore" -ku admin:admin -H '
```
If the security plugin is enabled, a user must have the `cluster:admin/remotestore/restore` permission. See [Access control](/security-plugin/access-control/index/) for information about configuring user permissions.
{: .note}

## Potential use cases

You can use remote-backed storage for the following purposes:

- To restore red clusters or indexes
- To recover all data up to the last acknowledged write, regardless of replica count, if `index.translog.durability` is set to `request`

## Known limitations

The following are known limitations of the remote-backed storage feature:

- Writing data to a remote store can be a high-latency operation compared to writing data to the local file system. This may impact indexing throughput and latency.
- Primary-to-primary relocation is unstable because the handover of translog uploads from the old primary to the new primary has not been implemented.
- Garbage collection of the metadata file has not been implemented.

For other limitations, see the [Remote store known issue list](https://github.com/opensearch-project/OpenSearch/issues/5678).
