[Feature Request] Support all-active mode in pull-based ingestion #19287

@varunbharadwaj

Is your feature request related to a problem? Please describe

Pull-based ingestion currently supports only segment replication (segrep) mode, and a remote store is recommended for scalability. This adds a remote store dependency to pull-based ingestion. We can remove that dependency by supporting a document replication (docrep) equivalent, in which the replicas also process the indexing requests/messages.

Describe the solution you'd like

In document replication mode (the default in OpenSearch), enable ingestion on both the primary and replica shards of a pull-based ingestion index. The primary and replica shards will consume from the same Kafka partition (or the Kinesis equivalent) and process the messages independently. During shard recovery, the shard will be recovered from disk if local data is present. If not, peer recovery from the primary shard will run first, after which ingestion will resume from the offset saved in the commit.
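The recovery flow described above can be sketched roughly as follows. This is only an illustration, not OpenSearch code: `RecoveryPlan`, `plan_recovery`, and the parameter names are hypothetical, and the sketch assumes that after peer recovery the "offset saved in the commit" is the one in the commit copied from the primary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryPlan:
    """Hypothetical result type: where the shard recovers from and where ingestion resumes."""
    source: str         # "local_disk" or "peer_primary"
    resume_offset: int  # stream offset at which ingestion restarts


def plan_recovery(
    has_local_index: bool,
    local_commit_offset: Optional[int],
    primary_commit_offset: int,
) -> RecoveryPlan:
    """Choose a recovery source for a shard copy in the proposed all-active mode.

    Assumption: each commit stores the last processed stream offset, so a shard
    that recovers a commit can resume consuming from that offset.
    """
    if has_local_index and local_commit_offset is not None:
        # Local data is present: recover from disk and resume from our own commit.
        return RecoveryPlan(source="local_disk", resume_offset=local_commit_offset)
    # No usable local data: copy from the primary, then resume from the offset
    # recorded in the commit received through peer recovery.
    return RecoveryPlan(source="peer_primary", resume_offset=primary_commit_offset)


restarted_replica = plan_recovery(True, local_commit_offset=120, primary_commit_offset=150)
new_replica = plan_recovery(False, local_commit_offset=None, primary_commit_offset=150)
```

Here a restarted replica with local data resumes from its own offset (120), while a freshly allocated replica first copies from the primary and resumes from the primary's committed offset (150).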

This model is equivalent to docrep in that the replicas index the documents themselves. A key point to note is that the primary and each replica can process events at different rates, since there is no coordination between them. In exchange, this model removes the segrep and remote store dependency. It is also expected to support higher throughput and better scalability than push-based docrep mode. Compared to push-based segrep, this mode should provide better freshness, as all replicas ingest directly from the Kafka topic.
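To make the "no coordination" point concrete, here is a toy simulation in which a primary and a replica read the same partition with independent offsets. The partition is modeled as a plain Python list and `ShardCopy` is a hypothetical stand-in, so nothing here reflects the real Kafka client or OpenSearch classes.

```python
from typing import Dict, List


class ShardCopy:
    """Hypothetical shard copy that tracks its own offset into a shared partition."""

    def __init__(self, name: str, batch_size: int) -> None:
        self.name = name
        self.batch_size = batch_size    # models a faster or slower consumer
        self.offset = 0                 # next partition offset to read
        self.docs: Dict[str, str] = {}  # indexed documents keyed by _id

    def poll(self, partition: List[dict]) -> None:
        """Read the next batch from the partition and index it locally."""
        batch = partition[self.offset:self.offset + self.batch_size]
        for message in batch:
            self.docs[message["_id"]] = message["body"]
        self.offset += len(batch)


# One partition, consumed by both copies without any coordination.
partition = [{"_id": str(i), "body": f"doc-{i}"} for i in range(10)]
primary = ShardCopy("primary", batch_size=4)
replica = ShardCopy("replica", batch_size=1)

primary.poll(partition)
replica.poll(partition)
mid_offsets = (primary.offset, replica.offset)  # the copies have diverged here

# Keep polling until both copies reach the end of the partition.
while primary.offset < len(partition) or replica.offset < len(partition):
    primary.poll(partition)
    replica.poll(partition)
```

After the first poll the two copies sit at different offsets, so a search served by the replica may briefly miss documents the primary already has. Once both have consumed the whole partition, their contents are identical.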

A new config to select the "all-active" model will be introduced in the future, when we plan to support this model alongside segrep. For now, we default to this model when pull-based ingestion is used without segrep. As a next step, the recovery logic could be improved to consider the batchStartPointer (the recovery offset/start point) to decide when to recover from the primary shard.
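The interim defaulting rule could look something like the sketch below. The function name and the returned mode labels are hypothetical; only the `DOCUMENT` and `SEGMENT` replication type values are taken from existing OpenSearch index settings, and no new config key is assumed.

```python
def resolve_ingestion_mode(pull_based: bool, replication_type: str) -> str:
    """Pick the ingestion model for an index under the interim rule.

    Hypothetical labels:
      "all-active"   - primary and replicas all consume from the stream
      "primary-only" - only the primary consumes; replicas follow via segrep
      "push-based"   - the index does not use pull-based ingestion
    """
    if not pull_based:
        return "push-based"
    if replication_type.upper() == "SEGMENT":
        return "primary-only"
    # Pull-based without segrep: default to all-active until a dedicated
    # config exists to choose the model explicitly.
    return "all-active"


docrep_mode = resolve_ingestion_mode(pull_based=True, replication_type="DOCUMENT")
segrep_mode = resolve_ingestion_mode(pull_based=True, replication_type="SEGMENT")
```

Once an explicit config is introduced, a check like this would presumably read that setting instead of inferring the mode from the replication type.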

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

Indexing (Indexing, Bulk Indexing and anything related to indexing), enhancement (Enhancement or improvement to existing feature or request), untriaged
