Optimized Reindex-from-Snapshot with Sub-shard checkpoints #1095

Open
sumobrian opened this issue Oct 23, 2024 · 0 comments
Labels
enhancement New feature or request MAv2.2

Comments

@sumobrian
Collaborator

sumobrian commented Oct 23, 2024

Is your feature request related to a problem? Please describe.

Currently, the reindex-from-snapshot process assigns a single worker to migrate all docs in a shard and marks the shard as completed only once every doc has been migrated in a single attempt. If an attempt fails partway through, the next attempt starts over from the beginning of the shard, so the first docs in a large shard may be migrated several times before an attempt succeeds. Relying on one long-running worker per large shard therefore raises the risk of failure and lengthens the time to complete the migration.

Describe the solution you'd like

Implement the ability to regularly checkpoint and resume the migration of a shard, which limits how many times a doc is migrated more than once, particularly for large shards. Also implement a ceiling on how long the lease for a work item can last.
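To make the proposal concrete, here is a minimal sketch of checkpoint-and-resume with a capped lease. It is only an illustration under stated assumptions, not the project's implementation: `WorkItem`, `migrate_with_checkpoints`, `send_batch`, `next_doc`, and the lease parameters are all hypothetical names, and a real worker would persist the checkpoint in a coordinating store rather than in memory.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class WorkItem:
    """Hypothetical work item: one shard plus its resume point."""
    shard_id: str
    next_doc: int = 0        # index of the first doc not yet migrated
    completed: bool = False


def migrate_with_checkpoints(
    item: WorkItem,
    docs: Sequence[str],
    send_batch: Callable[[Sequence[str]], None],
    lease_seconds: float,
    max_lease_seconds: float,
    batch_size: int = 2,
    clock: Callable[[], float] = time.monotonic,
) -> WorkItem:
    """Migrate docs from item.next_doc onward until done or the lease expires."""
    # Cap the lease so no single worker can hold a shard indefinitely.
    deadline = clock() + min(lease_seconds, max_lease_seconds)
    while item.next_doc < len(docs):
        if clock() >= deadline:
            # Lease expired: give the item back with its progress intact.
            return item
        batch = docs[item.next_doc:item.next_doc + batch_size]
        send_batch(batch)            # if this raises, next_doc is not advanced
        item.next_doc += len(batch)  # checkpoint after each successful batch
    item.completed = True
    return item
```

In this sketch a failed or expired attempt costs at most one batch of repeated work, because the next worker picks up at `next_doc` instead of at the start of the shard.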

Describe alternatives you've considered

  • We could split each shard up front into sub-shard work items that are migrated in parallel. However, this adds complexity and makes work distribution in the target cluster more uneven: when an index's shard count stays the same during a migration, every sub-shard worker for a given source shard hits a single node/shard in the target cluster.
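The unevenness concern in this alternative can be illustrated with a small sketch. It assumes a simplified routing rule (a hash of the doc id modulo the shard count); `route` and `target_shards_hit` are hypothetical helpers, and CRC32 is only a stand-in for the hash the cluster actually uses.

```python
import zlib
from typing import Iterable, Set


def route(doc_id: str, num_shards: int) -> int:
    """Simplified stand-in for hash-based routing (assumption, not the real hash)."""
    return zlib.crc32(doc_id.encode()) % num_shards


def target_shards_hit(
    source_shard: int,
    num_source_shards: int,
    num_target_shards: int,
    doc_ids: Iterable[str],
) -> Set[int]:
    """Return the target shards that receive docs from one source shard."""
    in_source = [d for d in doc_ids if route(d, num_source_shards) == source_shard]
    return {route(d, num_target_shards) for d in in_source}


doc_ids = [f"doc-{i}" for i in range(1000)]
# Same shard count on both sides: every doc from source shard 0 lands on
# target shard 0, however many sub-shard workers split the work.
same_count = target_shards_hit(0, 4, 4, doc_ids)
```

Under this assumption, adding sub-shard workers for one source shard adds parallelism on the reading side only; all of their writes still converge on one target shard.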

Additional context

Jira Epic(s)

@sumobrian sumobrian added enhancement New feature or request untriaged labels Oct 23, 2024
@sumobrian sumobrian moved this from Not Committed to 3-6 Months in OpenSearch Migrations - Roadmap Oct 23, 2024
@sumobrian sumobrian changed the title from "[FEATURE] Optimized Reindex-from-Snapshot with Direct S3 Ingestion and Sub-Shard Parallelization" to "Optimized Reindex-from-Snapshot with Direct S3 Ingestion and Sub-Shard Parallelization" Oct 23, 2024
@sumobrian sumobrian changed the title from "Optimized Reindex-from-Snapshot with Direct S3 Ingestion and Sub-Shard Parallelization" to "Optimized Reindex-from-Snapshot with Shard Parallelization" Nov 5, 2024
@sumobrian sumobrian added MAv2.1 and removed MAv2.x labels Nov 5, 2024
@sumobrian sumobrian changed the title from "Optimized Reindex-from-Snapshot with Shard Parallelization" to "Optimized Reindex-from-Snapshot with Sub-shard checkpoints" Dec 7, 2024
@sumobrian sumobrian added MAv2.2 and removed MAv2.1 labels Jan 5, 2025
Projects
Status: Within 3 Months
Development

No branches or pull requests

1 participant