Replace multipart download with parallel file download (opensearch-project#10519)

There are a few open issues with the multi-stream download approach:
 - Recovery stats are not being reported correctly
 - It is incompatible (short of reopening and re-reading the entire
   file) with the existing Lucene checksum validation logic
 - There are some issues with integrating it with the pending client
   side encryption work

Given this, I attempted an experiment where I replaced the
multi-stream-within-a-single-file approach with simply parallelizing
downloads across files (this is how snapshot restore works). I actually
got better results with this approach: recovering a ~52GiB shard took
about 4.7 minutes with the multi-stream code versus 3.9 minutes with the
parallel file approach (r7g.4xlarge EC2 instance, 500MiB/s EBS volume,
S3 as remote repository).
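
As a rough illustration of the parallel file approach described above, here is a
minimal sketch in which each file is copied as a single stream and several files
are in flight at once. This is not the code from this change: the class name,
the FileCopier interface, and the maxConcurrent parameter are all hypothetical
stand-ins, and a plain fixed-size thread pool is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFileDownloadSketch {
    /** Hypothetical stand-in for copying one whole file from the remote store to local disk. */
    interface FileCopier {
        void copy(String fileName) throws Exception;
    }

    /** Copies every file as a single stream, with at most maxConcurrent copies running at once. */
    static void downloadAll(List<String> fileNames, FileCopier copier, int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String name : fileNames) {
                // Returning null makes this a Callable, so checked exceptions can propagate.
                pending.add(pool.submit(() -> {
                    copier.copy(name);
                    return null;
                }));
            }
            for (Future<?> future : pending) {
                // Blocks until the copy finishes; a failed copy surfaces as an ExecutionException.
                future.get();
            }
        } finally {
            // Stop any remaining work if a copy failed; a no-op when everything completed.
            pool.shutdownNow();
        }
    }
}
```

Because each file is read start to finish by one stream, per-file concerns such as
checksum validation or stats tracking can stay inside the copier, which is the
property the issues listed above depend on.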

I think this is the right approach as it leverages the more
battle-tested code path and addresses the three issues listed above. The
multi-stream approach still has promise as it will allow us to download
very large files faster (whereas with this approach they can be the long
pole in the transfer operation). However, given that 5GB segments (made up of
multiple files in practice) are the norm, we generally aren't dealing with
huge files.
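
To make the "long pole" point concrete, the following sketch computes a rough
lower bound on transfer time when whole files are downloaded in parallel. It
assumes a deliberately simplified model that is not from this change: each file
is limited to a fixed per-stream throughput, and all streams together are
limited to a fixed aggregate throughput. The class, method, and parameter names
are hypothetical.

```java
import java.util.List;

class TransferTimeEstimate {
    /**
     * Rough lower bound, in seconds, for downloading whole files in parallel.
     * The transfer cannot finish before the largest file does (one stream per file),
     * and it cannot move the total bytes faster than the aggregate throughput allows.
     */
    static double lowerBoundSeconds(List<Long> fileSizesMiB, double perStreamMiBps, double aggregateMiBps) {
        long total = 0;
        long largest = 0;
        for (long size : fileSizesMiB) {
            total += size;
            largest = Math.max(largest, size);
        }
        return Math.max(largest / perStreamMiBps, total / aggregateMiBps);
    }
}
```

Under this model, one very large file dominates the bound no matter how many
other files are downloaded alongside it, while similarly sized files leave the
aggregate throughput as the limit.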

Signed-off-by: Andrew Ross <andrross@amazon.com>
andrross authored and austintlee committed Jan 19, 2024
1 parent 8c2c617 commit 3663301
Showing 1 changed file with 0 additions and 1 deletion.
@@ -160,7 +160,6 @@
import org.opensearch.index.seqno.SequenceNumbers;
import org.opensearch.index.shard.PrimaryReplicaSyncer.ResyncTask;
import org.opensearch.index.similarity.SimilarityService;
import org.opensearch.index.store.DirectoryFileTransferTracker;
import org.opensearch.index.store.RemoteSegmentStoreDirectory;
import org.opensearch.index.store.RemoteStoreFileDownloader;
import org.opensearch.index.store.Store;