Send file chunks asynchronously in peer recovery #39769

Closed
Wants to merge 7 commits

Conversation

@dnhatn (Member) commented Mar 6, 2019

With this change, peer recovery will send file chunks asynchronously and concurrently. Recovery with compression enabled should be faster with this implementation. I will run a recovery benchmark after we agree on the approach.

Relates #36981

@dnhatn dnhatn added >enhancement :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v8.0.0 v7.2.0 labels Mar 6, 2019
@dnhatn dnhatn requested review from s1monw and ywelsch March 6, 2019 20:45
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@original-brownbear (Member) left a comment

LGTM just a few comments/questions :)

@s1monw (Contributor) left a comment

I left some comments and questions

if (error.get() == null) {
    cancellableThreads.execute(() -> requestSeqIdTracker.waitForOpsToComplete(requestSeqIdTracker.getMaxSeqNo()));

synchronized FileChunk readChunk(final byte[] buffer) throws Exception {
@s1monw (Contributor) commented:

I don't understand why we parallelize the reading on top of a single file and then synchronize all of it; that doesn't make sense to me. I think we should build the model on top of the file and chunk it ahead of time, i.e. if we want to read with N threads in parallel, then split the file up into N pieces and send them all in parallel. That means we must write them in the correct places on the other side as well, but blocking on the read side here is not making much sense to me.
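(For illustration only: a minimal Java sketch of the pre-chunked, parallel-read model suggested above. It assumes a plain FileChannel with positional reads and a fixed chunk size; sendChunk and the offset-aware write on the target side are hypothetical placeholders, not the Elasticsearch recovery API.)

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;

final class PreChunkedReader {

    // Split the file into fixed-size chunks up front and read them in parallel.
    // FileChannel#read(ByteBuffer, long) is a positional read with no shared
    // cursor, so concurrent tasks can safely read from the same channel.
    static void sendFileInParallel(Path file, long fileLength, int chunkSize,
                                   ExecutorService executor) throws IOException {
        final FileChannel channel = FileChannel.open(file, StandardOpenOption.READ);
        final int chunkCount = Math.toIntExact((fileLength + chunkSize - 1) / chunkSize);
        for (int i = 0; i < chunkCount; i++) {
            final long offset = (long) i * chunkSize;
            final int length = Math.toIntExact(Math.min(chunkSize, fileLength - offset));
            executor.execute(() -> {
                try {
                    final ByteBuffer buffer = ByteBuffer.allocate(length);
                    int read = 0;
                    while (read < length) {
                        final int n = channel.read(buffer, offset + read);
                        if (n < 0) {
                            throw new EOFException("unexpected end of file at " + (offset + read));
                        }
                        read += n;
                    }
                    buffer.flip();
                    sendChunk(offset, buffer); // the target must write this chunk at the same offset
                } catch (IOException e) {
                    throw new RuntimeException(e); // a real implementation would fail the recovery
                }
            });
        }
        // closing the channel once all tasks have finished is omitted in this sketch
    }

    private static void sendChunk(long offset, ByteBuffer chunk) {
        // placeholder for the network send
    }
}
```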

@s1monw (Contributor) commented:

Another option is to have a multiplexer if you want to make use of the parallelism between sending and reading. We need some kind of threadpool and a task queue for that. Once I am done reading a chunk, I put it on a queue and read the next chunk; another worker can then pick it up and send it. If the queue fills up, we add more threads until we saturate. Or we do reading and sending in the same thread but notify others that another chunk can be read. There is so much blocking going on here that I feel we didn't make the right design decisions.
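(Again for illustration only: a rough Java sketch of the multiplexer idea, assuming a bounded queue between one reader and a fixed pool of sender threads. readNextChunk, sendChunk, and the poison-pill shutdown are hypothetical, and the "add more threads until we saturate" part is omitted.)

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;

final class ChunkMultiplexer {
    // Empty buffer used as a poison pill to tell senders the file is done.
    private static final ByteBuffer POISON = ByteBuffer.allocate(0);
    // Bounded queue: caps how many read-but-unsent chunks sit in memory.
    private final BlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<>(16);

    void run(ExecutorService senders, int senderThreads) throws InterruptedException {
        for (int i = 0; i < senderThreads; i++) {
            senders.execute(this::drainAndSend);
        }
        ByteBuffer chunk;
        while ((chunk = readNextChunk()) != null) {
            queue.put(chunk); // the reader only blocks when all senders fall behind
        }
        for (int i = 0; i < senderThreads; i++) {
            queue.put(POISON); // one poison pill per sender
        }
    }

    private void drainAndSend() {
        try {
            ByteBuffer chunk;
            while ((chunk = queue.take()) != POISON) {
                sendChunk(chunk);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private ByteBuffer readNextChunk() { return null; } // placeholder: next chunk, or null at EOF
    private void sendChunk(ByteBuffer chunk) {}         // placeholder: network send
}
```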

@dnhatn (Member, Author) commented Mar 10, 2019

@s1monw Thanks for reviewing.

We introduced a SeqId on FileChunk in #36981. The idea was to send up to N consecutive file chunks without waiting for replies from the recovery target. We used a SeqId instead of a Semaphore to make sure that the recovery target won't buffer more than N chunks in memory in any situation. That change reduced the recovery time significantly without using any extra threads.

This PR makes #36981 non-blocking with these goals: (1) maintain the recovery time, (2) do not require extra threads, and (3) never block any thread. Here we use a semaphore to ensure that only one thread reads file chunks at a time; other threads can quickly check this condition and exit without being blocked.

I hope this clarifies the approach. As I said in the PR description, I am open to suggestions.
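(A minimal sketch of the non-blocking, single-reader scheme described in this reply, assuming hypothetical helpers canSendMore and readAndSendNextChunk; this is not the actual RecoverySourceHandler code.)

```java
import java.util.concurrent.Semaphore;

final class SingleReaderSender {
    // A single permit guards reading: at most one thread reads chunks at a
    // time, and every other thread returns immediately instead of blocking.
    private final Semaphore readerPermit = new Semaphore(1);

    void sendFileChunks() {
        while (canSendMore() && readerPermit.tryAcquire()) {
            try {
                // only the permit holder gets here; the reply callback
                // re-enters sendFileChunks to keep the pipeline full
                readAndSendNextChunk(this::sendFileChunks);
            } finally {
                readerPermit.release();
            }
        }
        // threads that fail tryAcquire exit right away; the permit holder
        // or a later reply callback picks up the remaining work
    }

    private boolean canSendMore() { return false; }        // placeholder: seq-id window not exhausted
    private void readAndSendNextChunk(Runnable onReply) {} // placeholder: read a chunk, send it asynchronously
}
```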

@dnhatn dnhatn requested a review from s1monw March 10, 2019 20:40
@s1monw (Contributor) commented Mar 11, 2019

I looked at the non-blocking version and it's more intuitive here. I would still like to have a comment on what we are trying to do with the seqIds etc. What confuses me is the partial synchronization here and here, which just doesn't make much sense. I think we have a single thread that reads and puts chunks on the network, not multiple, correct? If that is the case, why don't we make the entire sendFileChunks method synchronized and remove all the primitives? I would love to have this done more simply, such that there is only a single owner thread at any time: the one that holds the lock.

r -> {
    recycledBuffers.addFirst(buffer);
    requestSeqIdTracker.markSeqNoAsCompleted(chunk.seqId);
    sendFileChunks(listener);
@s1monw (Contributor) commented:

Can we make sure we always fork here somehow? I am a bit worried that we could end up with a stack overflow. We could assert that we don't have sendFileChunks in the stack trace, for instance.
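(A small sketch of the suggested assertion, checking that sendFileChunks does not already appear in the current stack trace; the method name is passed in as a plain string, so this is illustrative rather than the eventual fix.)

```java
final class RecursionAssertions {

    // Returns true if the current stack contains at most one frame for the
    // given method; more than one means we re-entered it on the same thread
    // instead of forking, which could eventually overflow the stack.
    static boolean notRecursive(String methodName) {
        int frames = 0;
        for (StackTraceElement element : Thread.currentThread().getStackTrace()) {
            if (element.getMethodName().equals(methodName)) {
                frames++;
            }
        }
        return frames <= 1;
    }
}
```

A caller could then put `assert RecursionAssertions.notRecursive("sendFileChunks");` at the top of sendFileChunks to catch inline re-entry in tests.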

@dnhatn (Member, Author) replied:

I opened #39988 for this.

@dnhatn (Member, Author) commented Jun 26, 2019

I am closing this PR and will open a new one. @original-brownbear @s1monw Thanks for looking.

@dnhatn dnhatn closed this Jun 26, 2019
@dnhatn dnhatn deleted the async-chunks branch June 26, 2019 11:47