-
Notifications
You must be signed in to change notification settings - Fork 509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-9912. EC client Reusing thread resources #5791
Conversation
# Conflicts: # hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java # hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ECBlockOutputStream.java # hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/RatisBlockOutputStream.java # hadoop-hdds/client/src/test/java/org/apache/hadoop/hdds/scm/storage/TestBlockOutputStreamCorrectness.java # hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ec/reconstruction/ECReconstructionCoordinator.java # hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/BlockOutputStreamEntry.java # hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/BlockOutputStreamEntryPool.java # hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockOutputStreamEntry.java # hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECBlockOutputStreamEntryPool.java # hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java # hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/KeyOutputStream.java # hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java # hadoop-ozone/s3gateway/src/test/java/org/apache/hadoop/ozone/client/OzoneOutputStreamStub.java
@duongkame @jojochuang @kerneltime @tanvipenumudy @umamaheswararao please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM not that familiar with EC but the analysis and the code makes sense to me.
Great work!
@@ -2358,7 +2366,7 @@ private KeyOutputStream.Builder createKeyOutputStream( | |||
.enableUnsafeByteBufferConversion(unsafeByteBufferConversion) | |||
.setConfig(clientConfig) | |||
.setAtomicKeyCreation(isS3GRequest.get()) | |||
.setClientMetrics(clientMetrics) | |||
.setBlockOutPutStreamResourceProvider(blockOutPutStreamResourceProvider) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but it seems the same ecWriteExecutor
is used for both EC and Ratis keys:
ozone/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
Lines 319 to 320 in f695a37
this.blockOutPutStreamResourceProvider = BlockOutPutStreamResourceProvider | |
.create(this::getECWriteThreadPool, ContainerClientMetrics.acquire()); |
@xichen01 Shouldn't we use a different resource provider for Ratis keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @adoroszlai @xichen01 , Thanks work on this, I think the reason for the current SingleThreadExecutor is because it uses asynchrony in an OutputStream and needs to guarantee the order of execution. If that's the case, is switching to a multi-threaded Executor still a guarantee?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but it seems the same
ecWriteExecutor
is used for both EC and Ratis keys:ozone/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
Lines 319 to 320 in f695a37
this.blockOutPutStreamResourceProvider = BlockOutPutStreamResourceProvider .create(this::getECWriteThreadPool, ContainerClientMetrics.acquire()); @xichen01 Shouldn't we use a different resource provider for Ratis keys?
@adoroszlai Yes, since Ratis Key and EC Key reuse class BlockOutputStream
, the EC Key and Ratis Key both use this ecWriteExecutor
. I think that's acceptable, since ecWriteExecutor
just provides thread resources, not logic. And the Ratis key also benefits from not having to create a new thread for each Block. But mybe we need rename the ecWriteExecutor
to a more normal name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. I agree, we should rename it if it's intended for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @adoroszlai @xichen01 , Thanks work on this, I think the reason for the current SingleThreadExecutor is because it uses asynchrony in an OutputStream and needs to guarantee the order of execution. If that's the case, is switching to a multi-threaded Executor still a guarantee?
@guohao-rosicky. I think it's okay here, because ecWriteExecutor
is just used to wait for an RPC response, not to send a request. the thread that sends request was always a fixed single thread.
@adoroszlai how do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the synchronized
be moved to AbstractCommitWatcher
? It is a much smaller class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the closure accesses a lot of external variables, which could be a potential thread-safety issue, so I just added a "big" synchronized
here, which ensures that even if someone changes the code later on, there's probably no thread-safety issue, and the thenApplyAsync
just sets/creates the variable, which is supposed to be lightweight.
So, I think the current approach is acceptable, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
closure accesses a lot of external variables, which could be a potential thread-safety issue
I have the following concerns:
- Are the same external variables protected by the same object elsewhere? If not, there is no point in making this synchronized.
validateResponse
only accesses an atomic reference and the response object itself. Neither of those need to be synced.- Protobuf conversion (
BlockID.getFromProtobuf
) should not be performed while holding the lock. (this.)blockID
is also atomic, does not need the lock.- logging should not be performed while holding the lock.
So in the end I think only updateCommitInfo
needs to be protected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synchronized
has been moved to AbstractCommitWatcher
.
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/OzoneClientConfig.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
Outdated
Show resolved
Hide resolved
...lient/src/main/java/org/apache/hadoop/ozone/client/io/BlockOutPutStreamResourceProvider.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java
Outdated
Show resolved
Hide resolved
...lient/src/main/java/org/apache/hadoop/ozone/client/io/BlockOutPutStreamResourceProvider.java
Outdated
Show resolved
Hide resolved
@xichen01 , thanks a lot for working on this. The idea of reusing threads is great! The change here is quite big. How about separating this into at least two JIRAs?
|
I understand, but although the code seems to have changed a lot, most of the changes were actually made to pass parameters, so it may not be very easy to split, but if you think it's necessary, I'll give it a try. |
# Conflicts: # hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ec/reconstruction/ECReconstructionCoordinator.java
I tried to review this but not able to reason about the changes. In particular, I like to understand why it needs the new class If the changes are just some simply, mechanical changes, it is not hard to review even the change size is large. However, it is hard to review when some complicated changes (e.g. synchronization) is mixed in a lot of simple changes. |
OK, thank for you explain, I will try to split this into two JIRAs. |
@xichen01 , thank a lot! Will review both JIRAs. |
What changes were proposed in this pull request?
When writing an EC key, multiple temporary threads are created, creating and destroying these threads will cause performance degradation, this PR improves the performance of writing an EC key by reusing thread resources.
The Root cause
For a 3 + 2 EC key 1
flushExecutor
thread and 5responseExecutors
are created.For a key in replica mode, only one
responseExecutor
is created, because replica mode only needs to write once and the rest of the replicas are synchronized by Ratis, whereas EC mode requires the Client to write all the strips to DNs.So temporary threads have less impact on performance in replica mode buckets, and more impact in EC buckets.
ozone/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java
Line 178 in 8b25c55
ozone/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java
Line 159 in 8b25c55
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9912
How was this patch tested?
Existing test
Performance testing result
https://issues.apache.org/jira/browse/HDDS-9911