HDDS-10384. RPC client Reusing thread resources. #6326
@szetszwo could you please take a look?
@xichen01 , thanks for digging out the root cause!
Is this an existing bug? It seems the previous PR does not cause the bug.
Let's not change it. We may chain the futures as below:
+++ b/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/RatisBlockOutputStream.java
@@ -115,7 +115,9 @@ void updateCommitInfo(XceiverClientReply reply, List<ChunkBuffer> buffers) {
   @Override
   void putFlushFuture(long flushPos,
       CompletableFuture<ContainerCommandResponseProto> flushFuture) {
-    commitWatcher.getFutureMap().put(flushPos, flushFuture);
+    commitWatcher.getFutureMap().compute(flushPos,
+        (key, previous) -> previous == null ? flushFuture
+            : previous.thenCombine(flushFuture, (prev, curr) -> curr));
   }
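To see what the suggested chaining does outside of Ozone, here is a minimal standalone sketch. The class name `ChainedFlushFutures`, the `get` helper, and the use of `String` in place of `ContainerCommandResponseProto` are assumptions made for illustration; only the `compute`/`thenCombine` expression comes from the diff above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ChainedFlushFutures {
  // Hypothetical stand-in for CommitWatcher#getFutureMap();
  // String replaces ContainerCommandResponseProto for this sketch.
  private final ConcurrentMap<Long, CompletableFuture<String>> futureMap =
      new ConcurrentHashMap<>();

  // Same expression as in the suggested diff: if a future is already
  // registered for flushPos, chain the new one onto it instead of replacing it.
  void putFlushFuture(long flushPos, CompletableFuture<String> flushFuture) {
    futureMap.compute(flushPos,
        (key, previous) -> previous == null ? flushFuture
            : previous.thenCombine(flushFuture, (prev, curr) -> curr));
  }

  // Hypothetical accessor used only by the demo below.
  CompletableFuture<String> get(long flushPos) {
    return futureMap.get(flushPos);
  }

  public static void main(String[] args) {
    ChainedFlushFutures watcher = new ChainedFlushFutures();
    CompletableFuture<String> first = new CompletableFuture<>();
    CompletableFuture<String> second = new CompletableFuture<>();
    watcher.putFlushFuture(4194304L, first);
    watcher.putFlushFuture(4194304L, second);

    CompletableFuture<String> chained = watcher.get(4194304L);
    second.complete("second");
    // The chained future still waits for the first future.
    System.out.println(chained.isDone()); // false
    first.complete("first");
    System.out.println(chained.isDone()); // true
    System.out.println(chained.join());   // second
  }
}
```

The map keeps a single future per key, so callers that wait on the map value do not need to change; the chained future only completes once both the earlier and the later future have completed.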
+1 the change looks good.
(cherry picked from commit 2f05353) Change-Id: I5134395348242e595ffe8f001aacc19bda0e3d4a
What changes were proposed in this pull request?
The old PR #6270 was reverted due to a bug. This PR fixes the bug and resubmits the change.
Root cause
The root cause of the bug in the old PR is that the future that executePutBlock puts into CommitWatcher#futureMap may be overwritten by the future from the last executePutBlock call, the one issued on close. Therefore, the buffer cannot be released correctly.
Fix
Change the CommitWatcher#futureMap from ConcurrentMap<Long, CompletableFuture<xxx>> to ConcurrentMap<Long, List<CompletableFuture<xxx>>>. The map then records every future registered under the same key, so all of them can be released.
A detailed log of this bug:
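A minimal sketch of the list-valued map described above, not the actual patch: the class name `ListedFlushFutures`, the helper methods, the choice of `CopyOnWriteArrayList`, and `String` as the stand-in for the response type are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListedFlushFutures {
  // Hypothetical stand-in for CommitWatcher#futureMap with a list per key.
  private final ConcurrentMap<Long, List<CompletableFuture<String>>> futureMap =
      new ConcurrentHashMap<>();

  // Append instead of overwrite: every future for flushPos is kept.
  void putFlushFuture(long flushPos, CompletableFuture<String> flushFuture) {
    futureMap.computeIfAbsent(flushPos, key -> new CopyOnWriteArrayList<>())
        .add(flushFuture);
  }

  // Number of futures recorded for flushPos.
  int futureCount(long flushPos) {
    return futureMap.getOrDefault(flushPos, Collections.emptyList()).size();
  }

  // Completes only when every future currently recorded for flushPos is done.
  CompletableFuture<Void> allFlushed(long flushPos) {
    List<CompletableFuture<String>> futures =
        futureMap.getOrDefault(flushPos, Collections.emptyList());
    return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
  }

  public static void main(String[] args) {
    ListedFlushFutures watcher = new ListedFlushFutures();
    CompletableFuture<String> first = new CompletableFuture<>();
    CompletableFuture<String> second = new CompletableFuture<>();
    watcher.putFlushFuture(4194304L, first);
    watcher.putFlushFuture(4194304L, second);
    System.out.println(watcher.futureCount(4194304L)); // 2

    CompletableFuture<Void> all = watcher.allFlushed(4194304L);
    second.complete("second");
    System.out.println(all.isDone()); // false
    first.complete("first");
    System.out.println(all.isDone()); // true
  }
}
```

Compared with chaining the futures into a single map value, this variant changes the map's value type, so every reader of the map has to be updated to handle a list.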
2024-03-03 18:49:16,741 [Thread-955] ERROR storage.BlockOutputStream (BlockOutputStream.java:executePutBlock(512)) - executePutBlock putFlushFuture flushPos 4194304, flushFuture java.util.concurrent.CompletableFuture@8bc6bca[Not completed], close false, force false
2024-03-03 18:49:16,741 [Thread-955] ERROR storage.BlockOutputStream (RatisBlockOutputStream.java:putFlushFuture(120)) - putFlushFuture flushPos 4194304 flushFuture java.util.concurrent.CompletableFuture@8bc6bca[Not completed]
In the next log, we can see that the entry {4194304, CompletableFuture@8bc6bca} has been overwritten by {4194304, CompletableFuture@4230f10c}:
2024-03-03 18:49:16,741 [Thread-955] ERROR storage.BlockOutputStream (BlockOutputStream.java:executePutBlock(512)) - executePutBlock putFlushFuture flushPos 4194304, flushFuture java.util.concurrent.CompletableFuture@4230f10c[Not completed], close true, force true
2024-03-03 18:49:16,741 [Thread-955] ERROR storage.BlockOutputStream (RatisBlockOutputStream.java:putFlushFuture(120)) - putFlushFuture flushPos 4194304 flushFuture java.util.concurrent.CompletableFuture@4230f10c[Not completed]
2024-03-03 18:49:16,741 [Thread-955] ERROR storage.BlockOutputStream (RatisBlockOutputStream.java:waitOnFlushFutures(127)) - waitOnFlushFutures getFutureMap keySet [4194304] keySet Value [java.util.concurrent.CompletableFuture@4230f10c[Not completed]]
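The overwrite visible in these logs can be reproduced in isolation with a plain `put` on a `ConcurrentMap`. This is a sketch under assumptions: the class `OverwrittenFlushFuture` and the helper `putTwice` are hypothetical, and `String` stands in for the real response type.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class OverwrittenFlushFuture {
  // Registers two futures under the same position with plain put() and
  // returns the future that the second put() displaced from the map.
  static CompletableFuture<String> putTwice(
      ConcurrentMap<Long, CompletableFuture<String>> map, long flushPos,
      CompletableFuture<String> first, CompletableFuture<String> second) {
    map.put(flushPos, first);
    return map.put(flushPos, second);
  }

  public static void main(String[] args) {
    ConcurrentMap<Long, CompletableFuture<String>> map = new ConcurrentHashMap<>();
    CompletableFuture<String> first = new CompletableFuture<>();
    CompletableFuture<String> second = new CompletableFuture<>();

    CompletableFuture<String> displaced = putTwice(map, 4194304L, first, second);

    // Only the second future is reachable through the map; anything that
    // waits on the map's values never observes the first one.
    System.out.println(displaced == first);          // true
    System.out.println(map.get(4194304L) == second); // true
    System.out.println(map.size());                  // 1
  }
}
```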
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10384
How was this patch tested?
Two 10x10 runs of TestSecureOzoneRpcClient, TestFreonWithPipelineDestroy, and TestOzoneRpcClientWithRatis#ALL all passed. (To make all tests pass, the test code includes a fix for the unstable test testParallelDeleteBucketAndCreateKey, HDDS-10143.)
https://github.com/xichen01/ozone/actions/runs/8138828724/attempts/1
https://github.com/xichen01/ozone/actions/runs/8138828724