HDDS-1610. applyTransaction failure should not be lost on restart. #1226

bshashikant · 2019-08-05T08:21:46Z

No description provided.

hadoop-yetus · 2019-08-05T11:27:53Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	46	Docker mode activated.
		_ Prechecks _
+1	dupname	0	No case conflicting files found.
+1	@author	0	The patch does not contain any @author tags.
+1	test4tests	0	The patch appears to include 1 new or modified test files.
		_ trunk Compile Tests _
0	mvndep	96	Maven dependency ordering for branch
+1	mvninstall	751	trunk passed
+1	compile	466	trunk passed
+1	checkstyle	86	trunk passed
+1	mvnsite	0	trunk passed
+1	shadedclient	1092	branch has no errors when building and testing our client artifacts.
+1	javadoc	189	trunk passed
0	spotbugs	555	Used deprecated FindBugs config; considering switching to SpotBugs.
+1	findbugs	847	trunk passed
		_ Patch Compile Tests _
0	mvndep	37	Maven dependency ordering for patch
+1	mvninstall	676	the patch passed
+1	compile	489	the patch passed
+1	javac	489	the patch passed
-0	checkstyle	40	hadoop-ozone: The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1	mvnsite	0	the patch passed
+1	whitespace	0	The patch has no whitespace issues.
+1	shadedclient	876	patch has no errors when building and testing our client artifacts.
+1	javadoc	179	the patch passed
+1	findbugs	720	the patch passed
		_ Other Tests _
+1	unit	330	hadoop-hdds in the patch passed.
-1	unit	2435	hadoop-ozone in the patch failed.
+1	asflicense	38	The patch does not generate ASF License warnings.
		9613

Reason	Tests
Failed junit tests	hadoop.hdds.scm.pipeline.TestPipelineClose
	hadoop.ozone.client.rpc.TestContainerStateMachine
	hadoop.ozone.om.TestScmSafeMode
	hadoop.ozone.web.client.TestBuckets
	hadoop.ozone.om.TestOMRatisSnapshots
	hadoop.ozone.client.rpc.TestContainerStateMachineFailures
	hadoop.ozone.client.rpc.TestOzoneRpcClient
	hadoop.ozone.om.TestOzoneManagerHA
	hadoop.ozone.om.TestContainerReportWithKeys
	hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
	hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
	hadoop.ozone.client.rpc.TestCommitWatcher
	hadoop.ozone.client.rpc.TestSecureOzoneRpcClient
	hadoop.ozone.dn.scrubber.TestDataScrubber
	hadoop.ozone.web.TestOzoneVolumes
	hadoop.ozone.container.common.transport.server.ratis.TestCSMMetrics
	hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures

Subsystem	Report/Notes
Docker	Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/1/artifact/out/Dockerfile
GITHUB PR	#1226
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux 48b3f23d31a4 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `f8ea6e1`
Default Java	1.8.0_212
checkstyle	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/1/artifact/out/diff-checkstyle-hadoop-ozone.txt
unit	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/1/artifact/out/patch-unit-hadoop-ozone.txt
Test Results	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/1/testReport/
Max. process+thread count	3858 (vs. ulimit of 5500)
modules	C: hadoop-hdds/container-service hadoop-ozone/integration-test U: .
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/1/console
versions	git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by	Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

supratimdeka

+1
I am ok with the changes, after the comments have been resolved/clarified.

supratimdeka · 2019-08-06T12:15:46Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

+        if (r.getResult() != ContainerProtos.Result.SUCCESS) {
+          StorageContainerException sce =
+              new StorageContainerException(r.getMessage(), r.getResult());
+          LOG.error(gid + ": ApplyTransaction failed: cmd " + r.getCmdType()


Should we add containerID in this error log?

Container Id will be present in the Response Message. Will add that to the logger output.

lets convert this to parameterized logging.

supratimdeka · 2019-08-06T12:30:34Z

...java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java

+    String msg =
+        "Ratis Transaction failure in datanode" + dnId + " with role " + role
+            + " Triggering pipeline close action.";
+    triggerPipelineClose(groupId, msg, ClosePipelineInfo.Reason.PIPELINE_FAILED,


Do we need a new reason code here? Pipeline Failed is getting overloaded.

I think, the msg will differentiate what was the cause of the error. The reason code is just for SCM to take action of closing the pipeline. I don't think possibly SCM needs to differentiate its behaviour depending on why the pipelien failed.

If required, we can add it in a separate jira as it needs to change for other reasons of pipeline failure.

Lets add a new failure reason here as @supratimdeka is suggesting. This will help in identifying difference failure via metrics in SCM.

supratimdeka · 2019-08-06T12:31:57Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

+      client.sendCommand(request.build());
+      Assert.fail("Expected exception not thrown");
+    } catch (IOException e) {
+    }


Perhaps a log message here to say that the test caught an IOException as expected by the test case.

Will address in the next patch..

supratimdeka · 2019-08-06T13:02:38Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

+    XceiverServerRatis raftServer = (XceiverServerRatis)
+        cluster.getHddsDatanodes().get(0).getDatanodeStateMachine()
+            .getContainer().getWriteChannel();
+    Assert.assertTrue(raftServer.isClosed());


one key goal of HDDS-1610 is to ensure that no snapshot can be taken after a log apply failure.

Should the unit test include this assertion? Perhaps by setting the autoTriggerThreshold in Ratis to take a snapshot after every applyLog.

will address in the next patch.

mukul1987

Thanks for working on this @bshashikant.
This patch does not fix the problem in HDDS-1610, as even after a applyTransactionFailure,
the raftServer shutdown will trigger a snapshot create as part of shutdown.

I also feel that the metadata around failure should be maintained inside the ContainerStateMachine as the shutdown behavior of the ratis server can most probably change in the future.

mukul1987 · 2019-08-06T17:14:10Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

        updateLastApplied();
-      }).whenComplete((r, t) -> applyTransactionSemaphore.release());
-      return future;
+        applyTransactionSemaphore.release();


This should go inside a finally, to make sure the semaphore is released always.

mukul1987 · 2019-08-06T17:18:05Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

+    key.write("ratis".getBytes());
+
+    //get the name of a valid container
+    OmKeyArgs keyArgs = new OmKeyArgs.Builder().setVolumeName(volumeName).


This is unused.

mukul1987 · 2019-08-06T17:19:31Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

+    request.setContainerID(containerID);
+    request.setCloseContainer(
+        ContainerProtos.CloseContainerRequestProto.getDefaultInstance());
+    // close container transaction will fail over Ratis and will cause the raft


incomplete comment here.

mukul1987 · 2019-08-06T17:19:53Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

+      client.sendCommand(request.build());
+      Assert.fail("Expected exception not thrown");
+    } catch (IOException e) {
+    }


What exception are we expecting here ?

mukul1987 · 2019-08-06T17:22:31Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

+    Pipeline pipeline = cluster.getStorageContainerLocationClient()
+        .getContainerWithPipeline(containerID).getPipeline();
+    XceiverClientSpi client = xceiverClientManager.acquireClient(pipeline);
+    ContainerProtos.ContainerCommandRequestProto.Builder request =


How about we append some more data to the key and flush again ? in place of a close ?

The idea is to execute a transaction on the same container. If we write more data , it can potentially go a new container altogether.

mukul1987 · 2019-08-06T17:29:23Z

...java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java

@@ -630,6 +640,10 @@ void handleInstallSnapshotFromLeader(RaftGroupId groupId,
    handlePipelineFailure(groupId, roleInfoProto);
  }

+  @VisibleForTesting
+  public boolean isClosed() {


lets remove this. and this can be replaced with

raftServer.getServer().getLifeCycleState().isClosingOrClosed()

mukul1987 · 2019-08-06T17:30:07Z

...java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java

+            + " Triggering pipeline close action.";
+    triggerPipelineClose(groupId, msg, ClosePipelineInfo.Reason.PIPELINE_FAILED,
+        false);
+    stop();


We do not necessarily need to stop the raftServer here, for the other container's we can still keep on applying the transaction

As far as i know from previous discussions , the decision was to not take any other transactions on this pipeline at all and kill the RaftServerImpl instance. Any deviation from that conclusion?

mukul1987 · 2019-08-06T17:40:20Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

-          .supplyAsync(() -> runCommand(requestProto, builder.build()),
+      CompletableFuture<ContainerCommandResponseProto> future =
+          CompletableFuture.supplyAsync(
+              () -> runCommandGetResponse(requestProto, builder.build()),


lets rename runCommandGetResponse and remove runCommand as all the existing caller of the earlier function runCommand can be removed.

mukul1987 · 2019-08-06T17:43:10Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

+        if (r.getResult() != ContainerProtos.Result.SUCCESS) {
+          StorageContainerException sce =
+              new StorageContainerException(r.getMessage(), r.getResult());
+          LOG.error(gid + ": ApplyTransaction failed: cmd " + r.getCmdType()


lets convert this to parameterized logging.

mukul1987 · 2019-08-06T17:44:23Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

              requestProto.getWriteChunk().getChunkData().getLen());
+          LOG.debug(gid + ": ApplyTransaction completed: cmd " + r.getCmdType()


if this is a success, then "" Error message: " + r.getMessage()" will not be the right thing to print here.

bshashikant · 2019-08-07T02:24:34Z

Thanks @mukul1987 . In ratis, as far as my understanding goes, before taking a snapshot we wait for all the pending applyTrannsaction futures to complete and since now with the patch, the applyTransaction exception is being propagated to Ratis, ideally snapshot creation will fail in Ratis.

I will add a test case to verify the same.
I will address the remaining review comments as part of the next patch.

hadoop-yetus · 2019-08-08T01:44:14Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	35	Docker mode activated.
		_ Prechecks _
+1	dupname	0	No case conflicting files found.
+1	@author	0	The patch does not contain any @author tags.
+1	test4tests	0	The patch appears to include 1 new or modified test files.
		_ trunk Compile Tests _
0	mvndep	73	Maven dependency ordering for branch
+1	mvninstall	612	trunk passed
+1	compile	387	trunk passed
+1	checkstyle	76	trunk passed
+1	mvnsite	0	trunk passed
+1	shadedclient	916	branch has no errors when building and testing our client artifacts.
+1	javadoc	177	trunk passed
0	spotbugs	467	Used deprecated FindBugs config; considering switching to SpotBugs.
+1	findbugs	687	trunk passed
		_ Patch Compile Tests _
0	mvndep	76	Maven dependency ordering for patch
+1	mvninstall	564	the patch passed
+1	compile	382	the patch passed
+1	javac	382	the patch passed
-0	checkstyle	42	hadoop-ozone: The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1	mvnsite	0	the patch passed
+1	whitespace	0	The patch has no whitespace issues.
+1	shadedclient	778	patch has no errors when building and testing our client artifacts.
+1	javadoc	179	the patch passed
+1	findbugs	731	the patch passed
		_ Other Tests _
+1	unit	350	hadoop-hdds in the patch passed.
-1	unit	371	hadoop-ozone in the patch failed.
+1	asflicense	36	The patch does not generate ASF License warnings.
		6653

Subsystem	Report/Notes
Docker	Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/2/artifact/out/Dockerfile
GITHUB PR	#1226
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux 81e1c0db1839 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `3cc0ace`
Default Java	1.8.0_222
checkstyle	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/2/artifact/out/diff-checkstyle-hadoop-ozone.txt
unit	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/2/artifact/out/patch-unit-hadoop-ozone.txt
Test Results	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/2/testReport/
Max. process+thread count	1105 (vs. ulimit of 5500)
modules	C: hadoop-hdds/container-service hadoop-ozone/integration-test U: .
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/2/console
versions	git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by	Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

supratimdeka · 2019-08-08T07:49:39Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

        updateLastApplied();
-      }).whenComplete((r, t) -> applyTransactionSemaphore.release());
-      return future;
+        applyTransactionSemaphore.release();


I think we should keep the whenComplete() stage at the end.
releasing the semaphore from a whenComplete() stage guarantees that the semaphore will be released even if the processing inside thenApply() stage hits an exception. This seems to me to be a good practice.

hadoop-yetus · 2019-08-08T10:22:46Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	43	Docker mode activated.
		_ Prechecks _
+1	dupname	0	No case conflicting files found.
+1	@author	0	The patch does not contain any @author tags.
+1	test4tests	0	The patch appears to include 1 new or modified test files.
		_ trunk Compile Tests _
0	mvndep	22	Maven dependency ordering for branch
+1	mvninstall	585	trunk passed
+1	compile	352	trunk passed
+1	checkstyle	66	trunk passed
+1	mvnsite	0	trunk passed
+1	shadedclient	834	branch has no errors when building and testing our client artifacts.
+1	javadoc	146	trunk passed
0	spotbugs	459	Used deprecated FindBugs config; considering switching to SpotBugs.
+1	findbugs	657	trunk passed
		_ Patch Compile Tests _
0	mvndep	29	Maven dependency ordering for patch
+1	mvninstall	574	the patch passed
+1	compile	376	the patch passed
+1	javac	376	the patch passed
-0	checkstyle	37	hadoop-ozone: The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1	mvnsite	0	the patch passed
+1	whitespace	0	The patch has no whitespace issues.
+1	shadedclient	638	patch has no errors when building and testing our client artifacts.
+1	javadoc	165	the patch passed
+1	findbugs	667	the patch passed
		_ Other Tests _
+1	unit	292	hadoop-hdds in the patch passed.
-1	unit	1955	hadoop-ozone in the patch failed.
+1	asflicense	40	The patch does not generate ASF License warnings.
		7657

Reason	Tests
Failed junit tests	hadoop.ozone.container.common.transport.server.ratis.TestCSMMetrics
	hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
	hadoop.ozone.client.rpc.Test2WayCommitInRatis
	hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory
	hadoop.ozone.client.rpc.TestOzoneRpcClient
	hadoop.ozone.om.TestKeyManagerImpl
	hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
	hadoop.ozone.om.TestScmSafeMode
	hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures

Subsystem	Report/Notes
Docker	Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/3/artifact/out/Dockerfile
GITHUB PR	#1226
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
uname	Linux f451f6c48524 4.4.0-157-generic #185-Ubuntu SMP Tue Jul 23 09:17:01 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `00b5a27`
Default Java	1.8.0_212
checkstyle	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/3/artifact/out/diff-checkstyle-hadoop-ozone.txt
unit	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/3/artifact/out/patch-unit-hadoop-ozone.txt
Test Results	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/3/testReport/
Max. process+thread count	5342 (vs. ulimit of 5500)
modules	C: hadoop-hdds/container-service hadoop-ozone/integration-test U: .
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/3/console
versions	git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by	Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2019-08-13T13:33:16Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	142	Docker mode activated.
		_ Prechecks _
+1	dupname	1	No case conflicting files found.
+1	@author	0	The patch does not contain any @author tags.
+1	test4tests	0	The patch appears to include 3 new or modified test files.
		_ trunk Compile Tests _
0	mvndep	88	Maven dependency ordering for branch
+1	mvninstall	804	trunk passed
+1	compile	444	trunk passed
+1	checkstyle	86	trunk passed
+1	mvnsite	0	trunk passed
+1	shadedclient	958	branch has no errors when building and testing our client artifacts.
+1	javadoc	166	trunk passed
0	spotbugs	445	Used deprecated FindBugs config; considering switching to SpotBugs.
+1	findbugs	644	trunk passed
		_ Patch Compile Tests _
0	mvndep	32	Maven dependency ordering for patch
+1	mvninstall	595	the patch passed
+1	compile	401	the patch passed
+1	cc	401	the patch passed
+1	javac	401	the patch passed
-0	checkstyle	42	hadoop-ozone: The patch generated 8 new + 0 unchanged - 0 fixed = 8 total (was 0)
+1	mvnsite	0	the patch passed
+1	whitespace	0	The patch has no whitespace issues.
+1	shadedclient	731	patch has no errors when building and testing our client artifacts.
+1	javadoc	175	the patch passed
+1	findbugs	699	the patch passed
		_ Other Tests _
+1	unit	361	hadoop-hdds in the patch passed.
-1	unit	2308	hadoop-ozone in the patch failed.
+1	asflicense	45	The patch does not generate ASF License warnings.
		8928

Reason	Tests
Failed junit tests	hadoop.ozone.client.rpc.TestSecureOzoneRpcClient
	hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
	hadoop.ozone.client.rpc.TestContainerStateMachineFailures
	hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
	hadoop.ozone.client.rpc.TestOzoneRpcClient
	hadoop.ozone.container.server.TestSecureContainerServer
	hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory
	hadoop.ozone.container.common.transport.server.ratis.TestCSMMetrics

Subsystem	Report/Notes
Docker	Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/4/artifact/out/Dockerfile
GITHUB PR	#1226
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc
uname	Linux 07f043e4650b 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `0b507d2`
Default Java	1.8.0_222
checkstyle	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/4/artifact/out/diff-checkstyle-hadoop-ozone.txt
unit	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/4/artifact/out/patch-unit-hadoop-ozone.txt
Test Results	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/4/testReport/
Max. process+thread count	4545 (vs. ulimit of 5500)
modules	C: hadoop-hdds/container-service hadoop-ozone/integration-test hadoop-ozone/tools U: .
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/4/console
versions	git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by	Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

mukul1987

Thanks for working on this @bshashikant. The patch looks good to me. Some minor comments.

mukul1987 · 2019-08-14T08:57:53Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

+    if (!isStateMachineHealthy.get()) {
+      String msg =
+          "Failed to take snapshot " + " for " + gid + " as the stateMachine"
+              + " is unhealthy. The last applied index is at " + ti;


lets log this as well.

Addressed in the latest patch.

mukul1987 · 2019-08-14T08:59:22Z

...a/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java

+          // before any further snapshot is taken , the exception will be
+          // caught in stateMachineUpdater in Ratis and ratis server will
+          // shutdown.
+          applyTransactionFuture.completeExceptionally(sce);


lets move the ratisServer.handleApplyTransactionFailure(gid, trx.getServerRole()); as the last line in the if block.

Addressed in the latest patch.

hadoop-yetus · 2019-08-14T10:31:23Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	70	Docker mode activated.
		_ Prechecks _
+1	dupname	0	No case conflicting files found.
+1	@author	0	The patch does not contain any @author tags.
+1	test4tests	0	The patch appears to include 3 new or modified test files.
		_ trunk Compile Tests _
0	mvndep	70	Maven dependency ordering for branch
-1	mvninstall	141	hadoop-ozone in trunk failed.
-1	compile	49	hadoop-ozone in trunk failed.
+1	checkstyle	63	trunk passed
+1	mvnsite	0	trunk passed
+1	shadedclient	915	branch has no errors when building and testing our client artifacts.
+1	javadoc	156	trunk passed
0	spotbugs	201	Used deprecated FindBugs config; considering switching to SpotBugs.
-1	findbugs	101	hadoop-ozone in trunk failed.
		_ Patch Compile Tests _
0	mvndep	23	Maven dependency ordering for patch
-1	mvninstall	137	hadoop-ozone in the patch failed.
-1	compile	52	hadoop-ozone in the patch failed.
-1	cc	52	hadoop-ozone in the patch failed.
-1	javac	52	hadoop-ozone in the patch failed.
+1	checkstyle	63	the patch passed
+1	mvnsite	0	the patch passed
+1	whitespace	0	The patch has no whitespace issues.
+1	shadedclient	721	patch has no errors when building and testing our client artifacts.
+1	javadoc	153	the patch passed
-1	findbugs	101	hadoop-ozone in the patch failed.
		_ Other Tests _
+1	unit	339	hadoop-hdds in the patch passed.
-1	unit	329	hadoop-ozone in the patch failed.
+1	asflicense	29	The patch does not generate ASF License warnings.
		4445

Reason	Tests
Failed junit tests	hadoop.ozone.s3.TestOzoneClientProducer

Subsystem	Report/Notes
Docker	Client=19.03.0 Server=19.03.0 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/Dockerfile
GITHUB PR	#1226
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc
uname	Linux 960f8c74122e 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `0e4b757`
Default Java	1.8.0_212
mvninstall	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/branch-mvninstall-hadoop-ozone.txt
compile	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/branch-compile-hadoop-ozone.txt
findbugs	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/branch-findbugs-hadoop-ozone.txt
mvninstall	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/patch-mvninstall-hadoop-ozone.txt
compile	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/patch-compile-hadoop-ozone.txt
cc	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/patch-compile-hadoop-ozone.txt
javac	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/patch-compile-hadoop-ozone.txt
findbugs	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/patch-findbugs-hadoop-ozone.txt
unit	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/artifact/out/patch-unit-hadoop-ozone.txt
Test Results	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/testReport/
Max. process+thread count	427 (vs. ulimit of 5500)
modules	C: hadoop-hdds/container-service hadoop-ozone/integration-test hadoop-ozone/tools U: .
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/5/console
versions	git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by	Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

mukul1987

+1, the patch looks good to me.

mukul1987 · 2019-08-16T10:05:50Z

/retest

hadoop-yetus · 2019-08-16T17:35:50Z

💔 -1 overall

Vote	Subsystem	Runtime	Comment
0	reexec	79	Docker mode activated.
		_ Prechecks _
+1	dupname	0	No case conflicting files found.
+1	@author	0	The patch does not contain any @author tags.
+1	test4tests	0	The patch appears to include 3 new or modified test files.
		_ trunk Compile Tests _
0	mvndep	24	Maven dependency ordering for branch
+1	mvninstall	665	trunk passed
+1	compile	409	trunk passed
+1	checkstyle	82	trunk passed
+1	mvnsite	0	trunk passed
+1	shadedclient	976	branch has no errors when building and testing our client artifacts.
+1	javadoc	174	trunk passed
0	spotbugs	465	Used deprecated FindBugs config; considering switching to SpotBugs.
+1	findbugs	673	trunk passed
		_ Patch Compile Tests _
0	mvndep	34	Maven dependency ordering for patch
+1	mvninstall	626	the patch passed
+1	compile	429	the patch passed
+1	cc	429	the patch passed
+1	javac	429	the patch passed
+1	checkstyle	86	the patch passed
+1	mvnsite	0	the patch passed
+1	whitespace	0	The patch has no whitespace issues.
+1	shadedclient	825	patch has no errors when building and testing our client artifacts.
+1	javadoc	215	the patch passed
+1	findbugs	802	the patch passed
		_ Other Tests _
+1	unit	381	hadoop-hdds in the patch passed.
-1	unit	3032	hadoop-ozone in the patch failed.
+1	asflicense	42	The patch does not generate ASF License warnings.
		9719

Reason	Tests
Failed junit tests	hadoop.ozone.container.common.transport.server.ratis.TestCSMMetrics
	hadoop.ozone.client.rpc.TestContainerStateMachineFailures
	hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
	hadoop.ozone.container.server.TestSecureContainerServer
	hadoop.ozone.client.rpc.TestOzoneRpcClientForAclAuditLog
	hadoop.hdds.scm.pipeline.TestRatisPipelineCreateAndDestory

Subsystem	Report/Notes
Docker	Client=19.03.0 Server=19.03.0 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/6/artifact/out/Dockerfile
GITHUB PR	#1226
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc
uname	Linux 6314dd3b3d85 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	personality/hadoop.sh
git revision	trunk / `9b8359b`
Default Java	1.8.0_212
unit	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/6/artifact/out/patch-unit-hadoop-ozone.txt
Test Results	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/6/testReport/
Max. process+thread count	4716 (vs. ulimit of 5500)
modules	C: hadoop-hdds/container-service hadoop-ozone/integration-test hadoop-ozone/tools U: .
Console output	https://builds.apache.org/job/hadoop-multibranch/job/PR-1226/6/console
versions	git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1
Powered by	Apache Yetus 0.10.0 http://yetus.apache.org

This message was automatically generated.

bshashikant · 2019-08-19T15:16:22Z

The test failures are not related to the patch.

mukul1987 · 2019-08-20T05:35:01Z

Thanks for working on this @bshashikant. +1 the patch looks good to me. The test failures do not look related.

bshashikant · 2019-08-20T09:17:35Z

Thanks @mukul1987 and @supratimdeka for the review. I have committed the change to trunk.

cawatson · 2019-08-21T13:29:15Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

-     * during key.close() because the first container is UNHEALTHY by that time
-     */
-    Assert.assertTrue("Expect Key to be stored in 2 separate containers",
-        keyDetails.getOzoneKeyLocations().size() == 2);
  }


I am creating a tool to aid developers by partially automating unit testing.
Could you speak to the validity of these generated assert statement for the given test method?

org.junit.Assert.assertEquals( OmKeyLocationInfo.getHddsDatanodes(), groupOutputStream.getHddsDatanodes( groupOutputStream ));

Thank you.

cawatson · 2019-08-21T13:34:01Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

@@ -270,4 +269,83 @@ public void testUnhealthyContainer() throws Exception {
    Assert.assertEquals(ContainerProtos.Result.CONTAINER_UNHEALTHY,
        dispatcher.dispatch(request.build(), null).getResult());
  }


Could you speak to the validity of these generated assert statement for the above test method?

org.junit.Assert.assertTrue( ONE.readContainerFile( node ) );
org.junit.Assert.assertTrue( ONE.readContainerFile( node.toString() ));
org.junit.Assert.assertTrue( ONE.readContainerFile( node.toString(), KeyOutputStream ))

Thank you.

cawatson · 2019-08-21T13:37:02Z

...test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java

+    // Make sure the latest snapshot is same as the previous one
+    FileInfo latestSnapshot = storage.findLatestSnapshot().getFile();
+    Assert.assertTrue(snapshot.getPath().equals(latestSnapshot.getPath()));
+  }
 }


Could you speak to the validity of these generated assert statement for the above test method?

org.junit.Assert.assertTrue( name, volumeName.isValid() )

Thank you.

…ntributed by Shashikant Banerjee(#1226).

…ntributed by Shashikant Banerjee(apache#1226).

HDDS-1610. applyTransaction failure should not be lost on restart.

a6f60eb

bshashikant requested a review from mukul1987 August 5, 2019 08:21

bshashikant added the ozone label Aug 5, 2019

supratimdeka approved these changes Aug 6, 2019

View reviewed changes

mukul1987 requested changes Aug 6, 2019

View reviewed changes

supratimdeka reviewed Aug 8, 2019

View reviewed changes

Addressed review comments.

529a05b

mukul1987 requested changes Aug 14, 2019

View reviewed changes

Addressed review comments and checkstyle issues.

412a7ff

mukul1987 approved these changes Aug 16, 2019

View reviewed changes

bshashikant added 2 commits August 19, 2019 14:40

Addressed related test failures

0d301a0

Fix the test failure related to CSMMetrics.

d559ba0

bshashikant closed this Aug 20, 2019

cawatson reviewed Aug 21, 2019

View reviewed changes

asfgit pushed a commit that referenced this pull request Aug 27, 2019

HDDS-1610. applyTransaction failure should not be lost on restart. Co…

66cfa48

…ntributed by Shashikant Banerjee(#1226).

bshashikant deleted the HDDS-1610 branch August 27, 2019 18:10

amahussein pushed a commit to amahussein/hadoop that referenced this pull request Oct 29, 2019

HDDS-1610. applyTransaction failure should not be lost on restart. Co…

746091a

…ntributed by Shashikant Banerjee(apache#1226).

RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020

HDDS-1610. applyTransaction failure should not be lost on restart. Co…

3b3baa4

…ntributed by Shashikant Banerjee(apache#1226).

		requestProto.getWriteChunk().getChunkData().getLen());
		LOG.debug(gid + ": ApplyTransaction completed: cmd " + r.getCmdType()

HDDS-1610. applyTransaction failure should not be lost on restart. #1226

HDDS-1610. applyTransaction failure should not be lost on restart. #1226

Conversation

bshashikant commented Aug 5, 2019

hadoop-yetus commented Aug 5, 2019

supratimdeka left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mukul1987 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bshashikant commented Aug 7, 2019 • edited Loading

hadoop-yetus commented Aug 8, 2019

Choose a reason for hiding this comment

hadoop-yetus commented Aug 8, 2019

hadoop-yetus commented Aug 13, 2019

mukul1987 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hadoop-yetus commented Aug 14, 2019

mukul1987 left a comment

Choose a reason for hiding this comment

mukul1987 commented Aug 16, 2019

hadoop-yetus commented Aug 16, 2019

bshashikant commented Aug 19, 2019

mukul1987 commented Aug 20, 2019

bshashikant commented Aug 20, 2019

cawatson Aug 21, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bshashikant commented Aug 7, 2019 •

edited

Loading

cawatson Aug 21, 2019 •

edited

Loading