HDDS-11714. resetDeletedBlockRetryCount with --all may fail and can cause long db lock in large cluster #7665

aryangupta1998 · 2025-01-08T12:14:15Z

What changes were proposed in this pull request?

With the --all option, resetDeletedBlockRetryCount makes SCM take a lock, fetch every transaction that has reached the maximum retry count, and then update the DB to reset their counts to 0. In large clusters the number of such transactions can be huge, which leads to two problems:

i) SCM holds the lock for a long time, blocking all other normal operations.

ii) The update is passed through Ratis as a single message, which can fail because of its size.

Instead, this operation should be done in batches to avoid the long lock and the Ratis message size failure.
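The batching idea can be sketched as follows. This is a minimal, self-contained illustration and not the patch itself: `BatchedResetSketch`, the in-memory `TreeMap` standing in for the SCM transaction table, and the batch size used in `main` are all assumptions made for the example. The lock is taken once per batch, so other operations can interleave between batches, and each batch of IDs stays bounded in size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the SCM deleted-block log; not Ozone code.
final class BatchedResetSketch {
  private final ReentrantLock lock = new ReentrantLock();
  // txID -> retry count; -1 marks a transaction that exhausted its retries.
  private final TreeMap<Long, Integer> table = new TreeMap<>();
  private int maxBatchSeen = 0;

  void put(long txId, int count) {
    table.put(txId, count);
  }

  int retryCount(long txId) {
    return table.get(txId);
  }

  int maxBatchSeen() {
    return maxBatchSeen;
  }

  // Collect up to batchSize failed txIDs, starting at startTxId (inclusive).
  private List<Long> nextFailedBatch(int batchSize, long startTxId) {
    List<Long> ids = new ArrayList<>();
    for (Map.Entry<Long, Integer> e
        : table.tailMap(startTxId, true).entrySet()) {
      if (ids.size() >= batchSize) {
        break;
      }
      if (e.getValue() == -1) {
        ids.add(e.getKey());
      }
    }
    return ids;
  }

  // Reset every failed transaction, holding the lock for one batch at a time.
  int resetAll(int batchSize) {
    int total = 0;
    long startTxId = 0;
    while (true) {
      lock.lock();
      try {
        List<Long> batch = nextFailedBatch(batchSize, startTxId);
        if (batch.isEmpty()) {
          return total;
        }
        for (long id : batch) {
          table.put(id, 0);
        }
        maxBatchSeen = Math.max(maxBatchSeen, batch.size());
        total += batch.size();
        // Resume after the last transaction handled in this batch.
        startTxId = batch.get(batch.size() - 1) + 1;
      } finally {
        // Released between batches so other work is not starved.
        lock.unlock();
      }
    }
  }

  public static void main(String[] args) {
    BatchedResetSketch log = new BatchedResetSketch();
    for (long id = 1; id <= 5000; id++) {
      log.put(id, id % 2 == 1 ? -1 : 3);
    }
    System.out.println("reset " + log.resetAll(1000) + " transactions");
  }
}
```

In the real change each batch would also be the unit sent through Ratis, which is what keeps the message size bounded; the sketch only models the locking and the cursor.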

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11714

How was this patch tested?

Tested Manually.

nandakumar131 · 2025-01-10T16:00:03Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

+  @Override
+  public List<DeletedBlocksTransaction> getFailedTransactionsBatch(
+      int batchSize, long startTxId) throws IOException {
+    List<DeletedBlocksTransaction> failedTXs = new ArrayList<>();
+
+    lock.lock();
+    try {
+      try (
+          TableIterator<Long, ? extends Table.KeyValue<Long, DeletedBlocksTransaction>> iter =
+              deletedBlockLogStateManager.getReadOnlyIterator()) {
+
+        iter.seek(startTxId);
+
+        while (iter.hasNext() && failedTXs.size() < batchSize) {
+          DeletedBlocksTransaction delTX = iter.next().getValue();
+          if (delTX.getCount() == -1) {
+            failedTXs.add(delTX);
+          }
+        }
+      }
+    } finally {
+      lock.unlock();
+    }
+
+    return failedTXs;
+  }
+


Why do we need this additional method? The same thing can be achieved with the existing getFailedTransactions method.
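The point that a single method taking a count and a starting txID is enough to page through every failed transaction can be illustrated with a small sketch. The `NavigableMap` here is only a stand-in for the table iterator's `seek`, and `FailedTxPagingSketch` and its method names are hypothetical, not part of Ozone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of cursor-style paging; not Ozone code.
final class FailedTxPagingSketch {

  // Return at most count failed txIDs (retry count == -1) with id >= startTxId.
  static List<Long> getFailed(NavigableMap<Long, Integer> table,
      int count, long startTxId) {
    List<Long> failed = new ArrayList<>();
    for (Map.Entry<Long, Integer> e
        : table.tailMap(startTxId, true).entrySet()) {
      if (failed.size() >= count) {
        break;
      }
      if (e.getValue() == -1) {
        failed.add(e.getKey());
      }
    }
    return failed;
  }

  // Page through all failed txIDs by re-calling getFailed with a moving start.
  static List<List<Long>> pages(NavigableMap<Long, Integer> table, int count) {
    List<List<Long>> pages = new ArrayList<>();
    long start = 0;
    List<Long> page = getFailed(table, count, start);
    while (!page.isEmpty()) {
      pages.add(page);
      // The next page begins just after the last txID of this page.
      start = page.get(page.size() - 1) + 1;
      page = getFailed(table, count, start);
    }
    return pages;
  }

  public static void main(String[] args) {
    NavigableMap<Long, Integer> table = new TreeMap<>();
    long[] failedIds = {1, 3, 4, 6, 7};
    for (long id = 1; id <= 7; id++) {
      table.put(id, 0);
    }
    for (long id : failedIds) {
      table.put(id, -1);
    }
    System.out.println(pages(table, 2));
  }
}
```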

nandakumar131 · 2025-01-10T16:14:04Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

+
+        } while (!batch.isEmpty());
+      } else {
+        // Process txIDs provided by the user in batches


The user-provided list of txIDs reaches SCM via an RPC call, so it's OK to process it in a single go.

nandakumar131 · 2025-01-23T11:26:47Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java

-      AUDIT.logReadFailure(
-          buildAuditMessageForFailure(
-              SCMAction.GET_FAILED_DELETED_BLOCKS_TRANSACTION, auditMap, ex)
-      );


Please avoid unnecessary formatting.

nandakumar131 · 2025-01-23T11:26:54Z

...e/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestStorageContainerManager.java

-    // These blocks cannot be found in the container, skip deleting them
-    // eventually these TX will success.


Please avoid unnecessary formatting.

nandakumar131 · 2025-01-23T11:27:33Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

@@ -129,7 +129,7 @@ public List<DeletedBlocksTransaction> getFailedTransactions(int count,
      final List<DeletedBlocksTransaction> failedTXs = Lists.newArrayList();
      try (TableIterator<Long,
          ? extends Table.KeyValue<Long, DeletedBlocksTransaction>> iter =
-               deletedBlockLogStateManager.getReadOnlyIterator()) {
+          deletedBlockLogStateManager.getReadOnlyIterator()) {


Please avoid unnecessary formatting.

nandakumar131 · 2025-01-23T11:49:06Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java

@@ -176,18 +176,52 @@ public void incrementCount(List<Long> txIDs)
   */
  @Override
  public int resetCount(List<Long> txIDs) throws IOException {


You can simplify the whole logic

    public int resetCount(List<Long> txIDs) throws IOException {
      if (txIDs != null) {
        lock.lock();
        try {
          transactionStatusManager.resetRetryCount(txIDs);
          return deletedBlockLogStateManager.resetRetryCountOfTransactionInDB(
              new ArrayList<>(txIDs));
        } catch (Exception e) {
          throw new IOException("Error during transaction reset", e);
        } finally {
          lock.unlock();
        }
      }

      // If txIDs are not provided, fetch all failed transactions in batches.
      final int batchSize = 1000;
      int totalProcessed = 0;
      long startTxId = 0;
      List<DeletedBlocksTransaction> batch;
      do {
        // Fetch the next batch of failed transactions.
        batch = getFailedTransactions(batchSize, startTxId);
        if (!batch.isEmpty()) {
          List<Long> batchTxIDs = batch.stream()
              .map(DeletedBlocksTransaction::getTxID)
              .collect(Collectors.toList());
          totalProcessed += resetCount(batchTxIDs);
          // Continue from the transaction after the last one processed.
          startTxId = batch.get(batch.size() - 1).getTxID() + 1;
        }
      } while (!batch.isEmpty());
      return totalProcessed;
    }

nandakumar131 · 2025-01-24T10:07:29Z

@aryangupta1998 the test failure seems related to this change, can you take a look at it?

aryangupta1998 · 2025-02-04T05:55:39Z

Thanks @nandakumar131, fixed the test case!

sadanand48

LGTM

nandakumar131 · 2025-02-04T06:44:33Z

Thanks @aryangupta1998 for the contribution. Thanks @sadanand48 for the review.

nandakumar131 requested changes Jan 10, 2025

Aryan Gupta added 3 commits January 20, 2025 15:20

HDDS-11714. resetDeletedBlockRetryCount with --all may fail and can cause long db lock in large cluster

d7724a0

Fixed TestDeletedBlocksTxnShell.

37317f0

Addressed Comments.

88722d6

aryangupta1998 force-pushed the HDDS-11714 branch from e2df43e to 88722d6 Compare January 20, 2025 09:54

Addressed Comment.

622dc73

nandakumar131 requested changes Jan 23, 2025

Aryan Gupta added 2 commits January 24, 2025 10:24

Addressed Comments.

a58f7c1

Fixed formatting.

1f8515c

Fixed TestDeletedBlocksTxnShell.

c4e5c12

sadanand48 reviewed Feb 4, 2025

nandakumar131 approved these changes Feb 4, 2025

nandakumar131 merged commit cfe56de into apache:master Feb 4, 2025
42 checks passed

nandakumar131 pushed a commit to nandakumar131/ozone that referenced this pull request Feb 10, 2025

HDDS-11714. resetDeletedBlockRetryCount with --all may fail and can cause long db lock in large cluster. (apache#7665)

f0bc4fa
