HDDS-11714. resetDeletedBlockRetryCount with --all may fail and can cause long db lock in large cluster #7665

Merged
merged 7 commits into from
Feb 4, 2025

aryangupta1998
Contributor

What changes were proposed in this pull request?

When resetDeletedBlockRetryCount is run with the --all option, SCM takes a lock, fetches every transaction that has reached the max retry count, and then updates the DB to set their counts to 0. In a large-scale environment the number of such transactions can be huge, which leads to multiple problems:

i) Holding the lock for that long blocks all other normal operations.

ii) The update is passed through Ratis as a single message, which fails when the message is too large.

Instead, this operation should be done in batches to avoid both the long lock and the Ratis message size failure.
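
The batching idea can be sketched in isolation. This is only an illustration under stated assumptions, not the SCM implementation: `BatchedLockSketch`, `applyInBatches`, and the `update` callback are hypothetical names, and the callback stands in for whatever submits one bounded update (in SCM, a Ratis request). The lock is taken and released once per chunk, so other operations can run between chunks, and no single update carries more than `batchSize` IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical sketch; not part of Ozone.
final class BatchedLockSketch {
  private final ReentrantLock lock = new ReentrantLock();

  // Applies `update` to txIDs in chunks of at most batchSize,
  // holding the lock only while a single chunk is processed.
  // Returns the number of chunks submitted.
  int applyInBatches(List<Long> txIDs, int batchSize,
      Consumer<List<Long>> update) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int batches = 0;
    for (int from = 0; from < txIDs.size(); from += batchSize) {
      int to = Math.min(from + batchSize, txIDs.size());
      // Copy the chunk so the callback does not see a view of the full list.
      List<Long> chunk = new ArrayList<>(txIDs.subList(from, to));
      lock.lock();
      try {
        update.accept(chunk);
      } finally {
        lock.unlock();
      }
      batches++;
    }
    return batches;
  }
}
```

With 2500 IDs and a batch size of 1000, the callback is invoked three times, with 1000, 1000 and 500 IDs.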

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-11714

How was this patch tested?

Tested Manually.

Comment on lines 123 to 149
@Override
public List<DeletedBlocksTransaction> getFailedTransactionsBatch(
    int batchSize, long startTxId) throws IOException {
  List<DeletedBlocksTransaction> failedTXs = new ArrayList<>();

  lock.lock();
  try {
    try (
        TableIterator<Long, ? extends Table.KeyValue<Long, DeletedBlocksTransaction>> iter =
            deletedBlockLogStateManager.getReadOnlyIterator()) {

      iter.seek(startTxId);

      while (iter.hasNext() && failedTXs.size() < batchSize) {
        DeletedBlocksTransaction delTX = iter.next().getValue();
        if (delTX.getCount() == -1) {
          failedTXs.add(delTX);
        }
      }
    }
  } finally {
    lock.unlock();
  }

  return failedTXs;
}

Contributor

Why do we need this additional method? The same thing can be achieved with the existing getFailedTransactions method.

Contributor Author

Done


} while (!batch.isEmpty());
} else {
// Process txIDs provided by the user in batches
Contributor

The user-provided list of txIDs reaches SCM via an RPC call, so it's OK to process it in a single go.

Contributor Author

Done

Comment on lines 915 to 918
AUDIT.logReadFailure(
    buildAuditMessageForFailure(
        SCMAction.GET_FAILED_DELETED_BLOCKS_TRANSACTION, auditMap, ex)
);
Contributor

Please avoid unnecessary formatting.

Comment on lines 301 to 302
// These blocks cannot be found in the container, skip deleting them
// eventually these TX will success.
Contributor

Please avoid unnecessary formatting.

@@ -129,7 +129,7 @@ public List<DeletedBlocksTransaction> getFailedTransactions(int count,
final List<DeletedBlocksTransaction> failedTXs = Lists.newArrayList();
try (TableIterator<Long,
? extends Table.KeyValue<Long, DeletedBlocksTransaction>> iter =
deletedBlockLogStateManager.getReadOnlyIterator()) {
deletedBlockLogStateManager.getReadOnlyIterator()) {
Contributor

Please avoid unnecessary formatting.

@@ -176,18 +176,52 @@ public void incrementCount(List<Long> txIDs)
*/
@Override
public int resetCount(List<Long> txIDs) throws IOException {
Contributor

You can simplify the whole logic:

public int resetCount(List<Long> txIDs) throws IOException {
  // Explicit txIDs arrive via an RPC call, so reset them in a single go.
  if (txIDs != null) {
    lock.lock();
    try {
      transactionStatusManager.resetRetryCount(txIDs);
      return deletedBlockLogStateManager.resetRetryCountOfTransactionInDB(
          new ArrayList<>(txIDs));
    } catch (Exception e) {
      throw new IOException("Error during transaction reset", e);
    } finally {
      lock.unlock();
    }
  }

  // If txIDs are not provided, reset all failed transactions in batches.
  final int batchSize = 1000;
  int totalProcessed = 0;
  long startTxId = 0;
  List<DeletedBlocksTransaction> batch;
  do {
    // Fetch the next batch of failed transactions.
    batch = getFailedTransactions(batchSize, startTxId);
    if (!batch.isEmpty()) {
      List<Long> batchTxIDs = batch.stream()
          .map(DeletedBlocksTransaction::getTxID)
          .collect(Collectors.toList());
      totalProcessed += resetCount(batchTxIDs);

      // Continue from the transaction after the last one processed.
      startTxId = batch.get(batch.size() - 1).getTxID() + 1;
    }
  } while (!batch.isEmpty());
  return totalProcessed;
}
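
The cursor advance in the loop above (continue from the last txID plus one, stop on an empty batch) can be tried out against an in-memory stand-in. The sketch below is a hedged illustration only: `CursorPagingSketch`, `getFailed`, and `resetAll` are hypothetical names, a `TreeMap` of txID to retry count replaces the SCM DB table, `tailMap` plays the role of the iterator's `seek`, and a count of -1 marks a failed transaction as in the snippet under review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch; the map stands in for the deleted-block table.
final class CursorPagingSketch {

  // Returns up to `count` txIDs whose retry count is -1,
  // scanning in key order starting at startTxId (inclusive).
  static List<Long> getFailed(NavigableMap<Long, Integer> store,
      int count, long startTxId) {
    List<Long> failed = new ArrayList<>();
    for (Map.Entry<Long, Integer> e
        : store.tailMap(startTxId, true).entrySet()) {
      if (failed.size() >= count) {
        break;
      }
      if (e.getValue() == -1) {
        failed.add(e.getKey());
      }
    }
    return failed;
  }

  // Resets all failed transactions, one batch at a time.
  // Returns the number of transactions reset.
  static int resetAll(NavigableMap<Long, Integer> store, int batchSize) {
    int total = 0;
    long startTxId = 0;
    List<Long> batch = getFailed(store, batchSize, startTxId);
    while (!batch.isEmpty()) {
      for (long id : batch) {
        store.put(id, 0);
      }
      total += batch.size();
      // Move the cursor past the last transaction in this batch.
      startTxId = batch.get(batch.size() - 1) + 1;
      batch = getFailed(store, batchSize, startTxId);
    }
    return total;
  }

  public static void main(String[] args) {
    NavigableMap<Long, Integer> store = new TreeMap<>();
    for (long i = 1; i <= 10; i++) {
      store.put(i, i % 2 == 0 ? -1 : 3);
    }
    System.out.println(resetAll(store, 2));
  }
}
```

Checking for an empty batch before reading its last element matters: indexing into an empty list with `batch.size() - 1` would throw `IndexOutOfBoundsException` on the final iteration.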

@nandakumar131
Contributor

@aryangupta1998 the test failure seems related to this change, can you take a look at it?

@aryangupta1998
Contributor Author

Thanks @nandakumar131, fixed the test case!

Contributor

@sadanand48 left a comment

LGTM

@nandakumar131 merged commit cfe56de into apache:master Feb 4, 2025
42 checks passed
@nandakumar131
Contributor

Thanks @aryangupta1998 for the contribution. Thanks @sadanand48 for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Feb 5, 2025
* master: (168 commits)
  HDDS-12112. Fix interval used for Chunk Read/Write Dashboard (apache#7724)
  HDDS-12212. Fix grammar in decommissioning and observability documentation (apache#7815)
  HDDS-12195. Implement skip() in OzoneFSInputStream (apache#7801)
  HDDS-12200. Fix grammar in OM HA, EC and Snapshot doc (apache#7806)
  HDDS-12202. OpsCreate and OpsAppend metrics not incremented (apache#7811)
  HDDS-12203. Initialize block length before skip (apache#7809)
  HDDS-12183. Reuse cluster across safe test classes (apache#7793)
  HDDS-11714. resetDeletedBlockRetryCount with --all may fail and can cause long db lock in large cluster. (apache#7665)
  HDDS-12186. (addendum) Avoid array allocation for table iterator (apache#7799)
  HDDS-12186. Avoid array allocation for table iterator. (apache#7797)
  HDDS-11508. Decouple delete batch limits from Ratis request size for DirectoryDeletingService. (apache#7365)
  HDDS-12073. Don't show Source Bucket and Volume if null in DU metadata (apache#7760)
  HDDS-12142. Save logs from build check (apache#7782)
  HDDS-12163. Reduce number of individual getCapacity/getAvailable/getUsedSpace calls (apache#7790)
  HDDS-12176. Trivial dependency cleanup.(apache#7787)
  HDDS-12181. Bump jline to 3.29.0 (apache#7789)
  HDDS-12165. Refactor VolumeInfoMetrics to use getCurrentUsage (apache#7784)
  HDDS-12085. Add manual refresh button for DU page (apache#7780)
  HDDS-12132. Parameterize testUpdateTransactionInfoTable for SCM (apache#7768)
  HDDS-11277. Remove dependency on hadoop-hdfs in Ozone client (apache#7781)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/ClosedContainerReplicator.java
errose28 added a commit to errose28/ozone that referenced this pull request Feb 6, 2025
errose28 added a commit to errose28/ozone that referenced this pull request Feb 7, 2025
…ee-improvements

* HDDS-10239-container-reconciliation: (168 commits)
nandakumar131 pushed a commit to nandakumar131/ozone that referenced this pull request Feb 10, 2025