Skip to content

Conversation

@kokonguyen191
Copy link
Contributor

Description of PR

Error logs

2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
    at org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
    at java.lang.Thread.run(Thread.java:748) 

The root cause is in BPOfferService#notifyNamenodeBlock, happens when it's called on a block belonging to a volume already removed prior. Because the volume was already removed

private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
    String delHint, String storageUuid, boolean isOnTransientStorage) {
  checkBlock(block);
  final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
      block.getLocalBlock(), status, delHint);
  final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
  
  // storage == null here because it's already removed earlier.

  for (BPServiceActor actor : bpServices) {
    actor.getIbrManager().notifyNamenodeBlock(info, storage,
        isOnTransientStorage);
  }
} 

so IBRs with a null storage are now pending.

The reason why notifyNamenodeBlock can trigger on such blocks is up in DirectoryScanner#reconcile

  public void reconcile() throws IOException {
    LOG.debug("reconcile start DirectoryScanning");
    scan();

    // If a volume is removed here after scan() already finished running,
    // diffs is stale and checkAndUpdate will run on a removed volume

    // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too
    // long
    int loopCount = 0;
    synchronized (diffs) {
      for (final Map.Entry<String, ScanInfo> entry : diffs.getEntries()) {
        dataset.checkAndUpdate(entry.getKey(), entry.getValue());        
        ...
  } 

Inside checkAndUpdate, memBlockInfo is null because all the block meta in memory is removed during the volume removal, but diskFile still exists. Then DataNode#notifyNamenodeDeletedBlock (and further down the line, notifyNamenodeBlock) is called on this block.

This patch effectively contains 2 parts:

  • Preventing processing IBRs with null storage
  • Preventing FsDatasetImpl#checkAndUpdate from running on a block belonging to a removed storage

How was this patch tested?

Unit tests. Partially on production cluster.

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 12m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 20s trunk passed
+1 💚 compile 1m 23s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 1m 14s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 1m 12s trunk passed
+1 💚 mvnsite 1m 24s trunk passed
+1 💚 javadoc 1m 6s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 46s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 18s trunk passed
+1 💚 shadedclient 35m 28s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 12s the patch passed
+1 💚 compile 1m 14s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 1m 14s the patch passed
+1 💚 compile 1m 7s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 1m 7s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 1m 1s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 230 unchanged - 0 fixed = 231 total (was 230)
+1 💚 mvnsite 1m 12s the patch passed
+1 💚 javadoc 0m 53s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 34s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 18s the patch passed
+1 💚 shadedclient 35m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 235m 27s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 44s The patch does not generate ASF License warnings.
388m 51s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestBPOfferService
hadoop.hdfs.server.datanode.TestLargeBlockReport
hadoop.hdfs.server.datanode.TestDirectoryScanner
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/1/artifact/out/Dockerfile
GITHUB PR #6759
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 34352d4fc452 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / d5a87e1
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/1/testReport/
Max. process+thread count 4417 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 02s No case conflicting files found.
+0 🆗 spotbugs 0m 00s spotbugs executables are not available.
+0 🆗 codespell 0m 00s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 118m 49s trunk passed
+1 💚 compile 8m 15s trunk passed
+1 💚 checkstyle 6m 34s trunk passed
+1 💚 mvnsite 8m 28s trunk passed
+1 💚 javadoc 7m 31s trunk passed
+1 💚 shadedclient 190m 44s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 6m 57s the patch passed
+1 💚 compile 4m 54s the patch passed
+1 💚 javac 4m 54s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 3m 25s the patch passed
+1 💚 mvnsite 6m 07s the patch passed
+1 💚 javadoc 4m 56s the patch passed
+1 💚 shadedclient 207m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 9m 28s The patch does not generate ASF License warnings.
557m 52s
Subsystem Report/Notes
GITHUB PR #6759
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname MINGW64_NT-10.0-17763 9dbcaf4dc965 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / d5a87e1
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/1/testReport/
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/1/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@kokonguyen191
Copy link
Contributor Author

Hi @ZanderXu @Hexiaoqiao, can you help me take a look whenever you're free? Thanks!

Copy link
Contributor

@ZanderXu ZanderXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kokonguyen191 for your report. It's a nice catch.

Leave some comments, thanks

final ReceivedDeletedBlockInfo[] rdbi = perStorage.removeAll();
if (rdbi != null) {
reports.add(new StorageReceivedDeletedBlocks(entry.getKey(), rdbi));
if (rdbi == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (rdbi != null) {
  // Null storage, should not happen
  if (entry.getKey() == null) {
    dnMetrics.incrNullStorageBlockReports();
    continue;
  }
  reports.add(new StorageReceivedDeletedBlocks(entry.getKey(), rdbi));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally put it like this to reduce nesting and make the logic a bit easier to read

return pendingIBRs.size();
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can rollback this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Actually took me quite a while to find the setting to disable it in IDE

// Block is not finalized - ignore the difference
return;
}
if (!storageMap.containsKey(storageUuid)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this logic to line 2750?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the block up

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 12m 13s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 32s trunk passed
+1 💚 compile 1m 18s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 1m 17s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 1m 9s trunk passed
+1 💚 mvnsite 1m 25s trunk passed
+1 💚 javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 43s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 18s trunk passed
+1 💚 shadedclient 36m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 15s the patch passed
+1 💚 compile 1m 18s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 1m 18s the patch passed
+1 💚 compile 1m 11s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 1m 11s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 59s the patch passed
+1 💚 mvnsite 1m 16s the patch passed
+1 💚 javadoc 0m 55s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 36s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 24s the patch passed
+1 💚 shadedclient 37m 30s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 227m 24s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 43s The patch does not generate ASF License warnings.
385m 19s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestBPOfferService
hadoop.hdfs.server.datanode.TestLargeBlockReport
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/2/artifact/out/Dockerfile
GITHUB PR #6759
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 770ac906a345 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4b773c6
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/2/testReport/
Max. process+thread count 3559 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 39s trunk passed
+1 💚 compile 0m 43s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 0m 40s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 0m 36s trunk passed
+1 💚 mvnsite 0m 45s trunk passed
+1 💚 javadoc 0m 40s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 5s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 44s trunk passed
+1 💚 shadedclient 20m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 33s the patch passed
+1 💚 compile 0m 37s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 0m 37s the patch passed
+1 💚 compile 0m 33s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 0m 33s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 30s the patch passed
+1 💚 mvnsite 0m 36s the patch passed
+1 💚 javadoc 0m 29s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 0m 59s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 1m 44s the patch passed
+1 💚 shadedclient 21m 4s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 204m 49s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
293m 24s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestBPOfferService
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/artifact/out/Dockerfile
GITHUB PR #6759
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 258959f150db 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 07f35e4
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/testReport/
Max. process+thread count 4188 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 26s trunk passed
+1 💚 compile 1m 21s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 1m 16s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 1m 11s trunk passed
+1 💚 mvnsite 1m 22s trunk passed
+1 💚 javadoc 1m 7s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 50s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 21s trunk passed
+1 💚 shadedclient 36m 1s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 10s the patch passed
+1 💚 compile 1m 14s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 1m 14s the patch passed
+1 💚 compile 1m 9s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 1m 9s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 1s the patch passed
+1 💚 mvnsite 1m 13s the patch passed
+1 💚 javadoc 0m 53s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 36s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 17s the patch passed
+1 💚 shadedclient 36m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 232m 56s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 46s The patch does not generate ASF License warnings.
374m 52s
Reason Tests
Failed junit tests hadoop.hdfs.server.datanode.TestLargeBlockReport
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/4/artifact/out/Dockerfile
GITHUB PR #6759
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1f3c0fcbc898 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1f16698
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/4/testReport/
Max. process+thread count 4094 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@ZanderXu ZanderXu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kokonguyen191 . LGTM

@Hexiaoqiao
Copy link
Contributor

@kokonguyen191 Thanks for your report and contributions. Sorry didn't get this issue completely.
As description, you mentioned that storage == null as following. I wonder why NPE not throw at getPerStorageIBR(storage).put(rdbi); first which at org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager#addRDBI.

private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
    String delHint, String storageUuid, boolean isOnTransientStorage) {
  checkBlock(block);
  final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
      block.getLocalBlock(), status, delHint);
  final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
  
  // storage == null here because it's already removed earlier.

  for (BPServiceActor actor : bpServices) {
    actor.getIbrManager().notifyNamenodeBlock(info, storage,
        isOnTransientStorage);
  }
} 

Another side, you mentioned that this issue is triggered by volume removed, so how the logic between volume removed to NPE. Thanks again.

@kokonguyen191
Copy link
Contributor Author

Hi @Hexiaoqiao, thanks for the review. For your first question, getPerStorageIBR(storage) doesn't do a null check on storage and assigns a new PerStorageIBG object to null. So it pretty much just considers null to be a new storage. For your second question, the NPE in IBRs is caused by a block report with null storage. The condition for this to happen is for a storage to be removed after DirectoryScanner has just finished a scan but hasn't started processing the blocks yet, so the scan result is now stale and contains removed storage.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 02s No case conflicting files found.
+0 🆗 spotbugs 0m 01s spotbugs executables are not available.
+0 🆗 codespell 0m 01s codespell was not available.
+0 🆗 detsecrets 0m 01s detect-secrets was not available.
+1 💚 @author 0m 00s The patch does not contain any @author tags.
+1 💚 test4tests 0m 01s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 111m 47s trunk passed
+1 💚 compile 7m 33s trunk passed
+1 💚 checkstyle 6m 01s trunk passed
+1 💚 mvnsite 8m 01s trunk passed
+1 💚 javadoc 7m 10s trunk passed
+1 💚 shadedclient 179m 43s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 5m 48s the patch passed
+1 💚 compile 4m 16s the patch passed
+1 💚 javac 4m 16s the patch passed
+1 💚 blanks 0m 01s The patch has no blanks issues.
+1 💚 checkstyle 2m 48s the patch passed
+1 💚 mvnsite 5m 19s the patch passed
+1 💚 javadoc 4m 32s the patch passed
+1 💚 shadedclient 198m 13s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 6m 53s The patch does not generate ASF License warnings.
523m 16s
Subsystem Report/Notes
GITHUB PR #6759
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname MINGW64_NT-10.0-17763 96a75b329453 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / 1f16698
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/5/testReport/
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/5/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hfutatzhanghb
Copy link
Contributor

@kokonguyen191 Thanks for your report and contributions. Sorry didn't get this issue completely. As description, you mentioned that storage == null as following. I wonder why NPE not throw at getPerStorageIBR(storage).put(rdbi); first which at org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager#addRDBI.

private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
    String delHint, String storageUuid, boolean isOnTransientStorage) {
  checkBlock(block);
  final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
      block.getLocalBlock(), status, delHint);
  final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
  
  // storage == null here because it's already removed earlier.

  for (BPServiceActor actor : bpServices) {
    actor.getIbrManager().notifyNamenodeBlock(info, storage,
        isOnTransientStorage);
  }
} 

Another side, you mentioned that this issue is triggered by volume removed, so how the logic between volume removed to NPE. Thanks again.

@Hexiaoqiao Hi, sir. please take a look at the description of #6730 . I found the same problem and explained in that PR's description.

@github-actions github-actions bot added the Common label May 6, 2024
@haiyang1987
Copy link
Contributor

LGTM.

Hi @Hexiaoqiao @hfutatzhanghb any other comments?

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 02s No case conflicting files found.
+0 🆗 codespell 0m 02s codespell was not available.
+0 🆗 detsecrets 0m 02s detect-secrets was not available.
+0 🆗 markdownlint 0m 02s markdownlint was not available.
+0 🆗 spotbugs 0m 00s spotbugs executables are not available.
+1 💚 @author 0m 01s The patch does not contain any @author tags.
+1 💚 test4tests 0m 00s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 2m 14s Maven dependency ordering for branch
+1 💚 mvninstall 86m 40s trunk passed
+1 💚 compile 38m 18s trunk passed
+1 💚 checkstyle 5m 51s trunk passed
-1 ❌ mvnsite 4m 27s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 💚 javadoc 10m 13s trunk passed
+1 💚 shadedclient 157m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 2m 16s Maven dependency ordering for patch
+1 💚 mvninstall 9m 29s the patch passed
+1 💚 compile 35m 37s the patch passed
+1 💚 javac 35m 37s the patch passed
+1 💚 blanks 0m 01s The patch has no blanks issues.
+1 💚 checkstyle 5m 50s the patch passed
-1 ❌ mvnsite 4m 30s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 💚 javadoc 10m 18s the patch passed
+1 💚 shadedclient 167m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 5m 28s The patch does not generate ASF License warnings.
512m 54s
Subsystem Report/Notes
GITHUB PR #6759
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname MINGW64_NT-10.0-17763 f9e61a0ebff0 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / b42a03e
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/6/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/6/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@ZanderXu ZanderXu merged commit fb05192 into apache:trunk May 11, 2024
@ZanderXu
Copy link
Contributor

Merged. Thanks @kokonguyen191 for your contribution and thanks @Hexiaoqiao @haiyang1987 @hfutatzhanghb for your review.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
-1 ❌ patch 0m 55s #6759 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.
Subsystem Report/Notes
GITHUB PR #6759
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/7/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

K0K0V0K pushed a commit to K0K0V0K/hadoop that referenced this pull request May 17, 2024
K0K0V0K pushed a commit to K0K0V0K/hadoop that referenced this pull request May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants