HDFS-17467. IncrementalBlockReportManager#getPerStorageIBR may throw NPE when remove volumes. #6730
…NPE when remove volumes.
|
💔 -1 overall
This message was automatically generated. |
|
@hfutatzhanghb thanks for your report. It looks like HDFS-17488. Let's review HDFS-17488 together to fix this bug if you have time. Thanks |
@ZanderXu OK, Sir. I will review HDFS-17488 soon. Thanks for the reminder. |
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
Description of PR
Refer to HDFS-17467.
Removing volumes may cause IncrementalBlockReportManager#getPerStorageIBR to throw an NPE.
Consider the following situation:
1. createRbw and finalizeBlock have completed, but datanode.closeBlock in the method BlockReceiver.PacketResponder#finalizeBlock has not yet been called.
2. We remove the volume that the replica was written to, which executes: storageMap.remove(storageUuid);
3. datanode.closeBlock then runs and tries to send an IBR to the NameNode. Looking up the DatanodeStorage in storageMap by storageUuid returns null, because that storageUuid key has already been removed from storageMap.
4. getPerStorageIBR throws an NPE, because ConcurrentHashMap does not allow null keys.
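The sequence above can be reproduced in isolation. The following is only a minimal sketch, not the Hadoop code: the class IbrNullStorageSketch, the notifyBlock helper, and the use of plain strings in place of DatanodeStorage are all hypothetical stand-ins, and the maps are assumed to be ConcurrentHashMap instances as the description states. It shows the unguarded lookup failing on a null key, next to a guarded variant that checks for null first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the DataNode state; not the Hadoop classes.
class IbrNullStorageSketch {
  // storageUuid -> storage (a String stands in for DatanodeStorage)
  static final Map<String, String> storageMap = new ConcurrentHashMap<>();
  // storage -> pending block IDs (stands in for the per-storage IBR map)
  static final Map<String, List<Long>> pendingIBRs = new ConcurrentHashMap<>();

  // Unguarded lookup: ConcurrentHashMap#get throws NullPointerException on a null key.
  static List<Long> getPerStorageIBR(String storage) {
    List<Long> perStorage = pendingIBRs.get(storage);
    if (perStorage == null) {
      perStorage = new ArrayList<>();
      pendingIBRs.put(storage, perStorage);
    }
    return perStorage;
  }

  // Guarded variant: skip the report when the volume has already been removed.
  static boolean notifyBlock(long blockId, String storageUuid) {
    String storage = storageMap.get(storageUuid);
    if (storage == null) {
      return false; // nothing to report; the storage is gone
    }
    getPerStorageIBR(storage).add(blockId);
    return true;
  }

  public static void main(String[] args) {
    storageMap.put("DS-1", "storage-1");      // step 1: replica finalized on DS-1
    storageMap.remove("DS-1");                // step 2: volume removed concurrently
    String storage = storageMap.get("DS-1");  // step 3: lookup now yields null
    try {
      getPerStorageIBR(storage);              // step 4: get(null) throws
    } catch (NullPointerException e) {
      System.out.println("NPE from unguarded lookup");
    }
    System.out.println("guarded call reported: " + notifyBlock(1L, "DS-1"));
  }
}
```

Running main prints "NPE from unguarded lookup" followed by "guarded call reported: false".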
So we should check whether the storage is null before invoking CHM#get, and return directly if it is. This is safe because removeVolume removes all replicas on that volume; see the following code in the FsDatasetImpl#removeVolumes method: