-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HDFS-16187. SnapshotDiff behaviour with Xattrs and Acls is not consistent across NN restarts with checkpointing #3340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tent across NN restarts with checkpointing.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
szetszwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. We may use java.util.Object.equals(a, b) to check null.
...project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java
Outdated
Show resolved
Hide resolved
...doop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectoryAttributes.java
Outdated
Show resolved
Hide resolved
...doop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectoryAttributes.java
Outdated
Show resolved
Hide resolved
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
szetszwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 the change looks good.
cnauroth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bshashikant , thank you for the patch.
I see that the patch works by converting logic from reference-equals to value-equals. In AclStorage, we maintain a reference-counted mapping of every immutable instance of every AclFeature currently in use by the namesystem. Inodes with identical ACL entries all point to the same AclFeature, so reference equals should work fine. This was an intentional choice of the ACL design to lower memory footprint and provide an inexpensive equality operation.
Xattrs do not use this same flyweight pattern though, so I can see why deep equals would be necessary there. I wonder if this bug really only happens for xattrs and not ACLs? If the bug happens for ACLs, then that might mean there is really something wrong with reference counting in AclStorage, causing duplicate instances for the same ACL entries.
Also, if value equals is necessary, then there is similar logic in INodeFile and INodeFileAttributes.
Thanks @cnauroth for the explanation. I tested the acl feature and it seems the issue only seem to exist with Xattr feature not acl itself as you suggested. I have modified the patch accordingly. Thank you very much for the details. |
|
💔 -1 overall
This message was automatically generated. |
cnauroth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update, @bshashikant.
- Do you also want to update the similar logic in
INodeFileandINodeFileAttributes.SnapshotCopy? - Related to that, I wonder if the bug also exists for files. Right now, the unit test exercises a directory. Could you also write a test that covers snapshotting a directory without xattrs containing a file that has xattrs? If that test fails before your patch but passes afterwards, then we know it's a good test to keep.
- For each of the
metadataEqualsmethods that you are updating (4 total?), would you please add a brief comment stating that reference-equals is intentional for ACLs because of the reference counting inAclStorage? That might prevent confusion for future maintainers.
|
Thanks @cnauroth . The problem here seems to be very specific to snapshotRoot. public void testXattrOnFileWithSnapshotAndNNRestart() throws Exception { The same test with xattrs set on file inside the snapshotRoot/SubDirecory will just work with/without the patch. |
|
@bshashikant , thanks! In that case, let's please just add the comments I requested to |
Thanks @cnauroth . As the problem is very specific to snapshot root, "metedataEquals()" function is now overriden in Snapshot.Root() and added the comments as per your suggestion |
@szetszwo , can you have another look? |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
cnauroth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thanks for incorporating the feedback, @bshashikant . I will hold off committing in case you want to hear back from @szetszwo first.
|
@cnauroth , thanks a lot for helping out.
In this case, it is a good idea to check == inside the equals(Object) method. Then, calling equals(Object) is equally cheap as checking ==. Let's merge this change as-is and do the further improvement separately. +1 @bshashikant , it somehow got "All checks have failed". Could you trigger another build? |
|
💔 -1 overall
This message was automatically generated. |
|
Thanks @szetszwo . I have retriggered the CI and the unit test failures in the report are not related. |
|
@bshashikant , thank you for the contribution and for incorporating the code review feedback. |
…tent across NN restarts with checkpointing (apache#3340) (cherry picked from commit 356ebbb) Conflicts: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java Change-Id: If76fd0d77fafc90fe2f2c19ab1d0c43a58510f6b
…tent across NN restarts with checkpointing (#3340) (#3640) (cherry picked from commit 356ebbb) Conflicts: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java Change-Id: If76fd0d77fafc90fe2f2c19ab1d0c43a58510f6b Co-authored-by: bshashikant <shashikant@apache.org>
…tent across NN restarts with checkpointing (#3340) (#3640) (cherry picked from commit 356ebbb) Conflicts: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java Change-Id: If76fd0d77fafc90fe2f2c19ab1d0c43a58510f6b Co-authored-by: bshashikant <shashikant@apache.org> (cherry picked from commit 5333e87)
…tent across NN restarts with checkpointing (apache#3340)
… and Acls is not consistent across NN restarts with checkpointing (apache#3340) (cherry picked from commit 9b7be58afe41a75a85511b3d95c00519ca5071be) Change-Id: Ie1d00f7b029e3182c7a2c2d288599e3fd4088b9c
…few important for 7.1.7 SP1 : [1] Upstream JIRA ID: HDFS-16187. SnapshotDiff behaviour with Xattrs and Acls is not consistent across NN restarts with checkpointing (apache#3340) (cherry picked from commit 356ebbb) Change-Id: Ic971626aa30f49af0aaa653ca5f43c9d8078fc17 (cherry picked from commit be77d73)
Please check https://issues.apache.org/jira/browse/HDFS-16187