-
Notifications
You must be signed in to change notification settings - Fork 509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-9680. Use md5 hash of multipart object part's content as ETag #5668
HDDS-9680. Use md5 hash of multipart object part's content as ETag #5668
Conversation
42cfbb9
to
9bcf0a1
Compare
d3e9f8f
to
2ac2e76
Compare
Can you address the merge conflicts? |
2ac2e76
to
fd3d762
Compare
Done |
hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
Outdated
Show resolved
Hide resolved
hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
Outdated
Show resolved
Hide resolved
Make an 'eTag' field optional for Part and PartInfo messages
@SaketaChalamchala can you please take a look? |
@SaketaChalamchala @tanvipenumudy can you please review? |
@SaketaChalamchala @tanvipenumudy please review |
…-uploaded-part-eTag-improvement
...n/java/org/apache/hadoop/ozone/om/request/s3/multipart/S3MultipartUploadCompleteRequest.java
Outdated
Show resolved
Hide resolved
The changes overall look good, we need to figure out how to handle older S3G connecting to OM and if there are any on going multipart uploads during upgrade. Open to discuss what is the right solution. This is a very important PR and we should try to get in asap. |
The change LGTM |
…-uploaded-part-eTag-improvement
I didn't see any question either. |
The notice from @adoroszlai was: proto.lock should not be updated as part of the PR. The reason for having this file is to ensure backwards compatibility of any changes to .proto definitions. Lock files are used as the reference against which proto definitions are validated. My reply to the notice above was:
|
@vtutrinov Thanks for re-posting, I don't see the same question anywhere else in PR. Regarding the question: proto definitions need to be changed in a way to be backwards compatible. For this PR, reverting the change of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not update the proto.lock. The OmClientProtocol.proto
file has new fields as optional and it should not need a change to the proto.lock.
…ences with the ref to it * replace "Md5" occurrences with the ref to OzoneConst.MD5_HASH const
@vtutrinov please check test failures:
https://github.com/vtutrinov/ozone/actions/runs/7687119786/job/20946884097#step:5:2502 |
Fixed |
hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
Outdated
Show resolved
Hide resolved
…nal field (revert change)
Thanks a lot @vtutrinov for updating the patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @vtutrinov for addressing the reviews.
I also saw that you addressed some concerns posed by @kerneltime regarding the incomplete multipart upload backward compatibilities. Thanks for that.
I have some comments, but mostly LGTM. Thanks again for your hard work @vtutrinov.
String dbPartETag = null; | ||
String dbPartName = null; | ||
if (partKeyInfo != null) { | ||
dbPartETag = partKeyInfo.getPartKeyInfo().getMetadata(0).getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this, I might miss something, but this assumes that the first element in the metadata is the eTag? Maybe we can explicitly get the metadata which key is equal to OzoneConsts.ETAG
?
Currently it seems that only ETAG is the only metadata stored on the multipart upload part so it should be fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update.
@@ -642,10 +661,36 @@ private String multipartUploadedKeyHash( | |||
StringBuffer keysConcatenated = new StringBuffer(); | |||
for (PartKeyInfo partKeyInfo: partsList) { | |||
keysConcatenated.append(KeyValueUtil.getFromProtobuf(partKeyInfo | |||
.getPartKeyInfo().getMetadataList()).get("ETag")); | |||
.getPartKeyInfo().getMetadataList()).get(OzoneConsts.ETAG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the incomplete multipart uploads compatibility, the parts that do not have "eTag" yet will return null. In StringBuffer
, it will append four characters "null". However, I think there is little we can do here, so I think it should be fine to handle the incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update. Instead of null, it will use partName if the md5 hash does not exist yet.
boolean eTagBasedValidationAvailable = partsList.stream().allMatch(OzoneManagerProtocolProtos.Part::hasETag); | ||
// Now do actual logic, and check for any Invalid part during this. | ||
for (OzoneManagerProtocolProtos.Part part : partsList) { | ||
currentPartCount++; | ||
int partNumber = part.getPartNumber(); | ||
String partName = part.getPartName(); | ||
|
||
PartKeyInfo partKeyInfo = partKeyInfoMap.get(partNumber); | ||
|
||
String dbPartName = null; | ||
if (partKeyInfo != null) { | ||
dbPartName = partKeyInfo.getPartName(); | ||
} | ||
if (!StringUtils.equals(partName, dbPartName)) { | ||
String omPartName = partKeyInfo == null ? null : dbPartName; | ||
MultipartCommitRequestPart requestPart = eTagBasedValidationAvailable ? | ||
eTagBasedValidator.apply(part, partKeyInfo) : partNameBasedValidator.apply(part, partKeyInfo); | ||
if (!requestPart.isValid()) { | ||
throw new OMException( | ||
failureMessage(requestedVolume, requestedBucket, keyName) + | ||
". Provided Part info is { " + partName + ", " + partNumber + | ||
"}, whereas OM has partName " + omPartName, | ||
". Provided Part info is { " + requestPart.getRequestPartId() + ", " + partNumber + | ||
"}, whereas OM has eTag " + requestPart.getOmPartId(), | ||
OMException.ResultCodes.INVALID_PART); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding: the idea seems to pass validation even if some of the completed parts do not have eTag field, in that case, we will only validate on the partName (since it always exist in both old and new versions), if all completed parts have eTag, the MPU complete will pass validation if the eTag metadata is equal to either the persisted partName and the eTag field (both partName and eTag should have the same value).
…x MPU commit request computing of eTag hash for old clients
@vtutrinov thanks for updating the patch. LGTM +1. |
@kerneltime can you please take another look? |
@vtutrinov @adoroszlai While @kerneltime is reviewing this, could you help resolve the merge conflicts? |
…-uploaded-part-eTag-improvement
…-uploaded-part-eTag-improvement
Thanks @vtutrinov for the patch, @ivandika3, @kerneltime, @SaketaChalamchala for the review. |
…pache#5668) (cherry picked from commit 7370676)
What changes were proposed in this pull request?
Replace the uploaded part's path with its content md5 hash for the part's ETag response field
In the scope of the HDDS-9115 HDDS-9114 jira tickets the feature to store key's ETag (content md5 hash) was implemented. But the ETag field in the response of the part loading request wasn't changed and showed the part's path as it takes part in completeMultipartUpload request (OM). The PR contains the next changes:
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9680
How was this patch tested?
Existing unit, integration, and smoke/acceptance/robot tests. The robot test smoketest/s3/MutipartUpload.robot/"Test Multipart Upload Complete" was patched to compare the part's ETag with md5 hash of their content