
HDDS-9680. Use md5 hash of multipart object part's content as ETag #5668


@vtutrinov (Contributor) commented Nov 24, 2023

What changes were proposed in this pull request?

Use the MD5 hash of an uploaded part's content, instead of the part's path, as the ETag field in the upload-part response.

Storing a key's ETag (the MD5 hash of its content) was implemented under the HDDS-9115 and HDDS-9114 Jira tickets. However, the ETag field in the response to an upload-part request was not changed: it still contained the part's path, because that value is used in the completeMultipartUpload request to OM. This PR makes the following changes:

  • replace the path-based representation of a part's ETag with the MD5 hash of its content
  • update org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest so that part validation checks the part's ETag instead of its partName
  • add an eTag field to OmClientProtocol.proto/MultipartCommitUploadPartResponse
  • add an eTag field to OmClientProtocol.proto/Part (used to build the completeMultipartUpload request)
  • add an eTag field to OmClientProtocol.proto/PartInfo (used by listParts)
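A minimal sketch of the ETag value this PR switches to, assuming the ETag is the lowercase hex MD5 digest of the part's bytes (the usual S3 convention for a single uploaded part). The class and method names are hypothetical; this is not the S3 Gateway's actual code, which may compute the digest while streaming the part rather than from an in-memory array.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper (not Ozone's code): computes a part ETag as the
 * lowercase hex MD5 digest of the part's bytes.
 */
final class PartETagSketch {
  private PartETagSketch() {
  }

  static String md5Hex(byte[] content) {
    final MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // MD5 ships with mainstream JDKs; treat its absence as a setup error.
      throw new IllegalStateException("MD5 is not available", e);
    }
    byte[] digest = md5.digest(content);
    StringBuilder hex = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      // %02x on a byte prints its unsigned value as two hex digits.
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
```

With this scheme, two parts with identical content get identical ETags, whereas the previous path-based value differed per part.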

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9680

How was this patch tested?

Existing unit, integration, and smoke/acceptance (Robot) tests. The Robot test "Test Multipart Upload Complete" in smoketest/s3/MultipartUpload.robot was updated to compare each part's ETag with the MD5 hash of that part's content.

@vtutrinov force-pushed the HDDS-9680-multipart-uploaded-part-eTag-improvement branch 2 times, most recently from 42cfbb9 to 9bcf0a1 (November 27, 2023 13:53)
@vtutrinov force-pushed the HDDS-9680-multipart-uploaded-part-eTag-improvement branch 2 times, most recently from d3e9f8f to 2ac2e76 (November 28, 2023 05:48)
@vtutrinov marked this pull request as ready for review (November 28, 2023 15:35)
@kerneltime requested a review from duongkame (November 28, 2023 17:10)
@kerneltime (Contributor)

Can you address the merge conflicts?

@vtutrinov force-pushed the HDDS-9680-multipart-uploaded-part-eTag-improvement branch from 2ac2e76 to fd3d762 (November 29, 2023 06:45)
@vtutrinov (Contributor, Author)


Done

@kerneltime (Contributor)

Make an 'eTag' field optional for Part and PartInfo messages
@adoroszlai changed the title from "HDDS-9680. Provide a loaded key part's ETag as an md5 hash (S3G multipart upload)" to "HDDS-9680. Use md5 hash of multipart object part's content as ETag" (Nov 30, 2023)
@adoroszlai added the s3 (S3 Gateway) label (Dec 1, 2023)
@kerneltime (Contributor)

@SaketaChalamchala can you please take a look?

@adoroszlai (Contributor)

@SaketaChalamchala @tanvipenumudy can you please review?

@myskov (Contributor) commented Dec 19, 2023

@SaketaChalamchala @tanvipenumudy please review

@kerneltime (Contributor)

The changes look good overall. We need to figure out how to handle an older S3G connecting to OM, and any multipart uploads that are still in progress during an upgrade. I'm open to discussing the right solution. This is a very important PR and we should try to get it in ASAP.

@SaketaChalamchala (Contributor)

The change LGTM

@adoroszlai (Contributor)

> ...once my question about the proto.lock will be answered
>
> Sorry, I didn't see your question, which question are you referring to?
>
> @ivandika3 the question was from @adoroszlai - https://github.com/apache/ozone/pull/5668/files/48a1af2d2a82ab6c4780b2fb88502530ddf33362#r1464366194

I didn't see any question either.

@vtutrinov (Contributor, Author)


The comment from @adoroszlai was:

> proto.lock should not be updated as part of the PR. The reason for having this file is to ensure backwards compatibility of any changes to .proto definitions. Lock files are used as the reference against which proto definitions are validated.

My reply to that comment was:

> How should we resolve the compatibility check failure in the Maven build if proto.lock is not updated?

@adoroszlai (Contributor)


@vtutrinov Thanks for re-posting; I don't see that question anywhere else in the PR.

Regarding the question: proto definitions need to be changed in a backwards-compatible way.

For this PR, the backwards compatibility check still passes for me with the proto.lock change reverted. Are you encountering a failure?

@kerneltime (Contributor) left a comment:

We should not update proto.lock. The new fields in OmClientProtocol.proto are optional, so they should not require a change to proto.lock.

@adoroszlai (Contributor)

@vtutrinov please check test failures:

org.apache.hadoop.ozone.om.request.s3.multipart.TestS3MultipartUploadCompleteRequest
org.apache.hadoop.ozone.om.request.s3.multipart.TestS3MultipartUploadCompleteRequestWithFSO
org.apache.hadoop.ozone.om.response.s3.multipart.TestS3MultipartUploadCommitPartResponseWithFSO
org.apache.hadoop.ozone.om.response.s3.multipart.TestS3MultipartUploadCompleteResponseWithFSO

https://github.com/vtutrinov/ozone/actions/runs/7687119786/job/20946884097#step:5:2502

@vtutrinov (Contributor, Author)


Fixed

@adoroszlai dismissed their stale review (January 29, 2024 07:43)

lockfile removed

@adoroszlai requested a review from kerneltime (January 29, 2024 07:44)
@adoroszlai (Contributor)

Thanks a lot @vtutrinov for updating the patch.

@adoroszlai requested a review from ivandika3 (January 29, 2024 07:44)

@ivandika3 (Contributor) left a comment:

Thank you @vtutrinov for addressing the reviews.

I also saw that you addressed @kerneltime's concerns about backward compatibility with incomplete multipart uploads. Thanks for that.

I have some comments, but mostly LGTM. Thanks again for your hard work @vtutrinov.

String dbPartETag = null;
String dbPartName = null;
if (partKeyInfo != null) {
  dbPartETag = partKeyInfo.getPartKeyInfo().getMetadata(0).getValue();
Contributor

Regarding this, I might be missing something, but doesn't this assume that the first metadata entry is the eTag? Maybe we can explicitly get the metadata entry whose key equals OzoneConsts.ETAG?

Currently ETAG seems to be the only metadata stored on a multipart upload part, so it should be fine for now.
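One way to do the explicit lookup suggested above, sketched with plain Java types. KeyValue here is a hypothetical stand-in for the protobuf metadata entry, and the ETAG constant is assumed to mirror OzoneConsts.ETAG (the diff replaces the literal "ETag" with that constant). This is an illustration, not the code that was merged.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical stand-in for a protobuf KeyValue metadata entry. */
final class KeyValue {
  private final String key;
  private final String value;

  KeyValue(String key, String value) {
    this.key = key;
    this.value = value;
  }

  String key() {
    return key;
  }

  String value() {
    return value;
  }
}

final class MetadataLookupSketch {
  /** Assumed to mirror OzoneConsts.ETAG. */
  static final String ETAG = "ETag";

  private MetadataLookupSketch() {
  }

  /**
   * Returns the value stored under {@code key}, wherever it sits in the list,
   * instead of assuming it is entry 0.
   */
  static Optional<String> find(List<KeyValue> metadata, String key) {
    for (KeyValue kv : metadata) {
      if (Objects.equals(kv.key(), key)) {
        return Optional.ofNullable(kv.value());
      }
    }
    return Optional.empty();
  }
}
```

An empty result also covers parts committed before ETags were stored, which index-based access (getMetadata(0)) would not handle.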

Contributor

Thank you for the update.

@@ -642,10 +661,36 @@ private String multipartUploadedKeyHash(
     StringBuffer keysConcatenated = new StringBuffer();
     for (PartKeyInfo partKeyInfo: partsList) {
       keysConcatenated.append(KeyValueUtil.getFromProtobuf(partKeyInfo
-          .getPartKeyInfo().getMetadataList()).get("ETag"));
+          .getPartKeyInfo().getMetadataList()).get(OzoneConsts.ETAG));
@ivandika3 (Contributor), Jan 29, 2024

Regarding compatibility with incomplete multipart uploads: parts that do not have an "eTag" yet will return null, and StringBuffer will append the four characters "null". However, I think there is little we can do here, so this incompatibility should be acceptable.

@ivandika3 (Contributor), Jan 30, 2024

Thank you for the update. Instead of null, the part's partName is now used when its MD5 hash does not exist yet.
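The fallback described above can be sketched as follows. CommittedPart and concatenatePartIds are hypothetical names, and the ETAG constant is assumed to mirror OzoneConsts.ETAG; the sketch only builds the concatenated string and leaves out the hashing step of multipartUploadedKeyHash.

```java
import java.util.List;
import java.util.Map;

final class MultipartHashInputSketch {
  /** Assumed to mirror OzoneConsts.ETAG. */
  static final String ETAG = "ETag";

  private MultipartHashInputSketch() {
  }

  /** Hypothetical view of a committed part: its partName plus its metadata map. */
  static final class CommittedPart {
    final String partName;
    final Map<String, String> metadata;

    CommittedPart(String partName, Map<String, String> metadata) {
      this.partName = partName;
      this.metadata = metadata;
    }
  }

  /**
   * Concatenates each part's ETag, falling back to its partName when the part
   * was committed before ETags were stored. Without the fallback,
   * StringBuilder.append((String) null) would insert the text "null".
   */
  static String concatenatePartIds(List<CommittedPart> parts) {
    StringBuilder concatenated = new StringBuilder();
    for (CommittedPart part : parts) {
      String eTag = part.metadata.get(ETAG);
      concatenated.append(eTag != null ? eTag : part.partName);
    }
    return concatenated.toString();
  }
}
```

The sketch uses StringBuilder rather than the StringBuffer in the diff because the buffer is never shared between threads; both behave the same way for a null String argument.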

Comment on lines +518 to 531
+    boolean eTagBasedValidationAvailable = partsList.stream().allMatch(OzoneManagerProtocolProtos.Part::hasETag);
     // Now do actual logic, and check for any Invalid part during this.
     for (OzoneManagerProtocolProtos.Part part : partsList) {
       currentPartCount++;
       int partNumber = part.getPartNumber();
       String partName = part.getPartName();

       PartKeyInfo partKeyInfo = partKeyInfoMap.get(partNumber);

-      String dbPartName = null;
-      if (partKeyInfo != null) {
-        dbPartName = partKeyInfo.getPartName();
-      }
-      if (!StringUtils.equals(partName, dbPartName)) {
-        String omPartName = partKeyInfo == null ? null : dbPartName;
+      MultipartCommitRequestPart requestPart = eTagBasedValidationAvailable ?
+          eTagBasedValidator.apply(part, partKeyInfo) : partNameBasedValidator.apply(part, partKeyInfo);
+      if (!requestPart.isValid()) {
         throw new OMException(
             failureMessage(requestedVolume, requestedBucket, keyName) +
-            ". Provided Part info is { " + partName + ", " + partNumber +
-            "}, whereas OM has partName " + omPartName,
+            ". Provided Part info is { " + requestPart.getRequestPartId() + ", " + partNumber +
+            "}, whereas OM has eTag " + requestPart.getOmPartId(),
             OMException.ResultCodes.INVALID_PART);
Contributor

For my understanding: the idea seems to be that validation passes even if some of the completed parts do not have the eTag field. In that case, we validate only on the partName (since it exists in both old and new versions). If all completed parts have an eTag, the MPU complete passes validation if the eTag metadata equals either the persisted partName or the eTag field (both partName and eTag should have the same value).
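A sketch of the validation choice as the reviewer describes it, under stated assumptions: RequestPart and CommittedPart are hypothetical simplifications of the protobuf Part and PartKeyInfo messages, and the rule "a request eTag matches either the stored eTag or the stored partName" follows the reviewer's summary rather than the merged code.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class CompleteValidationSketch {
  private CompleteValidationSketch() {
  }

  /** A part listed in a complete request; eTag is null when an older client omits it. */
  static final class RequestPart {
    final int partNumber;
    final String partName;
    final String eTag;

    RequestPart(int partNumber, String partName, String eTag) {
      this.partNumber = partNumber;
      this.partName = partName;
      this.eTag = eTag;
    }
  }

  /** What OM persisted for a part; eTag is null for parts committed before the upgrade. */
  static final class CommittedPart {
    final String partName;
    final String eTag;

    CommittedPart(String partName, String eTag) {
      this.partName = partName;
      this.eTag = eTag;
    }
  }

  static boolean isValid(List<RequestPart> requested, Map<Integer, CommittedPart> committed) {
    // Decided once per request, like the diff's eTagBasedValidationAvailable flag.
    boolean eTagBased = requested.stream().allMatch(p -> p.eTag != null);
    for (RequestPart part : requested) {
      CommittedPart db = committed.get(part.partNumber);
      if (db == null) {
        return false; // the part was never committed
      }
      boolean matches;
      if (eTagBased) {
        // Accept the stored eTag, or the stored partName for pre-upgrade parts.
        matches = Objects.equals(part.eTag, db.eTag)
            || Objects.equals(part.eTag, db.partName);
      } else {
        matches = Objects.equals(part.partName, db.partName);
      }
      if (!matches) {
        return false;
      }
    }
    return true;
  }
}
```

Deciding once per request rather than per part means a request from an older S3G that omits eTag on any part is validated entirely by partName, which is the backward-compatibility path discussed earlier in the thread.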

…x MPU commit request computing of eTag hash for old clients
@ivandika3 (Contributor)

@vtutrinov thanks for updating the patch. LGTM +1.

@adoroszlai (Contributor)

@kerneltime can you please take another look?

@ivandika3 (Contributor)

@vtutrinov @adoroszlai While @kerneltime is reviewing this, could you help resolve the merge conflicts?

@adoroszlai merged commit 7370676 into apache:master on Feb 13, 2024 (36 checks passed)
@adoroszlai (Contributor)

Thanks @vtutrinov for the patch, @ivandika3, @kerneltime, @SaketaChalamchala for the review.
