Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-10521. ETag field should not be returned during GetObject if the key does not contain ETag field #6377

Merged
merged 1 commit into from
Mar 15, 2024

Conversation

ivandika3
Copy link
Contributor

@ivandika3 ivandika3 commented Mar 14, 2024

What changes were proposed in this pull request?

A user encountered this error when it tries to download a key using the AWS S3 Java SDK (version 1.11.415).

ERROR FileOperationServiceImpl - s3Client.getObject invoke error, objectKey:<redacted>. (FileOperationServiceImpl.java:438)
java.lang.ArrayIndexOutOfBoundsException: 110
    at com.amazonaws.util.Base16Codec.pos(Base16Codec.java:96)
    at com.amazonaws.util.Base16Codec.decode(Base16Codec.java:87)
    at com.amazonaws.util.Base16Lower.decode(Base16Lower.java:53)
    at com.amazonaws.util.BinaryUtils.fromHex(BinaryUtils.java:48)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1456)
    ...

Although the key was able to be downloaded using AWS s3api

aws s3api --endpoint <redacted> get-object --bucket <redacted> --key <redacted> download.txt
{
    "AcceptRanges": "bytes",
    "LastModified": "Wed, 28 Feb 2024 11:06:30 GMT",
    "ContentLength": 1117055,
    "ETag": "\"null\"",
    "CacheControl": "no-cache",
    "ContentType": "application/octet-stream",
    "Expires": "Wed, 13 Mar 2024 08:10:05 GMT",
    "Metadata": {}
} 

The problem can be replicated even with the current version by uploading a file with ofs (which does not populate the ETag field) and downloading it using AWS S3 SDK.

It seems object without ETag field was able to be downloaded using AWS CLI, but not AWS Java SDK.

After looking at the AWS SDK code it seems that AWS SDK will do a post-processing step that will validate the ETag field of the downloaded object to the object's content (See BinaryUtils#fromHex in AmazonS3Client#postProcessS3Object). If the ETag field is null, the post-processing step will skip the validation.

Currently, S3G returns a string "null" for the ETag field if the ETag field does not exist, which should cause the AWS SDK to not be able to parse the string since it md5 string is longer than the "null" string. This is most probably why there is an ArrayIndexOutOfBoundsException

The current solution is to not return the ETag field at all if the key does not contain ETag to begin with. This way the post processing step in the AWS SDK will not validate the md5 hash. These are applied for GET operations since other operations do not seem to have this post processing steps.

Future improvement: We might need to add some test coverage for AWS SDK (maybe in the integration tests) since the behavior might be different than AWS CLI.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10521

How was this patch tested?

Manual tests: Creating a S3 key with ofs and downloading it with the AWS Java SDK GetObject.

In the future after S3G support in MiniOzoneCluster HDDS-10390, we can have S3 compatibility tests using AWS SDK (on top of AWS CLI in acceptance test and unit tests).

Clean CI run: https://github.com/ivandika3/ozone/actions/runs/8277395144

@ivandika3 ivandika3 changed the title ETag field should not be returned during GetObject if the key does not contain ETag field HDDS-10521. ETag field should not be returned during GetObject if the key does not contain ETag field Mar 14, 2024
@ivandika3 ivandika3 marked this pull request as ready for review March 14, 2024 09:40
@ivandika3
Copy link
Contributor Author

@myskov @vtutrinov Could you help take a look if you have time?

@myskov
Copy link
Contributor

myskov commented Mar 14, 2024

Thanks @ivandika3 for the patch, lgtm

Copy link
Contributor

@duongkame duongkame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @ivandika3. LGTM.
As a side note, for now, S3 functionalities can be tested via robot tests, but they're all on top of AWS CLI. It's important to be able to test and debug S3 via AWS Java SDK. Feel free to take over HDDS-10390 if you're interested.

@ferhui ferhui merged commit 3d193fc into apache:master Mar 15, 2024
42 of 54 checks passed
@ferhui
Copy link
Contributor

ferhui commented Mar 15, 2024

@ivandika3 Thanks for contribution. @myskov @duongkame Thanks for review. Merged.

@ivandika3
Copy link
Contributor Author

Thank you @myskov @duongkame for the reviews and @ferhui for the merge.

@ivandika3 ivandika3 added the s3 S3 Gateway label Apr 16, 2024
ivandika3 added a commit to ivandika3/ozone that referenced this pull request Apr 18, 2024
… key does not contain ETag field (apache#6377)

(cherry picked from commit 3d193fc)
@ivandika3 ivandika3 self-assigned this Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
s3 S3 Gateway
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants