Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 upload MD5 not automatically populated #1236

Closed
emarc-m opened this issue Oct 4, 2019 · 8 comments
Closed

S3 upload MD5 not automatically populated #1236

emarc-m opened this issue Oct 4, 2019 · 8 comments
Assignees
Labels
bug Something isn't working pending-release Code has been merged but pending release s3 Issues with the AWS Android SDK for Simple Storage Service (S3).

Comments

@emarc-m
Copy link

emarc-m commented Oct 4, 2019

State your question

I've read the documentation for the MD5 file integrity check when uploading files to S3. Please see: https://aws-amplify.github.io/aws-sdk-android/docs/reference/com/amazonaws/services/s3/model/ObjectMetadata.html#getContentMD5--

It says on the documentation that

The AWS S3 Java client will attempt to calculate this field automatically when uploading files to Amazon S3.

However, looking at the database entry for the transfer utility, it does not contain any data about the MD5. Also, checking UploadTask#createPutObjectRequest L:457, upload.md5 is null.

Does this MD5 header need to be computed and set manually? Thank you.

Which AWS Services are you utilizing?

AWS S3

Provide code snippets (if applicable)
Java docs snippet:

/**
     * <p>
     * Sets the base64 encoded 128-bit MD5 digest of the associated object
     * (content - not including headers) according to RFC 1864. This data is
     * used as a message integrity check to verify that the data received by
     * Amazon S3 is the same data that the caller sent. If set to null,then the
     * MD5 digest is removed from the metadata.
     * </p>
     * <p>
     * This field represents the base64 encoded 128-bit MD5 digest digest of an
     * object's content as calculated on the caller's side. The ETag metadata
     * field represents the hex encoded 128-bit MD5 digest as computed by Amazon
     * S3.
     * </p>
     * <p>
     * The AWS S3 Android client will attempt to calculate this field
     * automatically when uploading files to Amazon S3.
     * </p>
     *
     * @param md5Base64 The base64 encoded MD5 hash of the content for the
     *            object associated with this metadata.
     * @see ObjectMetadata#getContentMD5()
     */
    public void setContentMD5(String md5Base64) {
        if (md5Base64 == null) {
            metadata.remove(Headers.CONTENT_MD5);
        } else {
            metadata.put(Headers.CONTENT_MD5, md5Base64);
        }

    }

Environment(please complete the following information):

  • SDK Version: 2.16.+ (Gradle)

Device Information (please complete the following information):

  • Device: Android Emulator
  • Android Version: Android 8.1 Google Play
@TrekSoft TrekSoft added bug Something isn't working s3 Issues with the AWS Android SDK for Simple Storage Service (S3). labels Oct 7, 2019
@raphkim raphkim self-assigned this Oct 8, 2019
@raphkim
Copy link
Contributor

raphkim commented Oct 8, 2019

Hi @emarc-m

To answer your question, the S3 client will automatically set and attach content MD5 hash to object metadata without having to do so manually. Would you be able to provide us with the code snippet you are using to set up your transfer utility object and to upload the file?

@emarc-m
Copy link
Author

emarc-m commented Oct 8, 2019

Hi @raphkim. Thanks for following up.

This is the Transfer utility initialization code:

     try {
            val latch = CountDownLatch(1)
            AWSMobileClient.getInstance().initialize(context,
                object : Callback<UserStateDetails> {
                    override fun onResult(result: UserStateDetails) {
                        latch.countDown()
                    }

                    override fun onError(error: Exception) {
                        throw error
                    }
                }
            )
            latch.await()

            val mobileClient = AWSMobileClient.getInstance()
            val regionString = AWSConfiguration(context)
                .optJsonObject("S3TransferUtility")
                .getString("Region")
            val region = Region.getRegion(regionString)
            s3Client = AmazonS3Client(mobileClient, region)

            transferUtility = TransferUtility.builder()
                .context(context)
                .s3Client(s3Client)
                .awsConfiguration(AWSConfiguration(context))
                .build()
        } catch (error: Exception) {
            throw IllegalStateException("Unable to initialize FileSync")
            error.printStackTrace()
        }

Here's a code snippet of the usage of the upload API from TransferUtility

data class MediaRecord(
    val uploaded: Boolean = false,
    val hash: String = "",
    val size: Int = -1
)

@WorkerThread
private fun uploadMedia(mediaRecord: MediaRecord): MediaRecord {
    // Done uploading this media resource
    if (mediaRecord.uploaded) {
        return mediaRecord
    }

    val mediaFile = getFileForMedia(mediaRecord)

    if (!mediaFile.exists() || !mediaFile.isFile) {
        return mediaRecord
    }

    val latch = CountDownLatch(1)
    var isUploaded = false
    transferUtility.upload(mediaRecord.hash, mediaFile)
            .setTransferListener(object : TransferListener {
                override fun onProgressChanged(id: Int, bytesCurrent: Long, bytesTotal: Long) {
                    Timber.d("Upload update for ${mediaRecord.hash} with ID: $id, current: $bytesCurrent, total: $bytesTotal")
                }

                override fun onStateChanged(id: Int, state: TransferState?) {
                    state ?: return
                    Timber.d("Upload state change for ${mediaRecord.hash} with ID: $id, state: $state")
                    when (state) {
                        TransferState.COMPLETED -> {
                            isUploaded = true
                            latch.countDown()
                        }
                        else -> {
                            // do nothing
                        }
                    }
                }

                override fun onError(id: Int, error: Exception?) {
                    Timber.d(error, "Error uploading ${mediaRecord.hash} with ID: $id")
                    isUploaded = false
                    latch.countDown()
                }
            })
    latch.await()

    return mediaRecord.copy(uploaded = isUploaded)
}

@raphkim
Copy link
Contributor

raphkim commented Oct 8, 2019

May I ask your use-case for obtaining MD5? The MD5 hash is attached in the http request header and the integrity of file upload is checked automatically in the backend.

@emarc-m
Copy link
Author

emarc-m commented Oct 8, 2019

I'm trying to confirm that the SDK is doing what was stated in the Javadocs to ensure that the MD5 consistency check is actually happening. I wanted to be assured that the file will not be corrupted during transport.

I tried checking the existence of the computed MD5 value in the following places:

  • awss3transfertable.db, content_md5 entry is null
  • UploadTask#createPutObjectRequest L:457 upload#md5 is null as well

Is there something I miss with to check?

@raphkim
Copy link
Contributor

raphkim commented Oct 8, 2019

Thank you for reporting this issue. I identified the root cause to be that the high level client TransferUtility does not inherit metadata from the low level client AmazonS3Client. MD5 consistency check should still be performed by the low level client during putObject to confirm that the file was not corrupted during transport. This can be checked with a breakpoint in AmazonS3Client#putObject L:1978 to make sure that it enters the branch without throwing an error.

@emarc-m
Copy link
Author

emarc-m commented Oct 8, 2019

I understand. I can confirm that AmazonS3Client#putObject L:1978 does indeed do the MD5 verification check.

Will this issue be address in a future version of the SDK?

Thank you for looking into this.

@raphkim
Copy link
Contributor

raphkim commented Oct 8, 2019

@emarc-m
To further clarify, content-md5 field in TransferRecord database is used to store user's custom MD5 hash so that it can be later used in content validation process if user desires. It is not meant to store the content MD5 hash that was generated by S3 client (which is generated only if the user did not provide a custom MD5 hash). Therefore, you will not be able to access the client-generated content MD5 from TransferUtility layer.

Meanwhile, low level client's putObject request and server response's metadata should both have content MD5 field populated unless MD5 check was disabled. However, I realized that it fails to propagate content MD5 to server response's metadata, and instead only has ETag field populated, which is the server computed hash used for validation (basically the same data, except stored in different encoding format). This will be corrected in our future release to also include content MD5 in PutObjectResult object.

I hope this helps.

@emarc-m
Copy link
Author

emarc-m commented Oct 8, 2019

Thank you for clarifying.

Please feel free to close this once #1240 is merged.

@mutablealligator mutablealligator added the pending-release Code has been merged but pending release label Oct 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pending-release Code has been merged but pending release s3 Issues with the AWS Android SDK for Simple Storage Service (S3).
Projects
None yet
Development

No branches or pull requests

5 participants