Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[azblob] file upload validation #20092

Closed
vijayrajah opened this issue Feb 23, 2023 · 9 comments
Closed

[azblob] file upload validation #20092

vijayrajah opened this issue Feb 23, 2023 · 9 comments
Assignees
Labels
AzBlob bug This issue requires a change to an existing behavior in the product in order to be resolved. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files)
Milestone

Comments

@vijayrajah
Copy link

Bug Report

I'm trying to upload a file to azure storage (v2) using
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob

it is not clear (at least to me ) how to do upload validation using MD5.

I have added this in code

opts := azblob.UploadFileOptions{
		TransactionalContentMD5: md5sum,
		AccessTier:              to.Ptr(blob.AccessTierCool),
	}

I'm computing MD5 using

h := md5.New()
	if _, err := io.Copy(h, fil); err != nil {
		return errors.Wrap(err, "unable to compute MD5 for the file "+f)
	}
md5sum := h.Sum(nil)

The Md5 from service and my locally computed MD5 do not match, but the upload still proceeds, when it should fail.

Am I missing something? is there a code sample that I can refer to?

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 23, 2023
@ghost
Copy link

ghost commented Feb 23, 2023

Hi @vijayrajah. Thank you for your feedback and we will look into it soon. Meanwhile, feel free to share your experience using the Azure SDK in this survey.

@jhendrixMSFT jhendrixMSFT added Storage Storage Service (Queues, Blobs, Files) and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Feb 23, 2023
@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Feb 23, 2023
@vibhansa-msft vibhansa-msft added the bug This issue requires a change to an existing behavior in the product in order to be resolved. label Feb 27, 2023
@siminsavani-msft
Copy link
Contributor

Hi @vijayrajah ! If it is possible, could you please provide your code from start to finish?

@vijayrajah
Copy link
Author

Here is the function that does the upload

import (
	"context"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
	"github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
	"github.com/Azure/azure-sdk-for-go/sdk/azcore/to"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob"
	"github.com/pkg/errors"
	"io"
	"os"
	"strings"
	"time"
)

func uploadFileToStorage(ctx context.Context, f string) error {

	fil, err := os.Open(f)
	if err != nil {
		log.Error().Err(err).Msgf("unable to open file %s", f)
		return err
	}

	fn := strings.Split(f, "/")
	blobname := fn[len(fn)-1]

	// calculate md5 sum
	h := md5.New()
	if _, err := io.Copy(h, fil); err != nil {
		return errors.Wrap(err, "unable to compute MD5 for the file "+f)
	}
	md5sum := h.Sum(nil)

	opts := azblob.UploadFileOptions{
		TransactionalContentMD5: md5sum,
		AccessTier:              to.Ptr(blob.AccessTierCool),
	}

	resp, err := client.UploadFile(ctx, contName, blobname, fil, &opts)
	if err != nil {
		log.Error().Err(err).Msgf("Error while uploading file %s to azure storage", f)
		return err
	}

	md5sumStr := hex.EncodeToString(h.Sum(nil))
	respMd5 := hex.EncodeToString(resp.ContentMD5)

	log.Info().Msgf("Uploaded file %s with server md5 sum %s & local md5 %s", f, respMd5, md5sumStr)

	return nil
}

There are other funcs in the code. This is the one that does the upload

client is an package level variable of type *azblob.Client. We use Azure SP for auth & this auth process sets this client

log is zerolog.Logger

Let me know if you need any additional info.

@siminsavani-msft
Copy link
Contributor

Thanks for sharing this! What is the size of your file that is being uploaded?

@vijayrajah
Copy link
Author

vijayrajah commented Mar 2, 2023

@siminsavani-msft -- it varies from few MB's to few 10's of GB (depending on the app ). At this time I'm doing a POC for upload validation. It is important for us to validate the upload

@siminsavani-msft
Copy link
Contributor

I believe I was able to catch the bug. I will update this thread with a PR soon. Thanks!

@siminsavani-msft
Copy link
Contributor

siminsavani-msft commented May 16, 2023

Hi @vijayrajah ! Sorry for the late update. As I was working on the checksums in azblob package, I realized that UploadFile is a multipart upload and the way that UploadFile works is by uploading files in blocks to a block blob and checksums cannot be validated at a block level.

We are planning on deprecating the options TransactionalContentCRC64 and TransactionalContentMD5 in UploadFileOptions.

Thank you for reporting this issue!

@vijayrajah
Copy link
Author

@siminsavani-msft -- how do we validate the file upload completed successfully? How do we validate the file checksum?

@siminsavani-msft
Copy link
Contributor

@vijayrajah , I would recommend replacing UploadFile with Upload and UploadFileOptions with UploadOptions as seen in your original code.

Just as an FYI, this will likely not work after our next release as I am still working on cleaning up the Transactional checksums and we will be deprecating TransactionalContentMD5 in UploadOptions and replacing it with a different name and set up as seen in AppendBlob: https://github.com/Azure/azure-sdk-for-go/blob/36055639a4d318aa8aa0e4d9995a2eee60ced75e/sdk/storage/azblob/appendblob/models.go#LL75C1-L80C36

It won't be too much of a breaking change, just cleaner way to use MD5 and CRC64 checksums. I will update this thread once that happens!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
AzBlob bug This issue requires a change to an existing behavior in the product in order to be resolved. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants