Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example putObjectWithProcess is incorrect. #2468

Closed
ibrt opened this issue Feb 20, 2019 · 4 comments · Fixed by #3377 or #3472
Closed

Example putObjectWithProcess is incorrect. #2468

ibrt opened this issue Feb 20, 2019 · 4 comments · Fixed by #3377 or #3472
Labels
bug This issue is a bug.

Comments

@ibrt
Copy link

ibrt commented Feb 20, 2019

Please fill out the sections below to help us address your issue.

Version of AWS SDK for Go?

Any

Version of Go (go version)?

go11.5

What issue did you see?

The sample code at https://github.com/aws/aws-sdk-go/tree/master/example/service/s3/putObjectWithProcess is incorrect.

Steps to reproduce

From #1868:

"the example is broken... What happens is that the uploader reads the whole thing twice, haven't dug into why but I think it has to do with determining the number of parts. The first read is very fast as the contents are discarded. The second actually seems to be piped to the upload.

The sample reports progress by dividing the number of bytes read by two, so what I'm seeing is that it jumps straight to 50%, and then proceeds to 100% at half the actual "speed". I "fixed" it by initializing read to -size, starting to report when read passes 0."

Sample fix:

type progressReader struct {
	progBar *pb.ProgressBar
	fd      *os.File
	size    int64
	read    int64
}

func (r *progressReader) Read(p []byte) (int, error) {
	return r.fd.Read(p)
}

func (r *progressReader) ReadAt(p []byte, off int64) (int, error) {
	n, err := r.fd.ReadAt(p, off)
	atomic.AddInt64(&r.read, int64(n))
	if read := atomic.LoadInt64(&r.read); read >= 0 {
		r.progBar.SetCurrent(read)
	}
	return n, errors.MaybeWrap(err)
}

func (r *progressReader) Seek(offset int64, whence int) (int64, error) {
	return r.fd.Seek(offset, whence)
}

// ...

stat, err := os.Stat(srcPath)
// ...

fd, err := os.Open(srcPath)
// ...

progBar := pb.New64(stat.Size()).Set(pb.Bytes, true)
progBar.Start()
defer progBar.Finish()

r := &progressReader{
	progBar: progBar,
	fd:      fd,
	size:    stat.Size(),
	read:    -stat.Size(),
}

_, err = s3manager.NewUploaderWithClient(awsS3).Upload(&s3manager.UploadInput{
	Bucket: &bucketName,
	Key:    &bucketKey,
	Body:   r,
})
@xinst
Copy link
Contributor

xinst commented Feb 22, 2019

I was confused before but now I think I got the answer.

when we upload a file use the API

output, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   reader,
	})

first step: it will construct a chunk, it includes a SectionReader which will call the interface ReadAt, and this will increase the number of read, after that, the chunk will be send a channel.
second step: there are some go routines (the number you have set) read from the channel, and then it just to start sending data to S3, during the request you can find a Body io.ReadSeeker the Request structure, and the HTTP request reads data by this ReadSeeker, it will call the interface Read again, that's why length of read is double of file's length.

maybe the first step very fast, and you will see the progress jumps straight to 50%(if the read initializing to 0), so I think there is no obvious difference between by initializing read to -size and by dividing the number of bytes read by two. Personally, set the read to -size is a little strange.

Here is the code in the SDK

aws/aws-sdk-go/service/s3/s3manager/upload.go

// nextReader returns a seekable reader representing the next packet of data.
// This operation increases the shared u.readerPos counter, but note that it
// does not need to be wrapped in a mutex because nextReader is only called
// from the main thread.
func (u *uploader) nextReader() (io.ReadSeeker, int, []byte, error) {
	type readerAtSeeker interface {
		io.ReaderAt
		io.ReadSeeker
	}
	switch r := u.in.Body.(type) {
	case readerAtSeeker:
		var err error

		n := u.cfg.PartSize
		if u.totalSize >= 0 {
			bytesLeft := u.totalSize - u.readerPos

			if bytesLeft <= u.cfg.PartSize {
				err = io.EOF
				n = bytesLeft
			}
		}

		reader := io.NewSectionReader(r, u.readerPos, n)
		u.readerPos += n

		return reader, int(n), nil, err

	default:
		part := u.bufferPool.Get().([]byte)
		n, err := readFillBuf(r, part)
		u.readerPos += int64(n)

		return bytes.NewReader(part[0:n]), n, part, err
	}
}

aws-sdk-go/aws/request/request.go

// A Request is the service request to be made.
type Request struct {
	Config     aws.Config
	ClientInfo metadata.ClientInfo
	Handlers   Handlers

	Retryer
	Time                   time.Time
	Operation              *Operation
	HTTPRequest            *http.Request
	HTTPResponse           *http.Response
	Body                   io.ReadSeeker
	BodyStart              int64 // offset from beginning of Body that the request body starts
	Params                 interface{}
	Error                  error
	Data                   interface{}
	RequestID              string
	RetryCount             int
	Retryable              *bool
	RetryDelay             time.Duration
	NotHoist               bool
	SignedHeaderVals       http.Header
	LastSignedAt           time.Time
	DisableFollowRedirects bool

	// A value greater than 0 instructs the request to be signed as Presigned URL
	// You should not set this field directly. Instead use Request's
	// Presign or PresignRequest methods.
	ExpireTime time.Duration

	context aws.Context

	built bool

	// Need to persist an intermediate body between the input Body and HTTP
	// request body because the HTTP Client's transport can maintain a reference
	// to the HTTP request's body after the client has returned. This value is
	// safe to use concurrently and wrap the input Body for each HTTP request.
	safeBody *offsetReader
}

@diehlaws diehlaws added the bug This issue is a bug. label Feb 25, 2019
jasdel added a commit to jasdel/aws-sdk-go that referenced this issue Mar 1, 2019
Fixes the S3 Upload manager example to correctly track the progress
uploading a file to S3. This change is able to correct the issue of
double size counting by disabling computing the hash of the file to
upload.

Fix aws#2468
@jasdel
Copy link
Contributor

jasdel commented Mar 4, 2019

Thanks for the update and feedback. The double read is triggered by the SDK's need to compute a hash of the payload to upload, so that it can be included in the request's signature, along with determining the length of the payload to send.

We're looking at optimizations for the request signature's payload hash for S3, that we might be able to implement in the future removing the need to compute the hash for a payload over TLS, but this work is still outstanding and I don't have a timeline on it sorry.

With that said I think the example can workaround this double read of the payload by changing when the progress tracker is injected. Instead of wrapping the reader to be read from, we can inject a request handler into the Uploader that will keep track of the payload chunks as they individually are sent to S3.

I put together PR #2456 that fixes this.

@jasdel jasdel closed this as completed Mar 4, 2019
@diehlaws
Copy link
Contributor

Re-opening this to track fixing getObjectWithProgress and putObjectWithProgress examples.

@diehlaws diehlaws reopened this Jun 19, 2019
jasdel pushed a commit that referenced this issue Aug 10, 2020
… with progress (#3377)

Fixes #2468 by ignoring the first read of the progress reader wrapper. Since the first read is used for signing the request, not upload progress.
aws-sdk-go-automation pushed a commit that referenced this issue Aug 10, 2020
===

### Service Client Updates
* `service/ec2`: Updates service API and documentation
  * Remove CoIP Auto-Assign feature references.
* `service/glue`: Updates service API and documentation
  * Starting today, you can further control orchestration of your ETL workloads in AWS Glue by specifying the maximum number of concurrent runs for a Glue workflow.
* `service/savingsplans`: Updates service API

### SDK Enhancements
* `aws/credentials/stscreds`: Add optional expiry duration to WebIdentityRoleProvider ([#3356](#3356))
  * Adds a new optional field to the WebIdentityRoleProvider that allows you to specify the duration the assumed role credentials will be valid for.
* `example/service/s3/putObjectWithProgress`: Fix example for file upload with progress ([#3377](#3377))
  * Fixes [#2468](#2468) by ignoring the first read of the progress reader wrapper. Since the first read is used for signing the request, not upload progress.
  * Updated the example to write progress inline instead of newlines.
* `service/dynamodb/dynamodbattribute`: Fix typo in package docs ([#3446](#3446))
  * Fixes typo in dynamodbattribute package docs.
aws-sdk-go-automation added a commit that referenced this issue Aug 10, 2020
Release v1.34.1 (2020-08-10)
===

### Service Client Updates
* `service/ec2`: Updates service API and documentation
  * Remove CoIP Auto-Assign feature references.
* `service/glue`: Updates service API and documentation
  * Starting today, you can further control orchestration of your ETL workloads in AWS Glue by specifying the maximum number of concurrent runs for a Glue workflow.
* `service/savingsplans`: Updates service API

### SDK Enhancements
* `aws/credentials/stscreds`: Add optional expiry duration to WebIdentityRoleProvider ([#3356](#3356))
  * Adds a new optional field to the WebIdentityRoleProvider that allows you to specify the duration the assumed role credentials will be valid for.
* `example/service/s3/putObjectWithProgress`: Fix example for file upload with progress ([#3377](#3377))
  * Fixes [#2468](#2468) by ignoring the first read of the progress reader wrapper. Since the first read is used for signing the request, not upload progress.
  * Updated the example to write progress inline instead of newlines.
* `service/dynamodb/dynamodbattribute`: Fix typo in package docs ([#3446](#3446))
  * Fixes typo in dynamodbattribute package docs.
@u00t
Copy link

u00t commented Aug 25, 2023

is there any good example to track file uploading progress coded in golang?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug.
Projects
None yet
5 participants