Uploading small files of unknown size uses too much memory #848
Comments
Please use master and override the part size by providing the right size, or use the min part size override. For now, if you wish to stay on 3.0.3, fork the code and change it locally.
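For illustration, here is a minimal sketch of the "provide the right size" route, assuming the handler actually knows the body length (for example from a plain PUT request's Content-Length). The endpoint, credentials, bucket, and object names are placeholders, and the PutObject signature is the same one used in the example later in this thread. When a non-negative size is passed, minio-go can size its parts from the real length instead of the 5 TiB worst case.

package main

import (
    "log"
    "net/http"

    minio "github.com/minio/minio-go"
)

// uploadHandler streams the request body straight to S3.
// Bucket and object names are placeholders for this sketch.
func uploadHandler(s3Client *minio.Client) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // r.ContentLength is -1 when unknown; when it is >= 0, minio-go can
        // pick a part size for that length instead of the 5 TiB worst case.
        _, err := s3Client.PutObject("mybucket", "myobject", r.Body, r.ContentLength,
            minio.PutObjectOptions{ContentType: r.Header.Get("Content-Type")})
        if err != nil {
            log.Println(err)
            http.Error(w, "upload failed", http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusOK)
    }
}

func main() {
    s3Client, err := minio.New("play.minio.io:9000", "ACCESS-KEY", "SECRET-KEY", true)
    if err != nil {
        log.Fatalln(err)
    }
    http.Handle("/upload", uploadHandler(s3Client))
    log.Fatal(http.ListenAndServe(":8080", nil))
}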
This is done to ensure that the caller can control the amount of memory being used when uploading a stream without knowing its length in advance. Fixes minio#848
As stated above, the file size is unknown. I can't find anything in the documentation about a min part size override. What do you mean?
File size cannot be unknown in S3. We just allow it so that we can buffer and upload as a convenience. You shouldn't build an application around this behavior when you don't have enough RAM. In any case, there is a PR submitted to allow this to be configurable, so that the overall buffer limit can be set manually. NOTE: this will also change the total size of the object you can upload.
Thanks for the PR, that looks perfect.
We are not going to fix this in the way mentioned in PR #849; I closed it instead. Here is a way to achieve what you want more idiomatically:
package main
import (
    "bytes"
    "fmt"
    "io"
    "log"
    "os"

    "github.com/cheggaaa/pb"
    minio "github.com/minio/minio-go"
)

func main() {
    s3Client, err := minio.New("play.minio.io:9000", "Q3AM3UQ867SPQQA43P2F", "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG", true)
    if err != nil {
        log.Fatalln(err)
    }
    s3Client.TraceOn(os.Stdout)

    var (
        totalParts = 10000
        partSize   = int64(10 * 1024 * 1024) // one 10 MiB buffer bounds the memory used
        i          = 0
    )
    srcs := make([]minio.SourceInfo, totalParts)

    progress := pb.New64(-1)
    progress.Start()

    buf := make([]byte, partSize)
    for i < totalParts {
        length, rErr := io.ReadFull(os.Stdin, buf)
        // Stop once stdin is exhausted and at least one part has been uploaded.
        if rErr == io.EOF && i > 0 {
            break
        }
        if rErr != nil && rErr != io.ErrUnexpectedEOF {
            log.Fatalln(rErr)
        }
        if _, err = s3Client.PutObject("testbucket", fmt.Sprintf("parts/%d", i), bytes.NewReader(buf[:length]), int64(length),
            minio.PutObjectOptions{
                Progress: progress,
            }); err != nil {
            log.Fatalln(err)
        }
        // Source objects to concatenate.
        srcs[i] = minio.NewSourceInfo("testbucket", fmt.Sprintf("parts/%d", i), nil)
        i++
    }

    // Create destination info.
    dst, err := minio.NewDestinationInfo("testbucket", "final-object", nil, nil)
    if err != nil {
        log.Fatalln(err)
    }
    // Concatenate the uploaded parts server side into the final object.
    if err = s3Client.ComposeObject(dst, srcs[:i]); err != nil {
        log.Fatalln(err)
    }

    // Send the names of the temporary part objects to be removed to objectsCh.
    objectsCh := make(chan string)
    go func() {
        defer close(objectsCh)
        for j := 0; j < i; j++ {
            objectsCh <- fmt.Sprintf("parts/%d", j)
        }
    }()
    // Call the RemoveObjects API to delete the temporary part objects.
    for e := range s3Client.RemoveObjects("testbucket", objectsCh) {
        log.Fatalln("Failed to remove " + e.ObjectName + ", error: " + e.Err.Error())
    }
}
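With this approach the only sizeable allocation is the single 10 MiB buf: each part is uploaded from that buffer and ComposeObject then stitches the parts together with server-side copies, so nothing larger is ever held in RAM. Note that partSize times the 10000-part cap also bounds the total object size, at roughly 100 GB here; raise partSize if you need a higher ceiling.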
Your suggestion is for me to reimplement your multipart upload logic? Copying code out of a library just to change a buffer size is messy. The ability to bound minio-go's memory usage is a critical feature, IMHO. I suggest making the constants configurable via the Client object. These days there are many different S3-compatible services (I'm using Digital Ocean's Spaces), and they likely have different size limits anyway.
No, not at all; that is not what I am suggesting. If you look, I am not using low-level APIs. What I am suggesting is that you redesign your application.
Also notice that ComposeObject does a server-side copy, so it is in fact quite beneficial and more efficient for your requirement. A size of -1 for PutObject, coupled with not even having 576 MiB of RAM, is an unusual requirement. Changing the client API for this would create more confusion for our users over an uncommon use case, so we are not going to do that. The approach above addresses your problem by letting you manage the buffer yourself, since you know the right buffer size for your environment, and it also gives you the flexibility of setting an upper bound on your total data transfer.
Fair enough, and thanks for the example code. But I ran into a problem with it:
It looks like Digital Ocean's S3 implementation doesn't support object composition from multiple objects the way you use it in ComposeObject. I need to implement this as an actual multipart upload (where you get an upload ID from S3). This means I'd have to make a custom version of the library's multipart upload code. Do you have a suggestion for how to proceed?
CopyObjectPart is a properly locked operation, so I don't see any storage implementation (Minio or AWS S3) making it racy to replace a source object while it is being read and copied on the other end, @rwestlund. This API is safe to use even in concurrent situations. Since Digital Ocean doesn't implement it yet, you have two options:
The problem I see in your example is with uploading each chunk as a normal object called "parts/1" and "parts/2", etc. If I started a second upload, it would overwrite "parts/1" while the first upload was still uploading "parts/2". When the first upload finally starts the server-side copy, it could get parts of the first file and parts of the second file mixed together. I could generate a UUID for each upload to avoid object name collisions, but this is still a hacky way to upload a large file. S3's multipart API is there for a reason. I'll check whether the AWS SDK or any of the smaller S3 libraries allow multipart uploads with bounded memory, then fork minio-go as a last resort.
That is just an example; you can indeed use a UUID. It is not a hacky way: it is pretty neat functionality used by some backup software solutions. You can also look at the minio-go Core API to get direct access to the multipart API if needed. Feel free to close this issue, and please reopen it if you have further questions.
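For anyone landing here later, the sketch below shows roughly what a bounded-memory multipart upload through the Core type can look like. The Core method signatures changed between minio-go releases, so the exact parameter lists here (taken from a v6-era API, with the checksum and encryption arguments left empty) are an assumption to verify against the godoc for your version; the endpoint, credentials, and bucket/object names are the same placeholders used earlier in the thread.

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "os"

    minio "github.com/minio/minio-go"
)

func main() {
    // Core exposes the raw multipart calls (NewMultipartUpload, PutObjectPart,
    // CompleteMultipartUpload) that sit underneath the high-level Client.
    core, err := minio.NewCore("play.minio.io:9000", "Q3AM3UQ867SPQQA43P2F", "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG", true)
    if err != nil {
        log.Fatalln(err)
    }

    const partSize = 10 * 1024 * 1024 // memory is bounded by this single 10 MiB buffer
    bucket, object := "testbucket", "final-object"

    uploadID, err := core.NewMultipartUpload(bucket, object, minio.PutObjectOptions{})
    if err != nil {
        log.Fatalln(err)
    }

    var parts []minio.CompletePart
    buf := make([]byte, partSize)
    for partNumber := 1; ; partNumber++ {
        n, rErr := io.ReadFull(os.Stdin, buf)
        if rErr == io.EOF && partNumber > 1 {
            break // stdin exhausted after at least one part
        }
        if rErr != nil && rErr != io.ErrUnexpectedEOF {
            log.Fatalln(rErr)
        }
        // Checksum and encryption arguments are left empty/nil in this sketch;
        // the exact signature differs between minio-go versions.
        part, pErr := core.PutObjectPart(bucket, object, uploadID, partNumber,
            bytes.NewReader(buf[:n]), int64(n), "", "", nil)
        if pErr != nil {
            log.Fatalln(pErr)
        }
        parts = append(parts, minio.CompletePart{PartNumber: part.PartNumber, ETag: part.ETag})
    }

    // A v6-era release returns the final object's ETag here; older releases differ.
    if _, err := core.CompleteMultipartUpload(bucket, object, uploadID, parts); err != nil {
        log.Fatalln(err)
    }
    fmt.Printf("uploaded %s in %d part(s)\n", object, len(parts))
}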
Ah, the Core API is what I needed. Now I have multipart uploads with bounded memory :) Thanks for your help, I really appreciate it.
I think we don't have to allocate the full part buffer up front. We can start with some smaller amount, like 512 bytes, and grow it as we go.
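As a rough sketch of that idea, done outside the library: buffering a part in a bytes.Buffer grows the allocation gradually with the data actually read, so a small stream never pays for the worst-case part size. The partSize value below is only a stand-in for whatever limit the library would compute.

package main

import (
    "bytes"
    "fmt"
    "io"
    "log"
    "os"
)

// readPart buffers at most partSize bytes from r. The bytes.Buffer starts
// tiny and grows only as data actually arrives, instead of allocating the
// worst-case part size up front.
func readPart(r io.Reader, partSize int64) (*bytes.Buffer, error) {
    var buf bytes.Buffer
    if _, err := io.CopyN(&buf, r, partSize); err != nil && err != io.EOF {
        return nil, err
    }
    return &buf, nil
}

func main() {
    part, err := readPart(os.Stdin, 10*1024*1024)
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Printf("buffered %d bytes without a worst-case allocation\n", part.Len())
}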
Version
minio-go 3.0.3
FreeBSD 11.1/amd64
Go 1.9
Background
My app accepts multipart file uploads over HTTP (thus the file size can not be known) and uses this library to stream them directly into an S3-compatible store. I run it on a small VPS with 512 MiB RAM and a tiny hard drive. The purpose of streaming them directly into S3 is to prevent buffering large files in RAM or on disk.
Expected Behavior
I should be able to stream small files of unknown size into S3 without using lots of memory.
Actual Behavior
When minio-go starts uploading a file of unknown size, optimalPartInfo(-1) assumes the worst case of needing to fit a 5 TiB file within 10000 chunks and allocates a 576 MiB buffer.
The Problem
Allocating this massive buffer causes my server to grind to a halt for 60 seconds while the page daemon swaps to make room for this large allocation that can't fit in RAM. This is silly when I know I'll never handle a file larger than 10 GiB.
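For reference, that 576 MiB figure is consistent with the 3.x-era constants, assuming the worst-case part size is the 5 TiB limit spread over 10000 parts and rounded up to a multiple of the 64 MiB minimum part size. The snippet below only reproduces that arithmetic; it is not the library's actual code.

package main

import (
    "fmt"
    "math"
)

func main() {
    const (
        maxObjectSize = int64(5) * 1024 * 1024 * 1024 * 1024 // 5 TiB
        maxParts      = 10000
        minPartSize   = int64(64) * 1024 * 1024 // 64 MiB (assumed 3.x-era constant)
    )
    // Spread the maximum object size over the maximum part count, then round
    // up to a multiple of the minimum part size.
    perPart := float64(maxObjectSize) / float64(maxParts)
    partSize := int64(math.Ceil(perPart/float64(minPartSize))) * minPartSize
    fmt.Printf("worst-case part buffer: %d MiB\n", partSize/(1024*1024)) // prints 576
}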
Temporary Workaround
Fork and modify constants.go.
Solutions
The PutObjectOptions struct you're adding could have a field for the maximum buffer size or the maximum expected file size.