streaming images doesn't work #94
Comments
+1 I ran into the exact same issue today. Based on #3 I assumed that streams were supported in the Body of a putObject, but it appears that this will only work if the stream is created from fs.createReadStream(). The code that is throwing the exception is in the util.js file:

byteLength: function byteLength(string) {
  if (string === null || string === undefined) return 0;
  if (typeof string === 'string') string = new Buffer(string);

  if (string.length !== undefined) {
    return string.length;
  } else if (string.path !== undefined) {
    return require('fs').lstatSync(string.path).size;
  } else {
    throw AWS.util.error(new Error(), {
      message: 'Cannot determine length of ' + string, object: string
    });
  }
}

I think that supporting this should be fairly simple, but it will require a change to the API so that ContentLength can be specified by the caller. |
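For what it's worth, current versions of the SDK do expose a ContentLength parameter on S3's putObject, which maps to the Content-Length header, so the byteLength calculation above should not be needed when it is set. A rough, untested sketch, assuming the byte count can be learned some other way (here, from the source's own Content-Length response header; the bucket name and URL are placeholders):

var http = require('http');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

http.get('http://www.some-domain.com/image.jpg', function(res) {
  s3.putObject({
    Bucket: 'my-bucket',                                        // placeholder bucket
    Key: 'image.jpg',
    Body: res,                                                  // the response is a readable stream
    ContentLength: parseInt(res.headers['content-length'], 10), // known up front, so the SDK never calls byteLength
    ContentType: res.headers['content-type']
  }, function(err, data) {
    if (err) return console.error(err);
    console.log('uploaded', data);
  });
});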
An easier workaround for this would be to set the .length property on the stream that you created, for instance:

var gm = require('gm');
var request = require('request');
var mime = require('mime');
// assumes s3 is an already-configured S3 client

gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').stream(function(err, stdout, stderr) {
  var data;
  // set stream length
  stdout.length = SIZE_HERE;
  data = {
    Bucket: 'my-bucket',
    Key: 'image.jpg',
    Body: stdout,
    ContentType: mime.lookup('image.jpg')
  };
  s3.client.putObject(data, function(err, res) {
    console.log('done');
  });
});

The above should work as advertised. It also avoids messing around with extra stream-specific logic (something we unfortunately have to do for fs streams) and makes use of a fairly standard JS convention. I do agree that exposing ContentLength in S3 would be useful, though note that if it were added it would be available on a per-service basis only. |
That little trick worked for me, thanks! |
So how do I do that in my example? If I try to get the filesize with the gm library and then stream inside a callback, I get this error from the request library: "You cannot pipe after data has been emitted from the response." What I tried was this:

gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').filesize({ bufferStream: true }, function(err, filesize) {
  this.stream(function(err, stdout, stderr) {
    var data;
    stdout.length = filesize;
    data = {
      Bucket: 'my-bucket',
      Key: 'image.jpg',
      Body: stdout,
      ContentType: mime.lookup('image.jpg')
    };
    s3.client.putObject(data, function(err, res) {
      console.log('done');
    });
  });
}); |
I'm not sure how I'm going to close this since it's specific to a third party library. S3 requires a Content-Length to be provided on all payload requests, so this is something that must be supported by whatever third party library you use. |
FWIW I see a "filesize()" property in the gm docs that might allow you to get this value: http://aheckmann.github.io/gm/docs.html#getters |
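If keeping everything as a stream is not essential, another way around the "cannot pipe after data has been emitted" problem is to let gm buffer the processed image and upload the resulting Buffer, whose length the SDK can read directly. A rough, untested sketch (bucket name and URL are placeholders, and it holds the whole image in memory, the trade-off noted further down the thread; it also uses the newer s3.putObject call rather than the older s3.client.putObject form quoted above):

var gm = require('gm');
var request = require('request');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Buffer the processed image instead of streaming it; a Buffer has a .length,
// so the SDK can set Content-Length without any extra help.
gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg')
  .toBuffer('JPG', function(err, buffer) {
    if (err) return console.error(err);
    s3.putObject({
      Bucket: 'my-bucket',        // placeholder bucket
      Key: 'image.jpg',
      Body: buffer,
      ContentType: 'image/jpeg'
    }, function(err, res) {
      if (err) return console.error(err);
      console.log('done');
    });
  });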
Another workaround using just gm, aws-sdk, http, fs
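A minimal sketch of what such a workaround could look like, under the assumption that the gm output is written to a temporary file first so the SDK can determine the length from the file's path (this is an illustration, not the commenter's original code; the temp path and bucket are placeholders):

var http = require('http');
var fs = require('fs');
var gm = require('gm');
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

http.get('http://www.some-domain.com/image.jpg', function(res) {
  gm(res, 'image.jpg').stream(function(err, stdout, stderr) {
    if (err) return console.error(err);
    var tmpPath = '/tmp/image.jpg';                // hypothetical temp location
    var writeStream = fs.createWriteStream(tmpPath);
    stdout.pipe(writeStream);
    writeStream.on('finish', function() {
      s3.putObject({
        Bucket: 'my-bucket',                       // placeholder bucket
        Key: 'image.jpg',
        Body: fs.createReadStream(tmpPath),        // fs streams carry a .path, so byteLength can lstat the size
        ContentType: 'image/jpeg'
      }, function(err, data) {
        if (err) return console.error(err);
        console.log('done');
      });
    });
  });
});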
|
This last trick, using fs instead of http.get, worked for me |
thanks perfect :) |
@aws @lsegal You either need to accept readable streams as per the ReadableStream documentation (which does not specify a length) or document somewhere that your implementation is limited in this way. If I have to know the size then I either need to load (potentially) very large amounts of data into memory or write it to disk, neither of which is a good option in the environment I am operating in. |
@terribleplan the SDK supports ReadableStream per the docs, and you do not need to specify a length with the stream. The issue is that the underlying service (S3) needs to know how many bytes are in your PUT request.

There is not much the SDK can do about this limitation in S3, as needing to know the size of the payload is a requirement of the service. If the only way you can determine the size is to load large amounts of data into memory, that is not something the SDK can do much about either (again, the restriction comes from the service), though I would strongly recommend buffering in chunks if you can so as to not load all the data into memory at once, potentially writing out to disk for larger files if you're doing some kind of transform on the stream.

That said, thanks for the feedback. I agree that better documentation about S3's limitation could be useful here; I will add a note to look into making this limitation more explicit. I would also recommend visiting S3's forums to put in a feature request to remove the Content-Length restriction. The ability to support streaming payloads would be a huge benefit not just for the JS SDK, but for other tools as well. Let them know you believe this would be an important improvement to the service! |
@lsegal It looks like some of what I want can be done through the multipart API. |
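A minimal, untested sketch of how the low-level multipart calls (createMultipartUpload, uploadPart, completeMultipartUpload) can handle a stream of unknown total size: buffer roughly 5 MB at a time, upload each chunk as a part, and complete the upload at the end. The bucket and key are passed in by the caller, and error cleanup (abortMultipartUpload) is omitted for brevity. Note that the SDK's s3.upload helper, mentioned just below, wraps essentially this pattern for you.

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Upload a readable stream of unknown length using the multipart API.
// Every part except the last must be at least 5 MB.
function multipartUploadStream(stream, bucket, key, callback) {
  var PART_SIZE = 5 * 1024 * 1024;
  var buffered = [];
  var bufferedBytes = 0;
  var parts = [];          // { ETag, PartNumber } entries for completeMultipartUpload
  var partNumber = 1;
  var uploadId;

  s3.createMultipartUpload({ Bucket: bucket, Key: key }, function(err, data) {
    if (err) return callback(err);
    uploadId = data.UploadId;

    stream.on('data', function(chunk) {
      buffered.push(chunk);
      bufferedBytes += chunk.length;
      if (bufferedBytes >= PART_SIZE) {
        stream.pause();
        flushPart(function(err) {
          if (err) return callback(err);
          stream.resume();
        });
      }
    });

    stream.on('end', function() {
      // Flush whatever is left (it may be smaller than 5 MB) and finish.
      flushPart(function(err) {
        if (err) return callback(err);
        s3.completeMultipartUpload({
          Bucket: bucket,
          Key: key,
          UploadId: uploadId,
          MultipartUpload: { Parts: parts }
        }, callback);
      });
    });

    stream.on('error', callback);
  });

  function flushPart(done) {
    if (bufferedBytes === 0) return done();
    var body = Buffer.concat(buffered);   // a Buffer, so its length is known
    var thisPart = partNumber++;
    buffered = [];
    bufferedBytes = 0;
    s3.uploadPart({
      Bucket: bucket,
      Key: key,
      UploadId: uploadId,
      PartNumber: thisPart,
      Body: body
    }, function(err, data) {
      if (err) return done(err);
      parts.push({ ETag: data.ETag, PartNumber: thisPart });
      done();
    });
  }
}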
@rcmonteiro your example may work, but it defeats the purpose of streaming (you're putting everything into memory). |
Google ranks this issue highly, so perhaps it's worth noting for the next person that http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property handles streaming. |
Ultimately they should copy some code from here: https://github.com/nathanpeck/s3-upload-stream, as it is done right. |
@nmccready, according to its README, that is made obsolete by the functionality I linked. |
Ah nice, misread your post. So this supports streaming. Can there be an example where pipe is used? |
@nmccready you still don't need to use pipe, simply pass the stream object as the Body parameter. |
Something like this. Beware that I haven't run this so it probably contains typos.

var https = require("follow-redirects").https;
var s3 = new (require("aws-sdk")).S3();
var gitHubToken = "hex string from github personal token";
module.exports = function(callback) {
  var request = https.get(
    {
      host: "api.github.com",
      path: "/repos/aws/aws-sdk-js/tarball/4a404cb8c06bba6b7b00c323671376c6377889ed?access_token="+gitHubToken,
      headers: {
        "User-Agent": "Unique user agent string",
      }
    },
    // the response callback is the second argument to https.get,
    // not a property of the options object
    function(response) {
      return handleTarball(null, response, callback);
    }
  );
request.on("error", function(err) {
return handleTarball(err, null, callback);
});
};
var handleTarball = function(err, response, callback) {
if (err) {
return callback(err);
}
if (response.statusCode != 200) {
return callback(new Error("unsuccessful status code: "+response.statusCode));
}
s3.upload({
Bucket: "some-bucket",
Key: "aws-sdk-js-latest.tar.gz",
Body: response, // <--------------------- it's that simple if you use s3.upload
ContentType: response.headers["content-type"], // shouldn't hurt
}, callback);
}; |
I was definitely hoping that would be the case from how it's described in the docs, but in trying to do this, I still end up with this error:

{
  "errorMessage": "Cannot determine length of [object Object]",
  "errorType": "Error",
  "stackTrace": [
    "byteLength (/var/runtime/node_modules/aws-sdk/lib/util.js:179:26)",
    "ManagedUpload.fillBuffer (/var/runtime/node_modules/aws-sdk/lib/s3/managed_upload.js:385:19)",
    "ManagedUpload.send (/var/runtime/node_modules/aws-sdk/lib/s3/managed_upload.js:199:33)",
    "/var/runtime/node_modules/aws-sdk/lib/util.js:799:25",
    "new Promise (<anonymous>)",
    "ManagedUpload.promise (/var/runtime/node_modules/aws-sdk/lib/util.js:798:14)",
    "/var/task/index.js:55:106",
    "handler (/var/task/index.js:19:54)",
    "<anonymous>",
    "process._tickDomainCallback (internal/process/next_tick.js:228:7)"
  ]
}

Basic flow of my code is reading a list of 'filenames' from S3, then fetching each as a stream, which are then written out as a stream using the S3 upload function. I'm using highland to handle some of the higher level concepts, but it just returns a Node ReadableStream at the end, so I can't see why this would be an issue.

const fileList = await client.listResultsFiles();
const mergedFiles = highland(fileList)
  .map(client.streamFile)
  .sequence()
  .toNodeStream();
return client.uploadStream('results-AAAAAA.json')(mergedFiles)

Helpers for reference:

const listResultsFiles = async function(): Promise<string[]> {
  // List
  const s3Objects = await s3.listObjectsV2({
    ...commonParams,
    Prefix: cfg.resultsKeyPrefix
  }).promise();
  // ..snip.. some bits that make the filenames sortable
  // Sort
  return lodash
    .sortBy(files, ['prefix', 'start', 'end', 'suffix'])
    .map(({filename}) => filename);
};

const streamFile = (key: string) => {
  const fileStream = s3.getObject({...commonParams, Key: key}).createReadStream();
  return highland(fileStream);
};

const uploadStream = (destinationKey: string) => (streamToUpload: ReadableStream) => {
  return s3.upload({
    ...commonParams,
    Key: destinationKey,
    Body: streamToUpload,
  }).promise();
};

Edit: I'm not sure of the 'why', but this solved (or at least worked around) the issue for me, and it now works as expected: #1713 (comment) |
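One commonly reported workaround for "Cannot determine length" errors with stream-like objects (possibly, though not necessarily, what the linked comment describes) is to pipe the source through a plain stream.PassThrough so that s3.upload receives a standard Node readable stream. A short sketch; the helper name and parameters are hypothetical:

var stream = require('stream');

// Hypothetical helper: wrap a readable-ish source in a PassThrough so that
// s3.upload sees a standard Node stream it knows how to chunk.
function uploadViaPassThrough(s3, bucket, key, sourceStream) {
  var pass = new stream.PassThrough();
  sourceStream.pipe(pass);
  return s3.upload({
    Bucket: bucket,   // placeholder
    Key: key,
    Body: pass
  }).promise();
}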
Original issue description:

So I tried to stream an image (manipulated it with gm first) and did something like that.

That throws the following error: Cannot determine length of [object Object]

Of course it does, because in the 'afterBuild' event it tries to get the byte length of the httpRequest body, and if my body is a stream it of course cannot get the content length.

But that's only part one of the issue: if I remove the 'afterBuild' listener for testing purposes, streaming still doesn't work (the writeBody method is called every couple of seconds, but the request never finishes).

I installed the module from GitHub (not npm) to have the latest available sources.