
streaming images doesn't work #94

Closed
hereandnow opened this issue Apr 5, 2013 · 21 comments

@hereandnow

So I tried to stream an image (manipulated with gm first), and did something like this:

gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').stream(function(err, stdout, stderr) {
  var data = {
    Bucket: 'my-bucket',
    Key: 'image.jpg',
    Body: stdout,
    ContentType: mime.lookup('image.jpg')
  };
  s3.client.putObject(data, function(err, res) {
    console.log('done');
  });
});

That throws the following error: Cannot determine length of [object Object]

Of course it does, because in the 'afterBuild' event it tries to get the byte length of the httpRequest body.

But if my body is a stream, it of course cannot determine the content length.

But that's only part 1 of the issue. If I remove the 'afterBuild' handler for testing purposes, streaming still doesn't work (the writeBody method is called every couple of seconds, but the request never completes).

I installed the module from GitHub (not npm) to have the latest available sources.

@ajkerr
Contributor

ajkerr commented Apr 5, 2013

+1 I ran into the exact same issue today. Based on #3 I assumed that streams were supported in the Body of a putObject, but it appears that this will only work if the stream is created from fs.createReadStream().
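
For comparison, a minimal sketch of the case that does work, reusing the s3 client from the snippet above, with a local file and placeholder bucket/key:

var fs = require('fs');

// Streams from fs.createReadStream carry a .path property, so the SDK's
// byteLength() helper below can look up the file size via lstatSync.
var body = fs.createReadStream('/tmp/image.jpg'); // placeholder local file

s3.client.putObject({
  Bucket: 'my-bucket',
  Key: 'image.jpg',
  Body: body
}, function(err, res) {
  console.log(err || 'done');
});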

The code that is throwing the exception is in the util.js file:

    byteLength: function byteLength(string) {
      if (string === null || string === undefined) return 0;
      if (typeof string === 'string') string = new Buffer(string);

      if (string.length !== undefined) {
        return string.length;
      } else if (string.path !== undefined) {
        return require('fs').lstatSync(string.path).size;
      } else {
        throw AWS.util.error(new Error(), {
          message: 'Cannot determine length of ' + string, object: string
        });
      }
    }

I think that supporting this should be fairly simple, but will require a change to the API so that ContentLength can be specified by the caller.

@lsegal
Contributor

lsegal commented Apr 5, 2013

An easier workaround for this would be to set the .length property on the stream that you created, for instance:

gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').stream(function(err, stdout, stderr) {
  // set stream length
  stdout.length = SIZE_HERE;

  var data = {
    Bucket: 'my-bucket',
    Key: 'image.jpg',
    Body: stdout,
    ContentType: mime.lookup('image.jpg')
  };
  s3.client.putObject(data, function(err, res) {
    console.log('done');
  });
});

The above should work as advertised. It also avoids messing around with extra stream-specific logic (something we unfortunately have to do for fs streams) and makes use of a fairly standard JS convention.

Though I do agree that exposing ContentLength in S3 would be useful; note that if it were added, it would be available on a per-service basis only.

@ajkerr
Contributor

ajkerr commented Apr 5, 2013

That little trick worked for me, thanks!

@hereandnow
Author

So how do I do that in my example? If I try to get the file size with the gm library and then stream inside the callback, I get this error from the request library:

You cannot pipe after data has been emitted from the response.

What I tried was this:

gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').filesize({ bufferStream: true }, function(err, filesize) {
  this.stream(function(err, stdout, stderr) {
    stdout.length = filesize;
    var data = {
      Bucket: 'my-bucket',
      Key: 'image.jpg',
      Body: stdout,
      ContentType: mime.lookup('image.jpg')
    };
    s3.client.putObject(data, function(err, res) {
      console.log('done');
    });
  });
});

@lsegal
Contributor

lsegal commented Apr 6, 2013

I'm not sure how gm() works, but you're likely going to have to ask the gm stream, not the request stream, how many bytes it will be generating. Even if request() gave you a number, it wouldn't match the size of your manipulated image. It may not even be possible to get the size via streams with that library, in which case you would probably want to write to disk first (or to an in-memory buffer) and then stream from there. I don't know for sure, though; it would be best to check the docs linked from https://npmjs.org/package/gm
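
For instance, a rough sketch of the 'write to disk first' route, reusing the gm/request/mime/s3 setup from the earlier snippets (the temp path is just a placeholder):

var fs = require('fs');

gm(request('http://www.some-domain.com/image.jpg'), 'image.jpg').stream(function(err, stdout, stderr) {
  var tmpPath = '/tmp/image.jpg'; // placeholder temp location

  // Write the manipulated image to disk first...
  stdout.pipe(fs.createWriteStream(tmpPath)).on('finish', function() {
    // ...then upload from an fs read stream, whose size the SDK can determine on its own.
    s3.client.putObject({
      Bucket: 'my-bucket',
      Key: 'image.jpg',
      Body: fs.createReadStream(tmpPath),
      ContentType: mime.lookup('image.jpg')
    }, function(err, res) {
      console.log(err || 'done');
    });
  });
});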

I'm going to close this since it's specific to a third-party library. S3 requires a Content-Length to be provided on all payload requests, so this is something that must be supported by whatever third-party library you use.

lsegal closed this as completed Apr 6, 2013
@lsegal
Contributor

lsegal commented Apr 6, 2013

FWIW I see a "filesize()" property in the gm docs that might allow you to get this value: http://aheckmann.github.io/gm/docs.html#getters

@rcmonteiro

Another workaround using just gm, aws-sdk, http, and fs:

  // w, h, bucket, and s3 are assumed to be defined elsewhere
  http.get('http://s3-sa-east-1.amazonaws.com/bucket/path/image.jpg', function(res) {
    if (res.statusCode != 200) {
      console.log("Err\n");
    } else {
      gm(res).resize(w, h, '^').gravity('Center').extent(w, h).quality(80).stream(function(err, stdout, stderr) {
        // buffer the whole resized image in memory so its length is known
        var buf = new Buffer(0);
        stdout.on('data', function(d) {
          buf = Buffer.concat([buf, d]);
        });
        stdout.on('end', function() {
          var data = {
            Bucket: bucket,
            Key: 'pathtoimage/thumb.jpg',
            Body: buf
          };
          s3.client.putObject(data, function(err, resp) {
            console.log("Done\n");
          });
        });
      });
    }
  });

@adeleinr

adeleinr commented Dec 2, 2013

This last trick, using fs instead of http.get, worked for me

@petermilan

Thanks, perfect :)

@terribleplan

@aws @lsegal You either need to accept readable streams as per the ReadableStream documentation (which does not specify a length) or document somewhere that your implementation is limited in this way.

If I have to know the size then I either need to load (potentially) very large amounts of data into memory or write it to disk, neither of which is a good option in the environment I am operating in.

@lsegal
Contributor

lsegal commented Sep 14, 2014

@terribleplan the SDK supports ReadableStream per the docs, and you do not need to specify a length with the stream. The issue is that the underlying service (S3) needs to know how many bytes are in your PUT request. Checking the .length property on the stream is just a convenience (since it is implemented in readable streams returned by the fs module); you can alternatively provide a regular vanilla stream and pass the byte length as the ContentLength parameter to the putObject operation.
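
For example, a minimal sketch of that alternative, where the stream and its byte count are placeholders you would supply yourself:

// someReadableStream and knownByteCount are placeholders: a plain readable
// stream with no .length property, and its size determined by other means.
s3.putObject({
  Bucket: 'my-bucket',
  Key: 'image.jpg',
  Body: someReadableStream,
  ContentLength: knownByteCount
}, function(err, data) {
  console.log(err || 'done');
});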

There is not much the SDK can do about this limitation in S3, as needing to know the size of the payload is a requirement of the service. If the only way you can determine the size is to load large amounts of data into memory, that is not something the SDK can help with either; again, the restriction comes from the service. I would strongly recommend buffering in chunks if you can, so as not to load all data into memory at once (potentially writing out to disk for larger files if you're doing some kind of transform on the stream).

That said, thanks for the feedback. I agree that better documentation about S3's limitation could be useful here, I will add a note to look into making this limitation more explicit. I would also recommend visiting S3's forums to put in a feature request to remove the Content-Length restriction. The ability to support streaming payloads would be a huge benefit not just for the JS SDK, but other tools as well. Let them know you believe this would be an important improvement to the service!

@terribleplan

@lsegal It looks like some of what I want can be done through the multipart API, and s3-upload-stream wraps it nicely, so I doubt any change will be made, since there are probably benefits on the S3 side to knowing the size of the upload in advance.
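
For anyone landing here later, that library's usage looks roughly like the sketch below, going from its README (bucket, key, and the source file are placeholders; check the package docs for the exact API):

var AWS = require('aws-sdk');
var fs = require('fs');

// Wrap an S3 client; the library sends the piped data to S3 as multipart
// chunks, so no total Content-Length is needed up front.
var s3Stream = require('s3-upload-stream')(new AWS.S3());

var upload = s3Stream.upload({
  Bucket: 'my-bucket',
  Key: 'big-file.bin'
});

upload.on('error', function(err) { console.error(err); });
upload.on('uploaded', function(details) { console.log('done', details); });

// Any readable stream can be piped in.
fs.createReadStream('/path/to/big-file.bin').pipe(upload);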

@nmccready

@rcmonteiro your example may work, but it defeats the purpose of streaming (you're putting everything into memory).

@kindlyseth

Google ranks this issue highly, so perhaps it's worth noting for the next person that http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property handles streaming.
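
For reference, the shape of that call is roughly the following, where the bucket, key, and stream are placeholders:

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// upload() manages a multipart upload under the hood, so unlike putObject
// it accepts a Body stream of unknown length.
s3.upload({
  Bucket: 'my-bucket',
  Key: 'image.jpg',
  Body: someReadableStream // placeholder: any readable stream
}, function(err, data) {
  console.log(err || data.Location);
});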

@nmccready

Ultimately they should copy some code from https://github.com/nathanpeck/s3-upload-stream, as it is done right.

@kindlyseth

@nmccready, according to its README, that is made obsolete by the functionality I linked.

@nmccready

Ah nice, I misread your post. So this supports streaming. Can there be an example where .pipe is being used?

@lsegal
Contributor

lsegal commented Jul 27, 2016

@nmccready you still don't need to use pipe; simply pass the stream object as the Body parameter of the upload() call, as shown in the doc examples. The stream object depends on the library you are using, but it's typically the thing you are trying to call .pipe on.

@kindlyseth

Something like this. Beware that I haven't run this so it probably contains typos.

var https = require("follow-redirects").https;
var s3 = new (require("aws-sdk")).S3();

var gitHubToken = "hex string from github personal token";

module.exports = function(callback) {
  var request = https.get(
    {
      host: "api.github.com",
      path: "/repos/aws/aws-sdk-js/tarball/4a404cb8c06bba6b7b00c323671376c6377889ed?access_token="+gitHubToken,
      headers: {
        "User-Agent": "Unique user agent string",
      },
    },
    // the response callback goes here, as the second argument to https.get
    function(response) {
      return handleTarball(null, response, callback);
    }
  );
  request.on("error", function(err) {
    return handleTarball(err, null, callback);
  });
};

var handleTarball = function(err, response, callback) {
  if (err) {
    return callback(err);
  }
  if (response.statusCode != 200) {
    return callback(new Error("unsuccessful status code: "+response.statusCode));
  }
  s3.upload({
    Bucket: "some-bucket",
    Key: "aws-sdk-js-latest.tar.gz",
    Body: response, // <--------------------- it's that simple if you use s3.upload
    ContentType: response.headers["content-type"], // shouldn't hurt
  }, callback);
};

@0xdevalias

0xdevalias commented Feb 21, 2019

it's that simple if you use s3.upload

I was definitely hoping that would be the case from how it's described in the docs, but in trying to do this, I still end up with this error:

{
  "errorMessage": "Cannot determine length of [object Object]",
  "errorType": "Error",
  "stackTrace": [
    "byteLength (/var/runtime/node_modules/aws-sdk/lib/util.js:179:26)",
    "ManagedUpload.fillBuffer (/var/runtime/node_modules/aws-sdk/lib/s3/managed_upload.js:385:19)",
    "ManagedUpload.send (/var/runtime/node_modules/aws-sdk/lib/s3/managed_upload.js:199:33)",
    "/var/runtime/node_modules/aws-sdk/lib/util.js:799:25",
    "new Promise (<anonymous>)",
    "ManagedUpload.promise (/var/runtime/node_modules/aws-sdk/lib/util.js:798:14)",
    "/var/task/index.js:55:106",
    "handler (/var/task/index.js:19:54)",
    "<anonymous>",
    "process._tickDomainCallback (internal/process/next_tick.js:228:7)"
  ]
}

The basic flow of my code is: read a list of 'filenames' from S3, fetch each as a stream, and write them out as a single stream using the S3 upload function. I'm using highland to handle some of the higher-level concepts, but it just returns a Node ReadableStream at the end, so I can't see why this would be an issue.

  const fileList = await client.listResultsFiles();
  const mergedFiles = highland(fileList)
    .map(client.streamFile)
    .sequence()
    .toNodeStream();

  return client.uploadStream('results-AAAAAA.json')(mergedFiles)

Helpers for reference

const listResultsFiles = async function(): Promise<string[]> {
    // List
    const s3Objects = await s3.listObjectsV2({
      ...commonParams,
      Prefix: cfg.resultsKeyPrefix
    }).promise();

    // ..snip.. some bits that make the filenames sortable

    // Sort
    return lodash
      .sortBy(files, ['prefix', 'start', 'end', 'suffix'])
      .map(({filename}) => filename);
  };

  const streamFile = (key: string) => {
    const fileStream = s3.getObject({...commonParams, Key: key}).createReadStream();
    return highland(fileStream)
  };

  const uploadStream = (destinationKey: string) => (streamToUpload: ReadableStream) => {
    return s3.upload({
      ...commonParams,
      Key: destinationKey,
      Body: streamToUpload,
    }).promise()
  };

Edit: I'm not sure of the 'why', but this solved (or at least worked around) the issue for me, and it now works as expected: #1713 (comment)

@lock

lock bot commented Sep 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

lock bot locked as resolved and limited conversation to collaborators Sep 28, 2019