Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit multiple data chunks from output Streams (was: Proper streaming) #179

Open
LinusU opened this issue Mar 6, 2015 · 20 comments
Open

Comments

@LinusU
Copy link
Contributor

LinusU commented Mar 6, 2015

It would be very cool if sharp could support proper streaming. As it is now, it will always emit exactly one data event which is the whole image. It kind of defeats the purposes of streams.

As I see it there is two ways to implement this.

  1. The hard, but robust way.

    Every time that libvips have data that should be written, it should call a function that we have defined. We then use this data to generate a data-event on the stream. It would be really cool if libvips could provide a state-machine that has a synchronous function that gives the next chunk. That way pause and unpause would work perfectly.

  2. The quick and easy fix.

    We can also create a named pipe and tell libvips to write it to that file via vips_image_write_to_file. To the user we would then return a ReadStream as returned by fs.createReadStream(path).

@lovell
Copy link
Owner

lovell commented Mar 6, 2015

If you've not already done so, have a look at the Stream discussion in #30.

At the time, @jcupitt created a load-from-stream branch with some initial ideas, mainly around the creation of a VipsInputStream/VipsOutputStream.

On a related note, I spotted this recent change in io.js that looks like it could make handing JavaScript Stream objects in C++ easier.

@jcupitt
Copy link
Contributor

jcupitt commented Mar 6, 2015

My final thought back then was in this comment: #30 (comment)

Thinking about it again, it might be possible to do a hybrid of buffering and streaming. You could read and store data off the input until there was enough there that you could get all the metadata you need to be able to decide how to handle the file.

Once you've picked an approach, close the image and start again. This time you feed the reader the stored data from the first pass and only start reading off the input again when that runs out. At that point (presumably) you are decoding actual pixels.

There are still some awful problems. What if you send 10 jpg images down a pipe to a resizer, but image 4 is corrupt. How will the reader recover from the error?

@LinusU
Copy link
Contributor Author

LinusU commented Mar 6, 2015

I'm sorry, I missed out the very vital part that is that I'm talking about output, not input.

I'm not that interested in being able to pipe more than one image and for that I think it would be more reasonable to use object-streams (objectMode = true) for. Anyhow, that's another discussion.

My use case is to stream the output image data to S3 (or to any other service for that matter...).

My ideal usage would be like this:

var out = sharp(bufferObject).resize(1080, 1080).min().toStream();
S3.upload({ Body: out, Key: 'my-image.jpg' }, cb);

This could be done as a said in number 2 by letting libvips write it to a named pipe and then createReadStream() on that one. That should give me data events as the data is normally written to disk, e.g. when fwrite(...) is called in C.

Actually, even more usual is probably piping directly back to the browser on http request.

server.on('request', function (req, res) {
  Sharp(myBuffer).resize(1080, 1080).min().toStream().pipe(res);
});

Why I like to have .toStream is because I think it makes it more explicit on what I want to do. In my humble opinion that is better than letting the sharp object always be a stream, which sometimes emits data.

@jcupitt
Copy link
Contributor

jcupitt commented Mar 9, 2015

Ah OK, that makes sense. I agree, this would be an nice cleanup to libvips. The load-from-stream branch does this already, I'll fix it up a bit and add it.

@joe1chen
Copy link

@jcupitt So what's the resolution on this issue? It looks like the load-from-stream branch was closed.

@lovell lovell changed the title Proper streaming Emit multiple data chunks from output Streams (was: Proper streaming) Jul 14, 2015
@jcupitt
Copy link
Contributor

jcupitt commented Jul 15, 2015

I spent a bit of time looking at the branch, but master has moved and merging it would have taken quite a lot of work.

Another alternative would be GIO. It has a thing called GFile which does more or less the same thing as VipsInputStream / VipsOutputStream, but better: a POSIX-y IO API which can have memory, files, sockets, URIs, or whatever you like as the backend:

https://developer.gnome.org/gio/stable/GFile.html

We could rework the png/jpg/webp read and write operations to use GFile. We'd then be able to open images directly over http or ftp, for example:

vips copy ftp://a.b.c/d.jpg x.jpg

No idea if that would be useful.

On the other hand ... It would mean adding yet another required dependency. And VIpsInputStream / VipsOutputStream were small and easy to understand and extend.

@gmaclennan
Copy link

@jcupitt great work on the streams branch, a shame it could never be merged. Just adding my support for any work on true streams, which would allow significantly less memory usage since right now sharp needs to load the source image into memory, and then hold the whole output buffer in memory, before writing out.

@mitra42
Copy link

mitra42 commented Dec 17, 2019

I'm trying to find a conclusion to this, did anyone succeed in getting sharp to write to a readableStream that could as suggested above be piped to a request from a browser.

@jcupitt
Copy link
Contributor

jcupitt commented Dec 17, 2019

libvips 8.9 (due rsn) has true streaming:

https://libvips.github.io/libvips/2019/11/29/True-streaming-for-libvips.html

eg.

aws s3 cp s3://mybucket/input.jpg - | \
    vips thumbnail_stream [descriptor=0] .jpg 128 | \
    aws s3 cp - s3://mybucket/output.jpg

Obviously you could send to a browser instead.

It would need some (unknown?) amount of work to integrate this in sharp. I think Lovell is planning to look at it when 8.9 finally ships.

@lovell
Copy link
Owner

lovell commented Dec 17, 2019

Yes, after there's a version of sharp that depends on the future libvips v8.9.0 we can look at the best way of handling its new Stream API.

A possible first implementation could involve accepting file descriptors as input/output, although that is likely to have to be via new API and not the standard, idiomatic Node.js Stream API (which will still be supported using the existing implementation).

A more complete solution might be to create custom libvips Stream objects and use these for passing data chunks to/from Node.js, but there are multi-threading considerations e.g. libvips will be running on a different thread to V8.

I'm planning to migrate the internals of sharp to use Node.js's newish N-API first - see #1282 - sadly I don't see anything in N-API for Node.js Stream objects, plus this migration will remove access to things like Node.js' StreamBase class.

@jcupitt
Copy link
Contributor

jcupitt commented Dec 17, 2019

Huh that sounds a little frustrating. Do let me know if there's anything I can do to help.

@mitra42
Copy link

mitra42 commented Dec 17, 2019

Thanks for the explanation - I'm not up enough on the details of the APIs to make intelligent comments on the choices made, but I think keeping a mind on the goal is important - i.e. to be able to interface with external (non-sharp) code.

In my case .... I had a Node stream with the image rather than a file, (being pulled from the net or a cache, or a P2P process) and needed to pass it back to the browser request. There are probably other requirements for other people but I'm guessing these node streams are the common theme to most of them rather than file descriptors.

Also ... can I suggest a minor change to the documentation to explain how to work with streams, it only took me a few minutes to find that sharp claimed to support streams, it took me a couple of hours - searching through git issues etc - to figure out how to plug it into streams, and I almost gave up when this thread implied (to me) that it wasn't supported (rather than what it actually says, which is that its supported but with bad memory handling).

I'd suggest an edit to https://sharp.pixelplumbing.com/en/stable/api-constructor/ maybe one of the examples, best I could figure out was the following code.

# Where s is the stream; and res is a browser Response. 
cost sharpie = sharp(); 
s.pipe(sharpie);
sharpie.extract({left,top,width,height})
    .pipe(res);

This works (subject to the memory issues) but wasn't intuitive to figure this out from the docs and I'm not at all sure its the right way to do it. The instinct was to look at "toFile", it took a while to realize that wasnt the right way.

Oh ... and thanks, once I figured this out, the library's saved me a bunch of work !

@lovell
Copy link
Owner

lovell commented Dec 18, 2019

@mitra42 Glad you got it working; is this for use with the Internet Archive?

The second example in https://sharp.pixelplumbing.com/en/stable/api-constructor/#examples which is generated from https://github.com/lovell/sharp/blob/master/lib/constructor.js#L53 demonstrates Stream-based I/O and I'm always happy to accept PRs with more examples and improvements to the JSDocs if you're able.

@mitra42
Copy link

mitra42 commented Dec 18, 2019

@lovell Oops - sorry I missed that example - not sure how.

Yes -this is for the Internet Archive - for the Offline Archive, a version of the archive running in devices as small as Raspberry Pi's in particular to support interaction between that box and a local copy of mediawiki running an extension built by David Kamholz for transcribing palm leaf books, it was using IIIF on the archive to zoom in on the full scan, and I needed to port that support to Javascript. I tried using IIIF via @jonallo 's iiif-image-server-node but the installer for that is broken (it fails to install sharp) and once I'd got sharp installed separately I realized sharp could probably be used better on its own than going through iiif-image-server-node > iiif-image

It took all of 25 lines of code ! So pretty happy with that. https://github.com/internetarchive/dweb-mirror/blob/master/mirrorHttp.js#L636

@summera
Copy link

summera commented Apr 1, 2020

Yes, after there's a version of sharp that depends on the future libvips v8.9.0 we can look at the best way of handling its new Stream API.

A possible first implementation could involve accepting file descriptors as input/output, although that is likely to have to be via new API and not the standard, idiomatic Node.js Stream API (which will still be supported using the existing implementation).

A more complete solution might be to create custom libvips Stream objects and use these for passing data chunks to/from Node.js, but there are multi-threading considerations e.g. libvips will be running on a different thread to V8.

I'm planning to migrate the internals of sharp to use Node.js's newish N-API first - see #1282 - sadly I don't see anything in N-API for Node.js Stream objects, plus this migration will remove access to things like Node.js' StreamBase class.

@lovell is this still on the roadmap given sharp is now using libvips 8.9?

@germansokolov13
Copy link

Do I understand correctly? The underlying cli command vips does supports streams and won't load entire file to start processing. But the Sharp library can't do it? So in Node.js I have to do all in Linux Shell land to achieve it? Maybe I could run vips shell command as child_process.spawn and connect the child process to Node streams via its stdin/stdout?

@CanRau
Copy link

CanRau commented Sep 19, 2023

As libvips/libvips#1443 has been merged is there any progress to make streams fully work in sharp as well? 😇

I'm currently already using sharp for profile image upload though they have a much lower max size than the images we want to allow in content so real streaming straight from an xhr upload straight from the browser and then piping to R2 would best.

@exentrich
Copy link

exentrich commented Feb 16, 2024

This improvement will greatly enhance the end user experience.

In our current process, we download the source image and pipe it to Sharp for resizing. Then, we pipe the resized image to an S3 bucket and simultaneously as a chunked http response. However, there is currently a delay because Sharp only emits one large chunk. By implementing proper stream emission, the final image will be delivered to end users more quickly.

@marcosc90
Copy link
Contributor

I'm currently working with large images, and true streaming would be ideal. I’ll start working on this unless someone else is already on it.

@marcosc90
Copy link
Contributor

marcosc90 commented Jul 29, 2024

I got streaming working, unfortunately I run into an issue that will require dropping Napi::AsyncWorker since code running in the Execute method uses libuv worker threads which cause the following code to deadlock because fs also uses libuv threads.

it('concurrency', async () => {
    const concurrency = 10; // any number equal or greater to UV_THREADPOOL_SIZE (4)
    const promises = Array(concurrency).fill().map(async() => {
      const file = fixtures.inputJpg320x240;
      const input = fs.createReadStream(file);
      const stream = sharp().rotate();

      input.pipe(stream);
      return buffer(stream);
    });
    const result = await Promise.all(promises);
    // ...
  });

Given that I rarely code in C++ and need to become more familiar with N-API, it might take me a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests