
reduce memory copying by API change/addition #240

Closed
slajar opened this issue Jul 22, 2014 · 9 comments

Comments

@slajar

slajar commented Jul 22, 2014

I really like the webcam library; it is probably the most extensive and versatile lib for Java. Nevertheless, it has one major flaw (unfortunately a common mistake in Java APIs): it is designed with too many copy operations. Let me clarify a bit. You retrieve a BufferedImage via a getSomeImage() method, which is great for using directly in Swing applications. Unfortunately this copies the image once. After that you have to draw it to a graphics object, which copies it a second time.

In my case it's much worse. Over here we are in an OpenGL environment. That means we retrieve the image, transform it into a ByteBuffer, and copy it to the graphics card. That makes at least three copies. As far as I understand the code, there is one more copy involved below that. The bigger the image is, the more CPU is involved.

Therefore I ask for a function that lets me specify a target ByteBuffer. With this change I would probably be able to remove at least two copy operations. I am sure there are more applications that need the same performance.

@sarxos
Owner

sarxos commented Jul 23, 2014

@slajar,

The functionality you are requesting is already there. Some WebcamDevice implementations give access to the underlying memory through direct memory buffers (see this method). One of those implementations is WebcamDefaultDevice, which is used by default when you do not specify another capture driver, and which implements BufferAccess (a special interface which tells the API that the device can return a direct memory buffer).

So the solution for you would be, for example, to use it like this:

public static void main(String[] args) {
  Webcam w = Webcam.getDefault();
  w.open();
  do {
    ByteBuffer bb = w.getImageBytes();
    // process byte buffer
  } while (something);
}

In case you need more information about how the memory buffer is converted to a BufferedImage, you can see how it's done in WebcamDefaultDevice. Ah, and please note that not all of the devices from the capture drivers implement BufferAccess, so if you switch from the default one to e.g. VLCj, the situation can be even worse, because VLCj first stores the image on disk, then reads it into a BufferedImage, and only then converts it to a ByteBuffer.

I hope this answers your question.

@slajar
Author

slajar commented Jul 23, 2014

Well, not really. What you can do with this method is retrieve a ByteBuffer object from the capture device. Please correct me if I am wrong, but this ByteBuffer object seems to be allocated every time a new image is received. That means additional overhead. What I would like to have is not a getter but a filler method, something like:

public static void main(String[] args) {
  Webcam w = Webcam.getDefault();
  w.open();
  ByteBuffer b = BufferUtils.allocate(w.getMaximumBufferSize());
  do {
    w.retrieveImageBytes(b);
    // process byte buffer
  } while (something);
}

@sarxos
Owner

sarxos commented Jul 23, 2014

Yes, this is true. A call to getImageBytes() gives you a reference to a newly created direct ByteBuffer pointing to pre-allocated memory which resides outside of the normal garbage-collected heap.
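For readers less familiar with the distinction: a direct ByteBuffer is backed by native memory outside the garbage-collected heap, while a regular one wraps a plain byte[] on the heap. A minimal standalone sketch (names are illustrative only):

```java
import java.nio.ByteBuffer;

public class BufferKinds {

    public static void main(String[] args) {
        // heap buffer: wraps a byte[] inside the garbage-collected heap
        ByteBuffer heap = ByteBuffer.allocate(1024);

        // direct buffer: backed by native memory outside the heap,
        // the same kind of object JNIEnv->NewDirectByteBuffer produces
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());    // false
        System.out.println(direct.isDirect());  // true
        System.out.println(heap.hasArray());    // true: a byte[] sits behind it
        System.out.println(direct.hasArray());  // false: no byte[] behind it
    }
}
```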

The new direct ByteBuffer reference is created within the BridJ JNI layer:

jobject JNICALL Java_org_bridj_JNI_newDirectByteBuffer(JNIEnv *env, jobject jthis, jlong peer, jlong length) {
  jobject ret;
  ret = (*env)->NewDirectByteBuffer(env, JLONG_TO_PTR(peer), length);
  return ret;
}

It points to memory allocated inside the OpenIMAJGrabber native code:

void process_image(VideoGrabber* grabber, void* buffer, size_t length) {
  if (grabber->rgb_buffer.length != length) {
    if (grabber->rgb_buffer.start != NULL) {
      free(grabber->rgb_buffer.start);
    }
    grabber->rgb_buffer.start = malloc(length);
    grabber->rgb_buffer.length = length;
  }
  memcpy(grabber->rgb_buffer.start, buffer, length);
} 

There is a way to copy the memory into a target ByteBuffer (at least theoretically, I haven't tested it), so this would be possible:

public static void main(String[] args) {
  Webcam w = Webcam.getDefault();
  w.open();
  ByteBuffer bb = BufferUtils.allocate(w.getImageBufferSize());
  do {
    w.getImageBytes(bb);
    // process byte buffer
  } while (something);
}

But please note that this operation will copy the data from one buffer to the other, and from what I understood, you do not want any more copies. The existing functionality does not copy the allocated buffer into a ByteBuffer; instead it creates a new direct ByteBuffer object which points to the pre-allocated memory. This is done by JNIEnv->NewDirectByteBuffer(..) (doc):

Allocates and returns a direct java.nio.ByteBuffer referring to the block of memory starting at the memory address address and extending capacity bytes.

The above neither allocates memory for the image nor copies it from one place to the other, so the big question here is what is better:

  1. To allocate a small direct ByteBuffer reference which points to a pre-allocated memory fragment, or
  2. To copy the memory into a pre-created ByteBuffer (~1 MB for VGA, but the higher the resolution, the more bytes need to be copied).

I do not have the tools and knowledge to test which one is faster, but I guess it would be fairly simple to add a new method which does exactly what you want (point 2).
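Which of the two is cheaper could be probed with a rough micro-benchmark. In the sketch below, duplicate() stands in for the "small reference" of option 1 and a bulk put() for the frame copy of option 2; this is an illustration only, and real numbers depend heavily on JIT warm-up and the actual capture path:

```java
import java.nio.ByteBuffer;

public class WrapVsCopy {

    public static void main(String[] args) {
        int length = 640 * 480 * 3; // one VGA RGB frame, ~0.9 MB
        ByteBuffer src = ByteBuffer.allocateDirect(length);
        ByteBuffer dst = ByteBuffer.allocateDirect(length);
        int rounds = 500;

        // option 1 analogue: create a new view object per frame, no data moved
        long t0 = System.nanoTime();
        ByteBuffer view = null;
        for (int i = 0; i < rounds; i++) {
            view = src.duplicate();
        }
        long wrapNs = System.nanoTime() - t0;

        // option 2: bulk-copy the whole frame every time
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            src.rewind();
            dst.clear();
            dst.put(src);
        }
        long copyNs = System.nanoTime() - t1;

        System.out.printf("wrap: %d ns/frame, copy: %d ns/frame (view limit %d)%n",
                wrapNs / rounds, copyNs / rounds, view.limit());
    }
}
```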

@sarxos sarxos reopened this Jul 23, 2014
@slajar
Author

slajar commented Jul 24, 2014

Hey sarxos,

thanks for your fast answer.

Best practice in image processing is always:

  1. do not copy if possible,
  2. if a copy is needed, merge it with another task where you have to touch the pixels anyway,
  3. always try to reuse image buffers.

These three axioms lead to the best performance that can be achieved.
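The three axioms above boil down to hoisting the allocation out of the capture loop and fusing the unavoidable copy with the processing pass. A minimal sketch, using a hypothetical FrameSource interface as a stand-in for the grabber (not part of the real API):

```java
import java.nio.ByteBuffer;

public class ReuseSketch {

    // hypothetical frame source, for illustration only
    interface FrameSource {
        void readInto(ByteBuffer target);
    }

    public static void main(String[] args) {
        final int frameSize = 640 * 480 * 3; // VGA RGB frame

        // axiom 3: allocate once, before the loop, and reuse
        ByteBuffer frame = ByteBuffer.allocateDirect(frameSize);

        FrameSource source = target -> {
            // stand-in for the real grabber filling the buffer in place
            target.clear();
            while (target.hasRemaining()) {
                target.put((byte) 0);
            }
            target.flip();
        };

        for (int i = 0; i < 3; i++) {
            source.readInto(frame); // axiom 1: no per-frame allocation or extra copy
            // axiom 2: fuse any pixel transform with the single pass that
            // hands the frame onward (e.g. the upload to the graphics card)
        }

        System.out.println("processed 3 frames with one buffer");
    }
}
```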

If the grabber devices actually deliver a pointer (a ByteBuffer or something like that), we as application developers can use this pointer directly for displaying or processing. That will only work correctly if there is some kind of synchronization feature in the underlying grabber API. Otherwise you will mostly get torn images while drawing.

As far as I can see in the OpenIMAJGrabber native code, this is not synchronized, so we should release the pointer as fast as we can; or is that handled by the process callback? Either way, we will have to copy it. In this case it would be best to copy the data directly into a pre-defined ByteBuffer supplied by the API user.

Your big question: I hope I understood you correctly.

  1. have a ByteBuffer that is exactly the size of the video image
  2. have a ByteBuffer sized to some maximum

Well, both have the same memcpy performance, since the copy is limited to the actual image size at the memcpy call. That means VGA will only copy about 1 MB, where HD will copy about 7.6 MB.

Drawbacks:

  1. The only drawback of version one is that you will have to reallocate the ByteBuffer every time the user changes the image format.
  2. Version two has a much larger memory footprint and wastes memory, because no one will ever use the entire buffer.

I think this decision must be left to the user of the API, since it has either a memory or a performance impact, depending on the application.
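The per-frame copy sizes mentioned above are easy to check: at 3 bytes per RGB pixel a frame is width * height * 3 bytes (the exact figure depends on the pixel format, e.g. RGBA needs 4 bytes per pixel). A quick sketch:

```java
public class FrameSizes {

    public static void main(String[] args) {
        print("VGA (640x480)", 640, 480);
        print("HD 720p (1280x720)", 1280, 720);
        print("Full HD (1920x1080)", 1920, 1080);
    }

    static void print(String name, int w, int h) {
        int bytes = w * h * 3; // 3 bytes per RGB pixel
        System.out.printf("%s: %d bytes (%.1f MiB)%n",
                name, bytes, bytes / (1024.0 * 1024.0));
    }
}
```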

@sarxos
Owner

sarxos commented Jul 24, 2014

@slajar,

I think there is a gap in communication :) I'm not a native English speaker, so I may have described these two points incorrectly.

Let me describe again.

  1. To allocate a small direct ByteBuffer reference which points to a pre-allocated memory fragment, or
  2. To copy the memory into a pre-created ByteBuffer (~1 MB for VGA, but the higher the resolution, the more bytes need to be copied).

The OpenIMAJGrabber allocates its own memory buffer (let's call it A, size = width * height * 3) and always copies data from the device's internal working memory into A. Unfortunately this cannot be changed in a simple way, at least not in the Webcam Capture API project (OpenIMAJGrabber is part of OpenIMAJ). I'm not even sure this wouldn't break anything. So the copy from the device internals to A is something we have to live with.

In case 1, there is no second copy. The Java ByteBuffer reference is created in-place over the pre-allocated memory A. Nothing is copied; the only memory allocated here is a small portion of the Java heap used to hold the ByteBuffer reference (maybe around 32 bytes: 8 for the address, 8 for the length, and 16 for the object header).

In case 2, you create a ByteBuffer and allocate new memory (~1 MB in the heap, or direct memory outside the heap, plus ~32 bytes for the Java reference); let's call this buffer B. In this case you will always copy from A to B.

So to summarise:

  1. Always allocate a small amount of heap (~32 bytes) to hold the ByteBuffer reference. Only copy from the device internals to A.
  2. Create a new ByteBuffer and allocate its memory once. Besides the copy from the device internals to A, always copy from A to B.

If the grabber devices actually deliver a pointer (a ByteBuffer or something like that), we as application developers can use this pointer directly for displaying or processing.

This is what is done in case 1. At least this is my understanding after reading the source code of OpenIMAJGrabber and BridJ.

@slajar
Author

slajar commented Jul 24, 2014

Okay, I see. I am not a native speaker either (I am only 400 km away from you ;)

So, back to the topic...

I don't really like version 1, since it seems to cause the synchronization problems described above. You should not access the pointer unless it is guaranteed that no one else is writing to it at the same time. Otherwise you will get torn images.

Version 2 is probably the best we can get without changing the underlying APIs. It has one copy operation too many, but that would be okay. Maybe this is also a good place to put image transformation or other processing plugins (pipelining).

@sarxos
Owner

sarxos commented Jul 24, 2014

The low-level webcam API is internally synchronized by the WebcamProcessor class. You can use it to run your own synchronized job and ensure that the buffer remains unchanged until you need the next one.

Example:

import java.awt.Dimension;
import java.nio.ByteBuffer;

import com.github.sarxos.webcam.Webcam;
import com.github.sarxos.webcam.WebcamDevice.BufferAccess;
import com.github.sarxos.webcam.WebcamResolution;
import com.github.sarxos.webcam.WebcamTask;


public class Test {

    public static class MySynchronisedTask extends WebcamTask {

        private volatile boolean done = false;

        public MySynchronisedTask(Webcam w) {
            super(Webcam.getDriver(), w.getDevice());
        }

        @Override
        protected void handle() {

            ByteBuffer buffer = ((BufferAccess) getDevice()).getImageBytes();

            if (buffer != null) {

                // use the buffer to do something which needs to be
                // synchronized, webcam will not execute any other operation
                // until this method is in progress, e.g. you can copy from
                // buffer to graphic card memory

                done = true;
            }
        }

        public boolean doTheJob() {
            done = false;
            try {
                process();
            } catch (InterruptedException e) {
            }
            return done;
        }
    }

    public static void main(String[] args) throws InterruptedException {

        Dimension s = WebcamResolution.QQVGA.getSize();

        final Webcam w = Webcam.getDefault();
        w.setViewSize(s);
        w.open();

        Thread t1 = new Thread(new Runnable() {

            @Override
            public void run() {
                do {
                    if (!new MySynchronisedTask(w).doTheJob()) {
                        return;
                    }
                } while (true);
            }
        });
        t1.setDaemon(true);
        t1.start();

        Thread t2 = new Thread(new Runnable() {

            @Override
            public void run() {
                do {
                    if (!new MySynchronisedTask(w).doTheJob()) {
                        return;
                    }
                } while (true);
            }
        });
        t2.setDaemon(true);
        t2.start();

        Thread.sleep(2000);

        w.close();
    }
}

In the example above, both t1 and t2 run in parallel, but due to the API mechanics the buffer is guaranteed not to change while you are inside the handle() method. That means you can use it to copy the buffer content to the graphics card memory without unnecessary copying in the middle and without worrying about synchronization issues.

@sarxos sarxos closed this as completed in 5742ff2 Jul 24, 2014
@sarxos
Owner

sarxos commented Jul 24, 2014

@slajar,

In addition to the functionality I described in my previous post, you can try the new method getImageBytes(ByteBuffer target), but please note that I performed only one simple test to verify that it works (there may be some bugs). In case of any problems, please don't hesitate to report them or to try fixing them on your own.

import java.awt.Dimension;
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ComponentColorModel;
import java.awt.image.ComponentSampleModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;

import javax.imageio.ImageIO;

import com.github.sarxos.webcam.Webcam;
import com.github.sarxos.webcam.WebcamResolution;


public class Test2 {

    public static void main(String[] args) throws InterruptedException, IOException {

        Dimension size = WebcamResolution.QQVGA.getSize();
        int length = size.width * size.height * 3;
        ByteBuffer bb = ByteBuffer.allocate(length);

        Webcam w = Webcam.getDefault();
        w.setViewSize(size);
        w.open();
        w.getImageBytes(bb);

        byte[] bytes = new byte[length];
        bb.rewind();
        bb.get(bytes);

        int[] BAND_OFFSETS = new int[] { 0, 1, 2 };
        int[] BITS = { 8, 8, 8 };
        int[] OFFSET = new int[] { 0 };
        int DATA_TYPE = DataBuffer.TYPE_BYTE;
        ColorSpace COLOR_SPACE = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        ComponentSampleModel smodel = new ComponentSampleModel(DATA_TYPE, size.width, size.height, 3, size.width * 3, BAND_OFFSETS);
        ComponentColorModel cmodel = new ComponentColorModel(COLOR_SPACE, BITS, false, false, Transparency.OPAQUE, DATA_TYPE);

        byte[][] data = new byte[][] { bytes };

        DataBufferByte dbuf = new DataBufferByte(data, bytes.length, OFFSET);
        WritableRaster raster = Raster.createWritableRaster(smodel, dbuf, null);

        BufferedImage bi = new BufferedImage(cmodel, raster, false, null);
        bi.flush();

        ImageIO.write(bi, "jpg", new File("test.jpg"));
    }
}

@sarxos
Owner

sarxos commented Jul 26, 2014

@slajar, please let me know if this satisfies your needs.
