Add version of Image::getArray (and Mesh::getField) that only wraps data in numpy.ndarray rather than copying it #903

cchriste · 2021-01-16T00:04:45Z

It's inefficient to copy an image's data in order to modify it with numpy and then reconstruct the image from the copied data (which probably copies it again). Instead, we can use this version with a capsule, except our capsule will not free the data when the numpy array goes out of scope since it's actually owned by the itk image.

This will make operations like the ones @jadie1 is performing much more efficient.
The seeming danger is the image being free'd and the numpy array trying to be used afterwards. So add a note to the function's documentation (to be defined as a lambda in ShapeworksPython.cpp).

#include <iostream>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

PYBIND11_PLUGIN(numpywrap) {
    py::module m("numpywrap");
    m.def("f", []() {
        // Allocate and initialize some data; make this big so
        // we can see the impact on the process memory use:
        constexpr size_t size = 100*1000*1000;
        double *foo = new double[size];
        for (size_t i = 0; i < size; i++) {
            foo[i] = (double) i;
        }

        // Create a Python object that will free the allocated
        // memory when destroyed:
        py::capsule free_when_done(foo, [](void *f) {
            double *foo = reinterpret_cast<double *>(f);
            std::cerr << "Element [0] = " << foo[0] << "\n";
            std::cerr << "freeing memory @ " << f << "\n";
            delete[] foo;
        });

        return py::array_t<double>(
            {100, 1000, 1000}, // shape
            {1000*1000*8, 1000*8, 8}, // C-style contiguous strides for double
            foo, // the data pointer
            free_when_done); // numpy array references this parent
    });
    return m.ptr();
}

>>> import numpywrap
>>> z = numpywrap.f()
>>> # the python process is now taking up a bit more than 800MB memory
>>> z[1,1,1]
1001001.0
>>> z[0,0,100]
100.0
>>> z[99,999,999]
99999999.0
>>> z[0,0,0] = 3.141592
>>> del z
Element [0] = 3.14159
freeing memory @ 0x7fd769f12010
>>> # python process memory size has dropped back down

From https://stackoverflow.com/questions/44659924/returning-numpy-arrays-via-pybind11

Note: This topic seems to confirm that itkImage::GetBufferPointer() does not copy the data: http://itk-insight-users.2283740.n2.nabble.com/ITK-users-ITK-Image-data-pointer-and-GPU-algorithms-td7589655.html
(Our current Image.toArray function does copy the data because that's the default behavior of the py::array constructor.)

Lots of info here useful to ensure data is deallocated when no longer needed: pybind/pybind11#1042
For example, it may be possible to use a simpler version, or to ensure data is not deleted until it goes out of scope in both C++ and Python. But since it's the itkImage's data, I'm not sure we'd have that control, so the simplest method might be to just not delete it, recognizing the potential risk on the Python side, and gaining the (significant) advantage of not copying back and forth.

This is probably not difficult, but I want to make sure we have a use case, so @jadie1 when you have time please add one to this issue. I'll implement it and throw it back over the fence for you to test. Thanks!

cchriste · 2021-01-17T22:37:24Z

I just want to reiterate: this is probably not difficult and there could be very significant performance benefits.
But it will be more constructive to go in with a concrete use case.

jadie1 · 2021-02-11T19:18:16Z

One use case is in the ShapeCohortGenPackage. In the generate_images() function in CohortGenUtils.py we take a segmentation, add a foreground and background with noise and blur the boundary to create a synthetic image.

As a very simple test:

Take any image and get a numpy array from it
Add a constant value to the numpy array
Write the numpy array back to an image (this is the part that doesn't work currently)

The output image should look the same as the input image but brighter.

cchriste · 2021-02-12T17:04:19Z

Write the numpy array back to an image (this is the part that doesn't work currently)
This issue is meant to remove this necessity by simply passing a reference to the original image's memory so that modification thereof will modify the image itself.

Keep in mind that if the numpy image is reallocated, say by changing the image size, it will no longer point to this memory (though the memory can be passed without passing ownership thereof, so it wouldn't introduce a crash potential).
https://pybind11.readthedocs.io/en/stable/advanced/functions.html#return-value-policies

cchriste · 2021-05-27T22:53:55Z

HOWTO go from C++ to Python withut copying:
// Pass pointer to c++ data to Python without copying.
// Also see the bindings for Physical/IndexRegion.min/max in ShapeworksPython.cpp
void addInput(std::shared_ptr<const MyFancyMatrix> packet)
{
  // When a valid object is passed as 'base', it tells pybind not to take ownership of the data,
  // because 'base' will own it. In fact 'packet' will own it, but - psss! - , we don't tell it to
  // pybind... Alos note that ANY valid object is good for this purpose, so I chose "str"...
  py::str dummyDataOwner;
  py::array input{getNumpyDTypeFromElementType(packet->elementType()), packet->shape(),
                  packet->strides(), packet->data(), dummyDataOwner};

  assert(!input.owndata()); // if fires, s.g. is not OK with ownership (probably matrix data was copied)
  // ...
}
(from https://github.com/pybind/pybind11/issues/323#issuecomment-575717041)

HOWTO to go from Python to C++ without copying:
// Also see our Image init function that copies from a py::array in ShapeworksPython.cpp.
// That function may be able to steal or share the data from numpy rather than copying it.
/**
 * \brief Returns span<T> from py:array_T<T>. Efficient as zero-copy.
 * \tparam T Type.
 * \param passthrough Numpy array.
 * \return Span<T> that with a clean and safe reference to contents of Numpy array.
 */
template<class T=float32_t>
inline std::span<T> toSpan(const py::array_t<T>& passthrough)
{
	py::buffer_info passthroughBuf = passthrough.request();
	if (passthroughBuf.ndim != 1) {
		throw std::runtime_error("Error. Number of dimensions must be one");
	}
	size_t length = passthroughBuf.shape[0];
	T* passthroughPtr = static_cast<T*>(passthroughBuf.ptr);
	std::span<T> passthroughSpan(passthroughPtr, length);
	return passthroughSpan;
}
(from https://github.com/pybind/pybind11/issues/1042#issuecomment-663154709)

cchriste assigned jadie1 and cchriste Jan 16, 2021

This was referenced Jan 18, 2021

remove sw-specific types from Python API that can simply be numpy arrays or eigen matrices #830

Closed

Problem with writing image to array and back #895

Closed

jadie1 mentioned this issue Feb 11, 2021

Replace python ITK usage with ShapeWorks image read #716

Closed

This was referenced Feb 12, 2021

Fix Python Image copy-from-numpy constructor #965

Closed

(Python) add Image.wrapArray that only wraps data from a numpy.ndarray rather than copying it #966

Closed

sheryjoe modified the milestones: 6.0 Leftover, 6.1 Release (May 2021) Feb 15, 2021

sheryjoe modified the milestones: 6.1 Release (June 2021), 6.2 Release (August 2021) May 25, 2021

cchriste mentioned this issue Jun 17, 2021

Update shapeworks.sw2vtkMesh to export mesh vertices, edges, and polys rather than read/write to disk #825

Closed

iyerkrithika21 assigned iyerkrithika21 and unassigned iyerkrithika21 Jul 15, 2021

cchriste mentioned this issue Jul 28, 2021

Consolidation of ComputeMeanNormals #1399

Merged

cchriste changed the title ~~Add version of Image::getArray that only wraps data in numpy.ndarray rather than copying it~~ Add version of Image::getArray (and Mesh::getField) that only wraps data in numpy.ndarray rather than copying it Aug 3, 2021

cchriste mentioned this issue Aug 3, 2021

[Mesh] Create independent class for mesh fields #935

Closed

cchriste linked a pull request Sep 10, 2021 that will close this issue

Image initialization and sharing without copying from/to numpy arrays in Python #1427

Merged

akenmorris closed this as completed in #1427 Sep 10, 2021

This was referenced Sep 29, 2021

Mesh API for Geodesic Distance computation #1494

Merged

Share data arrays between Python and C++ rather than copying them. #1495

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add version of Image::getArray (and Mesh::getField) that only wraps data in numpy.ndarray rather than copying it #903

Add version of Image::getArray (and Mesh::getField) that only wraps data in numpy.ndarray rather than copying it #903

cchriste commented Jan 16, 2021

cchriste commented Jan 17, 2021

jadie1 commented Feb 11, 2021

cchriste commented Feb 12, 2021 •

edited

Loading

cchriste commented May 27, 2021 •

edited

Loading

Add version of Image::getArray (and Mesh::getField) that only wraps data in numpy.ndarray rather than copying it #903

Add version of Image::getArray (and Mesh::getField) that only wraps data in numpy.ndarray rather than copying it #903

Comments

cchriste commented Jan 16, 2021

cchriste commented Jan 17, 2021

jadie1 commented Feb 11, 2021

cchriste commented Feb 12, 2021 • edited Loading

cchriste commented May 27, 2021 • edited Loading

cchriste commented Feb 12, 2021 •

edited

Loading

cchriste commented May 27, 2021 •

edited

Loading