Computer Vision

This page is intended to organize the development of a Computer Vision library for Julia.

Similar libraries for inspiration

Goals

Mostly type-independent algorithms
Support for high-dimensional data: "5D images", with 3 spatial dimensions, multichannel (multicolor), over time (a color movie of 3D volumes would be a 5D image)
Ability to handle images larger than can be resident in memory at once, including different storage schemes (see this thread on the mailing list)
Robust handling of missing data (e.g., from bad camera pixels, dropped frames, or in registered images)
Lots of algorithms, efficiently implemented
Optional GPU support (CuMatrix)
DICOM support

Current status

Several basic image processing algorithms are already implemented in examples/image.jl, though they are limited in the types they accept. Some of them could probably also be implemented more efficiently.

Next steps

The most important step will be a proper design, for which we should take a look at the libraries mentioned above.
Consistent naming convention
Document functions

Other random points:

Image transformations (will need interpolation methods!)
Gaussian/Laplacian pyramids
Optimal threshold (Otsu)
Histogram equalization
Morphological operations (dilate, erode, opening, closing)
TGV denoising
Non-linear filtering (median, alpha-trimmed mean,...)
Canny edge detection
Simulation of noise

Already implemented functions (for 2D grayscale and RGB memory-resident images)

imread
imwrite
ppmwrite (should be replaced by imwrite now that #328 is closed)
imshow (depends on feh for now)
ftshow (logarithmic view of the Fourier spectrum; nice to have for MRI)
rgb2gray
rgb2hsi
hsi2rgb
rgb2ntsc
ntsc2rgb
imcomplement
imlineardiffusion
imROF (TV denoising)
imedge (partial)
Filters: gaussian2d, sobel, imlaplacian, imdog, imlog, prewitt, imaverage
imadjustintensity
similarity metrics: ssd, ssdn, sad, sadn, ncc
imthresh
imgaussiannoise
imstretch
forward/backward differences: backdiffx, backdiffy, forwarddiffx, forwarddiffy

Design proposal

An image will be represented as one of the following main types (other types will be introduced later below):

ImageArray: an "in memory" image stored as a multidimensional array
ImageFileArray: an image stored on disk in multidimensional array format (note: this "raw" format is not intended for general-purpose export, formats like .png, .ppm, .tif will still be used for that)
ImageFileBricks: an image stored on disk as an "array of arrays", for example representing a 256x256 image as a 4x4 array of sub-images with size 64x64. This format is designed to support local operations on images of arbitrarily large size.

These will be composite types whose fields specify details about the representation (more detail below).
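
As a rough illustration of the brick layout, the sketch below splits an in-memory array into bricks and maps a pixel coordinate to its brick. It is written in current Julia (1.x) syntax, which differs from the syntax of the snippets on this page. `split_into_bricks` and `brick_of` are hypothetical helper names, not part of any existing library, and the bricks are kept in memory here rather than on disk.

```julia
# Hypothetical sketch: represent a 2D image as an "array of arrays" (bricks).
# Assumes the image size is an exact multiple of the brick size.
function split_into_bricks(img::AbstractMatrix, bricksize::NTuple{2,Int})
    nb = size(img) .÷ bricksize                      # number of bricks per dimension
    @assert nb .* bricksize == size(img) "image size must be a multiple of brick size"
    return [img[(i-1)*bricksize[1]+1 : i*bricksize[1],
                (j-1)*bricksize[2]+1 : j*bricksize[2]]
            for i in 1:nb[1], j in 1:nb[2]]
end

# Map a global pixel coordinate to (brick index, coordinate inside that brick).
function brick_of(coord::NTuple{2,Int}, bricksize::NTuple{2,Int})
    brick = (coord .- 1) .÷ bricksize .+ 1
    inner = (coord .- 1) .% bricksize .+ 1
    return brick, inner
end

img = reshape(collect(1:256*256), 256, 256)
bricks = split_into_bricks(img, (64, 64))            # 4x4 array of 64x64 matrices
b, p = brick_of((100, 200), (64, 64))
bricks[b...][p...] == img[100, 200]                  # same pixel, reached through its brick
```

A local operation only needs to load the bricks that overlap the region it touches, which is what makes this layout attractive for images that do not fit in memory.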

Image library functions will have a syntax illustrated by the following:

copy(image_out,image_in): converts from one format to another (or simply copies the data, if the type isn't changing)
imfilter(image_out,image_in,kernel): spatial/temporal filtering

One issue to discuss is whether the output should come first, or last. Here I have shown it first, because additional arguments (as in the imfilter example) are likely to be best thought of as "inputs" and hence should perhaps not be split from image_in.
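
To make the output-first convention concrete, here is a minimal sketch on plain arrays in current Julia (1.x) syntax. `scale_intensity!` is a hypothetical function invented for this illustration. The trailing `!` follows the present-day Julia convention for functions that modify an argument, which the proposal above does not use.

```julia
# Hypothetical sketch of the output-first convention on plain arrays.

# Generic method: the element type of `out` decides the conversion.
function scale_intensity!(out::AbstractArray, img::AbstractArray, factor::Real)
    size(out) == size(img) || throw(DimensionMismatch("out and img must have the same size"))
    for i in eachindex(out, img)
        out[i] = img[i] * factor
    end
    return out
end

# More specific method, selected by dispatch when the output is 8-bit:
# results are rounded and saturated instead of failing on overflow.
function scale_intensity!(out::AbstractArray{UInt8}, img::AbstractArray, factor::Real)
    size(out) == size(img) || throw(DimensionMismatch("out and img must have the same size"))
    for i in eachindex(out, img)
        out[i] = round(UInt8, clamp(img[i] * factor, 0, 255))
    end
    return out
end

img = [10.0 100.0; 200.0 250.0]
out_float = scale_intensity!(similar(img), img, 1.5)                 # Float64 output
out_u8    = scale_intensity!(Matrix{UInt8}(undef, 2, 2), img, 1.5)   # saturates at 255
```

The caller never passes an "output format" argument: the type of the first argument selects the method, and the remaining arguments stay together as inputs.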

This syntax leverages the fact that Julia functions can modify their input arguments. This has several distinct advantages:

The most efficient algorithm is likely to depend upon the storage format of both the input and the output. Julia's multiple dispatch will make this much easier to optimize.
This obviates the need to pass additional arguments specifying the desired output format, because all the details about how you want to format the output image are already stored as the fields of image_out.
You can readily specify that you only need a sub-region:
```
image_out.coordinate_ranges = [20:50,30:85];
copy(image_out,image_in)
```
will snip out a rectangular portion of image_in and store it in image_out. As a bonus, image_out automatically keeps track of which region its data came from.
Sub-region strategies will make it easier to implement many algorithms for the ImageFileArray and ImageFileBricks types. For example, for an ImageFileBricks type you can create an ImageArray object corresponding to a single output brick, call the version of the algorithm written for an output of ImageArray type, and use the result as one of the data bricks. Likewise, this same strategy should make it straightforward to implement multithreaded operations, where each thread processes a block of pixels.
If you don't need to keep the original image, you can specify an in-place operation by passing an ImageNil (another image type not yet introduced) as the first argument. This can save memory.
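
The brick-by-brick strategy can be sketched as follows, in current Julia (1.x) syntax and on plain arrays rather than the proposed image types. `complement!` and `complement_bricked!` are hypothetical names. A pointwise operation is used so that bricks need no overlap; a filter with spatial extent would also have to read a margin around each brick.

```julia
# Hypothetical sketch: apply an array-level algorithm one brick at a time.

# Array-level algorithm (stands in for the "ImageArray version").
function complement!(out::AbstractArray, img::AbstractArray, maxval)
    for i in eachindex(out, img)
        out[i] = maxval - img[i]
    end
    return out
end

# Brick driver: walks over coordinate ranges and delegates each block.
function complement_bricked!(out::AbstractMatrix, img::AbstractMatrix, maxval, bricksize::Int)
    for j in 1:bricksize:size(img, 2), i in 1:bricksize:size(img, 1)
        rows = i:min(i + bricksize - 1, size(img, 1))   # last brick may be smaller
        cols = j:min(j + bricksize - 1, size(img, 2))
        # Views share memory with the parent arrays, so nothing is copied.
        complement!(view(out, rows, cols), view(img, rows, cols), maxval)
    end
    return out
end

img = rand(0:255, 100, 70)
out = similar(img)
complement_bricked!(out, img, 255, 64)
out == 255 .- img      # bricked result matches the whole-array result
```

Because each brick is processed independently, the same loop structure is a natural starting point for handing bricks to separate threads.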

Finally, a key component of the library will be to provide pixel iterators of different types. Those used in VIGRA ("iterators") and ImgLib2 ("cursors") will be the models. However, currently I'd propose that we only implement iterators that work on ImageArray objects; we then use Julia's multiple-dispatch capabilities to iterate over bricks of more complicated types. The virtue of this strategy is that we don't need to try to write "one true iterator" for complex formats; we can adapt the iteration strategy to the algorithm. For example, a filtering operation with a kernel that has large extent along the z axis but small extent along x and y might benefit from a different "bricking strategy" than one with a kernel of different shape.
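
For in-memory arrays, current Julia (1.x) already ships a dimension-independent pixel iterator in the form of `CartesianIndices`. The sketch below uses it for a box sum that works unchanged on 2D, 3D or 5D data. `boxsum!` is a hypothetical name, and the code is only meant to show the iteration style, not a tuned filter.

```julia
# Hypothetical sketch: sum over a (2*radius+1)^N neighbourhood for any N,
# truncating the neighbourhood at the image borders.
function boxsum!(out::AbstractArray{T,N}, img::AbstractArray{<:Any,N}, radius::Int) where {T,N}
    R = CartesianIndices(img)
    Ifirst, Ilast = first(R), last(R)
    I1 = radius * oneunit(Ifirst)            # offset of `radius` along every dimension
    for I in R
        s = zero(T)
        for J in max(Ifirst, I - I1):min(Ilast, I + I1)
            s += img[J]
        end
        out[I] = s
    end
    return out
end

img = ones(Int, 4, 5, 3)
out = boxsum!(similar(img), img, 1)
out[2, 2, 2]    # 27: a full 3x3x3 neighbourhood
out[1, 1, 1]    # 8: the neighbourhood is cut to 2x2x2 at the corner
```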

Here is a very rough beginning for the ImageArray type:

abstract ImageCoordinate
abstract Space <: ImageCoordinate
abstract Time <: ImageCoordinate
abstract Channel <: ImageCoordinate

abstract Image
type ImageArray{T<:Number} <: Image
    data::Array{T}
    coordinate_types::Vector{ImageCoordinate}
    coordinate_units::Vector{Any}  # vector of strings, "microns" or I"\mu m"
    coordinate_names::Vector{Any}  # vector of strings, "X" or "Y"
    coordinate_ranges::Vector{Range1}
    space_directions::Matrix{Float64} # e.g., 0.15*eye(2) for single image with 0.15 micron pixels
    valid::Array{Any,1}      # can be used to store bad frame/pixel data
    metadata::CompositeKind  # arbitrary metadata, like acquisition time, etc.

    ImageArray{T}() = new()
end

Supplying a constructor that just takes an array input and provides defaults for the other fields will make it easier for people who don't want/need to worry about all these other fields.
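
A convenience constructor of this kind might look like the following sketch. It uses current Julia (1.x) syntax and a deliberately simplified type, `ImageArraySketch`, which is hypothetical: the field names follow the proposal above, but the field types and the default values ("dim1", "pixel", an empty Dict for metadata) are assumptions made for the example.

```julia
# Hypothetical, simplified variant of the proposed type.
mutable struct ImageArraySketch{T<:Number,N}
    data::Array{T,N}
    coordinate_names::Vector{String}
    coordinate_units::Vector{String}
    coordinate_ranges::Vector{UnitRange{Int}}
    metadata::Dict{String,Any}
end

# Convenience constructor: wrap a plain array and fill in defaults.
function ImageArraySketch(data::Array{T,N}) where {T<:Number,N}
    names  = ["dim$d" for d in 1:N]
    units  = fill("pixel", N)
    ranges = [1:size(data, d) for d in 1:N]
    return ImageArraySketch{T,N}(data, names, units, ranges, Dict{String,Any}())
end

img = ImageArraySketch(zeros(Float32, 48, 64))
img.coordinate_ranges    # [1:48, 1:64]
```

Users who only care about the pixel data call the one-argument form, while the full constructor stays available for code that tracks units and metadata.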

One important detail concerns the data type T and the specification of bad pixels via the valid field. When T is Float32 or Float64 and the type is ImageArray, it is straightforward to represent a known bad pixel with a NaN: just set data[i,j,...] = NaN. However, when T is an integer type (as in many common on-disk formats), this is not an option. In such cases, the valid field provides a way of marking bad pixels. In the simplest case, valid can be a Bool array of the same size as the image. Alternatively, in raw acquired data sets, bad pixels are frequently separable: you have certain pixels on your camera that you know are bad, and perhaps something went wrong during the acquisition of a particular frame or image stack. For example, suppose stack 20 is entirely bad, and frame 14 of stack 37 is also bad. valid could be specified in the following way:

# Assume goodpixels is an array of the size of one camera frame, true for the good pixels
goodframes = trues(1,1,n_frames_per_stack,n_stacks) # first 2 coords are camera x and y
goodframes[1,1,:,20] = false
goodframes[1,1,14,37] = false
img.valid = cell(2)
img.valid[1] = goodpixels
img.valid[2] = goodframes

The notion here is that the 4-dimensional valid array is being represented as the outer product of goodpixels and goodframes, but for reasons of memory-efficiency we don't directly compute the outer product.
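
One way to query such a factored mask without ever forming the outer product is sketched below, in current Julia (1.x) syntax. `isvalidpixel` is a hypothetical helper, and the image dimensions are made-up example values.

```julia
# Hypothetical sketch: look up validity from the separable factors.
n_x, n_y, n_frames_per_stack, n_stacks = 8, 6, 20, 40

goodpixels = trues(n_x, n_y)
goodpixels[3, 5] = false                       # one known bad camera pixel

goodframes = trues(1, 1, n_frames_per_stack, n_stacks)
goodframes[1, 1, :, 20] .= false               # stack 20 is entirely bad
goodframes[1, 1, 14, 37] = false               # frame 14 of stack 37 is bad

valid = Any[goodpixels, goodframes]

# A pixel is valid only if every factor says so. Singleton dimensions of a
# factor are indexed with 1; trailing dimensions it lacks are ignored.
function isvalidpixel(valid, I::Int...)
    for mask in valid
        idx = ntuple(d -> size(mask, d) == 1 ? 1 : I[d], ndims(mask))
        mask[idx...] || return false
    end
    return true
end

isvalidpixel(valid, 1, 1, 1, 1)     # true
isvalidpixel(valid, 3, 5, 1, 1)     # false (bad camera pixel)
isvalidpixel(valid, 1, 1, 7, 20)    # false (bad stack)
isvalidpixel(valid, 1, 1, 14, 37)   # false (bad frame)
```

The two factors together hold n_x*n_y + n_frames_per_stack*n_stacks flags, whereas the full 4-dimensional mask would hold n_x*n_y*n_frames_per_stack*n_stacks.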
