input data with many points saturate output image #22

Open · jjguy opened this issue Jul 31, 2014 · 3 comments


jjguy commented Jul 31, 2014

In the current implementation, each input point is directly translated into an image of DOTSIZE pixels and blended in. If there are a large number of points relative to the chosen dotsize and image resolution, the output image is completely saturated.
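A minimal sketch of why this saturates, assuming the multiplicative blend pixels[ndx] = (pixels[ndx] * pixVal) / 255 quoted later in this thread and a made-up pixVal of 128 at the dot centre: each overlapping dot roughly halves the pixel value, so it bottoms out at 0 (full intensity) after about nine dots.

    /* Sketch only: repeated multiplicative blending drives a greyscale
     * pixel to 0 (full intensity). pixVal = 128 is a hypothetical per-dot
     * contribution; real values depend on distance from the dot centre. */
    #include <stdio.h>

    int main(void) {
        int pixel = 255;                          /* start empty (white) */
        const int pixVal = 128;                   /* hypothetical value  */
        for (int i = 1; i <= 10; i++) {
            pixel = (pixel * pixVal) / 255;       /* existing blend rule */
            printf("after %2d overlapping dots: %d\n", i, pixel);
        }
        return 0;                  /* hits 0 by i = 9: fully saturated */
    }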


jjguy commented Jul 31, 2014

The natural solution here is a normalization pass over the input data first, followed by plotting the relative density of that dataset, with a dial to increase or decrease the overall thresholds as desired.

This is a well-studied problem; it's effectively a 2D histogram. Some links, including a numpy implementation:

I think the simplest approach is a 2D histogram calculated via the binning method. My preference is to avoid the numpy dependency and implement the algorithm in heatmap.c. The bones (a rough sketch follows the list):

  • After calculating MAX_X, MAX_Y, MIN_X, MIN_Y, divide that input space into NxM fixed-width bins
  • Iterate over the input points, dropping each into one of the bins (optionally weighting each point)
  • Run the existing greyscale point-blending code, but over the NxM bin data instead of the original input data
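A rough sketch of that binning pass in C, the same language as heatmap.c; the function and variable names (bin_points, nx, ny) are illustrative, not the existing API:

    /* Sketch only: accumulate (optionally weighted) points into an
     * nx-by-ny grid of bins. pts is a flat array of x,y pairs and the
     * bounds come from the existing MIN/MAX pass; assumes max > min. */
    #include <stdlib.h>

    float *bin_points(const float *pts, int npts,
                      float min_x, float min_y, float max_x, float max_y,
                      int nx, int ny)
    {
        float *bins = calloc((size_t)nx * ny, sizeof *bins);
        if (bins == NULL) return NULL;

        float span_x = max_x - min_x;
        float span_y = max_y - min_y;
        for (int i = 0; i < npts; i++) {
            int bx = (int)((pts[2*i]     - min_x) / span_x * nx);
            int by = (int)((pts[2*i + 1] - min_y) / span_y * ny);
            if (bx == nx) bx = nx - 1;   /* clamp points on the max edge */
            if (by == ny) by = ny - 1;
            bins[by * nx + bx] += 1.0f;  /* or += a per-point weight     */
        }
        return bins;
    }

The bin centres and their counts would then be handed to the existing greyscale blender in place of the raw points, with each count scaled to a dot intensity.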

The only tricky thing here is how to select the number of bins. This is another well-studied problem. Some links:

...but the general approaches described don't include considerations for the heatmap output format. When deciding the number of bins, we'll have to consider dotsize and output resolution, perhaps to the exclusion of the "traditional" selection criteria. My starting point would be to have enough bins that, in the output image, each dot overlaps its neighbors by 80%.

For example, given a dotsize of 150px and a resolution of 1024px, use (1024/150)*5 ≈ 34 bins.
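A hedged sketch of that heuristic, where the overlap fraction is the tunable dial (0.8 reproduces the 34-bin example above):

    /* Sketch only: bins per axis from output resolution, dotsize and a
     * desired neighbour overlap. resolution=1024, dotsize=150,
     * overlap=0.8 gives 1024 / (150 * 0.2) ~= 34. */
    static int num_bins(int resolution, int dotsize, float overlap)
    {
        float spacing = dotsize * (1.0f - overlap);  /* px between centres */
        int bins = (int)(resolution / spacing + 0.5f);
        return bins > 1 ? bins : 1;
    }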

(shrug) it'll take some tuning and study.

@kwauchope

A naive approach (my original thought for a hack) would be to keep the original density array as an array of floats and use a simple additive combination function, then normalise into the 0-255 range before the final colorisation as before. It would almost double the memory consumption of this stage, though, going from pixels_1B + pixels_4B to pixels_4B + pixels_4B.

I'd be interested to see whether the floating point arithmetic takes much longer, as it is only basic addition. There is an extra normalisation step at the end, but a normalisation currently occurs for every pixel around a point anyway: pixels[ndx] = (pixels[ndx] * pixVal) / 255. There may even be a performance increase over the current approach when pixels < (dotsize^2 * numpoints).
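A minimal sketch of that additive-then-normalise idea; density, pixels and npix are illustrative names rather than the existing heatmap.c variables:

    /* Sketch only: after the float densities have been accumulated
     * additively, rescale the whole buffer to 0-255 in one pass before
     * colourisation. Assumes npix >= 1, and that 255 = empty and
     * 0 = densest, matching the existing multiplicative blend. */
    static void normalise_to_greyscale(const float *density,
                                       unsigned char *pixels, int npix)
    {
        float lo = density[0], hi = density[0];
        for (int i = 1; i < npix; i++) {
            if (density[i] < lo) lo = density[i];
            if (density[i] > hi) hi = density[i];
        }
        float range = (hi > lo) ? (hi - lo) : 1.0f;  /* avoid divide-by-zero */
        for (int i = 0; i < npix; i++) {
            float t = (density[i] - lo) / range;     /* 0..1 relative density */
            pixels[i] = (unsigned char)(255.0f - 255.0f * t);
        }
    }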

Look forward to seeing what you come up with :)

@kwauchope

@jjguy I've implemented my naive approach; since it skips the normalisation at each step, it doesn't seem to impact performance as far as time goes.

[screenshot: nosaturation]

Added customisable weights for the intensity decay; with the new approach they take less black magic to make look right.

Looking to return the max and min values from the normalisation so they can be used to create a legend.
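One hedged way to surface those values, with hypothetical names (density_range, normalise_with_range) that aren't part of the current API:

    /* Sketch only: have the normalisation report the extremes it scaled
     * between, so a caller can label a legend. */
    typedef struct {
        float min;   /* density mapped to the lightest colour */
        float max;   /* density mapped to the darkest colour  */
    } density_range;

    density_range normalise_with_range(const float *density,
                                       unsigned char *pixels, int npix);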

Let me know what you think. It will break backwards compatibility to some extent, as images will look different, though the same API is still used.
