input data with many points saturate output image #22

Open · jjguy opened this issue Jul 31, 2014 · 3 comments


jjguy commented Jul 31, 2014

In the current implementation, each input point is directly translated into an image of DOTSIZE pixels and blended in. If there are a large number of points relative to the chosen dotsize and image resolution, the output image is completely saturated.
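A minimal sketch of why this saturates, assuming the multiplicative blend pixels[ndx] = (pixels[ndx] * pixVal) / 255 quoted later in this thread and a made-up pixVal of 128 at the dot centre: each overlapping dot roughly halves the pixel value, so it bottoms out at 0 (full intensity) after about nine dots.

    /* Sketch only: repeated multiplicative blending drives a greyscale
     * pixel to 0 (full intensity). pixVal = 128 is a hypothetical per-dot
     * contribution; real values depend on distance from the dot centre. */
    #include <stdio.h>

    int main(void) {
        int pixel = 255;                          /* start empty (white) */
        const int pixVal = 128;                   /* hypothetical value  */
        for (int i = 1; i <= 10; i++) {
            pixel = (pixel * pixVal) / 255;       /* existing blend rule */
            printf("after %2d overlapping dots: %d\n", i, pixel);
        }
        return 0;                  /* hits 0 by i = 9: fully saturated */
    }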


jjguy commented Jul 31, 2014

The natural solution here is a normalization pass over the input data first, followed by plotting the relative density of that dataset, with a dial to increase or decrease the overall thresholds as desired.

This is a well-studied problem; it's effectively a 2D histogram. Some links, including a numpy implementation:

I think the simplest approach is a 2D histogram calculated via the binning method. My preference is to avoid the numpy dependency and implement the algorithm in heatmap.c. The bones (a rough sketch follows the list):

  • After calculating MAX_X, MAX_Y, MIN_X, MIN_Y, divide that input space into NxM fixed-width bins
  • Iterate over the input points, dropping each into one of the bins (optionally weighting each point)
  • Run the existing greyscale point-blending code, but over the NxM bin data instead of the original input data
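A rough sketch of that binning pass in C, the same language as heatmap.c; the function and variable names (bin_points, nx, ny) are illustrative, not the existing API:

    /* Sketch only: accumulate (optionally weighted) points into an
     * nx-by-ny grid of bins. pts is a flat array of x,y pairs and the
     * bounds come from the existing MIN/MAX pass; assumes max > min. */
    #include <stdlib.h>

    float *bin_points(const float *pts, int npts,
                      float min_x, float min_y, float max_x, float max_y,
                      int nx, int ny)
    {
        float *bins = calloc((size_t)nx * ny, sizeof *bins);
        if (bins == NULL) return NULL;

        float span_x = max_x - min_x;
        float span_y = max_y - min_y;
        for (int i = 0; i < npts; i++) {
            int bx = (int)((pts[2*i]     - min_x) / span_x * nx);
            int by = (int)((pts[2*i + 1] - min_y) / span_y * ny);
            if (bx == nx) bx = nx - 1;   /* clamp points on the max edge */
            if (by == ny) by = ny - 1;
            bins[by * nx + bx] += 1.0f;  /* or += a per-point weight     */
        }
        return bins;
    }

The bin centres and their counts would then be handed to the existing greyscale blender in place of the raw points, with each count scaled to a dot intensity.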

The only tricky thing here is how to select the number of bins. This is another well-studied problem. Some links:

...but the general approaches described don't include considerations for the heatmap output format. When deciding the number of bins, we'll have to consider dotsize and output resolution, perhaps to the exclusion of the "traditional" selection criteria. My starting point would be to have enough bins that, in the output image, each dot overlaps its neighbors by 80%.

For example, given a dotsize of 150px and a resolution of 1024px, use (1024/150)*5 ≈ 34 bins.
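A hedged sketch of that heuristic, where the overlap fraction is the tunable dial (0.8 reproduces the 34-bin example above):

    /* Sketch only: bins per axis from output resolution, dotsize and a
     * desired neighbour overlap. resolution=1024, dotsize=150,
     * overlap=0.8 gives 1024 / (150 * 0.2) ~= 34. */
    static int num_bins(int resolution, int dotsize, float overlap)
    {
        float spacing = dotsize * (1.0f - overlap);  /* px between centres */
        int bins = (int)(resolution / spacing + 0.5f);
        return bins > 1 ? bins : 1;
    }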

(shrug) it'll take some tuning and study.

@kwauchope

A naive approach (my original thought for a hack) would be to keep the original density array as an array of floats and use a simple additive combination function, then normalise into the 0-255 range before the final colorisation as before. It would almost double the memory consumption of this stage, though, going from pixels_1B + pixels_4B to pixels_4B + pixels_4B.

I'd be interested to see whether the floating point arithmetic takes much longer, as it is only basic addition. There is an extra normalisation step at the end, but a normalisation currently occurs for every pixel around a point anyway: pixels[ndx] = (pixels[ndx] * pixVal) / 255. There may even be a performance increase over the current approach when pixels < (dotsize^2 * numpoints).
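A minimal sketch of that additive-then-normalise idea; density, pixels and npix are illustrative names rather than the existing heatmap.c variables:

    /* Sketch only: after the float densities have been accumulated
     * additively, rescale the whole buffer to 0-255 in one pass before
     * colourisation. Assumes npix >= 1, and that 255 = empty and
     * 0 = densest, matching the existing multiplicative blend. */
    static void normalise_to_greyscale(const float *density,
                                       unsigned char *pixels, int npix)
    {
        float lo = density[0], hi = density[0];
        for (int i = 1; i < npix; i++) {
            if (density[i] < lo) lo = density[i];
            if (density[i] > hi) hi = density[i];
        }
        float range = (hi > lo) ? (hi - lo) : 1.0f;  /* avoid divide-by-zero */
        for (int i = 0; i < npix; i++) {
            float t = (density[i] - lo) / range;     /* 0..1 relative density */
            pixels[i] = (unsigned char)(255.0f - 255.0f * t);
        }
    }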

Look forward to seeing what you come up with :)

@kwauchope

@jjguy I've implemented my naive approach; since it skips the normalisation at each step, it doesn't seem to impact performance as far as time goes.

[screenshot: nosaturation]

Added customisable weights for the intensity decay; with the new approach they take less black magic to make look right.

Looking to return the max and min values from the normalisation so they can be used to create a legend.
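One hedged way to surface those values, with hypothetical names (density_range, normalise_with_range) that aren't part of the current API:

    /* Sketch only: have the normalisation report the extremes it scaled
     * between, so a caller can label a legend. */
    typedef struct {
        float min;   /* density mapped to the lightest colour */
        float max;   /* density mapped to the darkest colour  */
    } density_range;

    density_range normalise_with_range(const float *density,
                                       unsigned char *pixels, int npix);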

Let me know what you think. It will break backwards compatibility to some extent, as images will look different, though the same API is still used.
