BUG: Always downsample histogram for Imviz in Plot Options #2735
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2735      +/-   ##
==========================================
+ Coverage   88.67%   88.83%   +0.15%
==========================================
  Files         108      108
  Lines       15886    15928      +42
==========================================
+ Hits        14087    14149      +62
+ Misses       1799     1779      -20
arr = comp.data[y_min:y_max, x_min:x_max]
if self.config == "imviz":
is there a reason this is only for imviz?
It assumes 2D, so yes?
does it have to assume 2D (can we do this for large cubes as well)? And if so, maybe we can just check the data shape so that this will be available to image viewers in non-imviz configurations?
Is a large cube a problem? The histogram is only computed for the current slice, right? It is very rare to have a very large slice dimension, at least for the use cases we have so far.
The histogram should show whatever is being used to compute stretch percentiles, in my opinion. We're soon allowing toggling between current slice and full cube, so I'd expect the histogram to follow that same choice in the future.
I am torn on this because it sounds like the ultimate goal is that you want true random sampling from glue-core (see #2735 (comment)). This means my fix here would be temporary, and the immediate problem I am fixing is large images in Imviz, so the more I localize this now, the easier it is to revert in the future when the actual fix you want is implemented.
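For readers unfamiliar with the distinction, random sampling draws pixels uniformly from the whole array instead of taking every Nth row and column. The sketch below uses plain NumPy on synthetic data; it is not glue-core's implementation, and the subset size `n_sample` and the 2.5/97.5 percentile pair are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed only makes the sketch repeatable
# Synthetic stand-in for a large image (assumption, not real data).
image = rng.normal(loc=100.0, scale=5.0, size=(2000, 3000))

n_sample = 10_000  # hypothetical subset size
# Draw distinct flat indices, then convert them back to (row, col) pairs
# so that the full array never has to be flattened or copied.
flat_index = rng.choice(image.size, size=n_sample, replace=False)
rows, cols = np.unravel_index(flat_index, image.shape)
sample = image[rows, cols]

# Percentile-based limits from the full data and from the random subset.
vmin_full, vmax_full = np.percentile(image, [2.5, 97.5])
vmin_rand, vmax_rand = np.percentile(sample, [2.5, 97.5])
```

Because the subset is drawn uniformly, it has no preferred rows or columns, which is the property a regular grid lacks.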
ok, fair, but I meant it should at least represent the same underlying data (even if we don't do exactly the same right now). I think this will eventually need to be generalized to the cube case, so if that's easy to do now, great, otherwise I guess we can do it later.
We discussed offline and agreed to confine this to Imviz for now since there isn't any reported problem for Cubeviz yet.
because really there is no reason to sample everything.
b790011 to aad7319
"source": [
"fig2, ax2 = plt.subplots()\n",
"_ = ax2.hist(subarr, bins=10)\n",
"_ = ax2.set_title(\"Gridded\")"
Okay, I think I see what @camipacifici is saying about the "frame" being wiped out. But then again, are the percentile vmin/vmax calculated using all the data, randomly sampled data, or gridded downsampled data in Imviz? Maybe @astrofrog knows.
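One way to read the "frame" concern is that a thin feature can fall entirely between grid points and vanish from the downsampled data. The sketch below builds such a case with NumPy; the array size, the one-pixel frame position, and the step rule are assumptions for illustration and are not taken from the notebook under review.

```python
import numpy as np

arr = np.zeros((1200, 1200))
# One-pixel-wide "frame" placed one pixel inside the edge (assumed layout).
arr[1, 1:-1] = 1.0
arr[-2, 1:-1] = 1.0
arr[1:-1, 1] = 1.0
arr[1:-1, -2] = 1.0

# Regular grid with a step of about size / 400 (max(1, ...) is an assumption).
step = max(1, round(arr.shape[0] / 400))
gridded = arr[::step, ::step]

# Count how many frame pixels survive in each version.
frame_pixels_full = int(np.count_nonzero(arr))
frame_pixels_gridded = int(np.count_nonzero(gridded))
```

Here the grid visits rows and columns 0, 3, 6, and so on, so the frame at index 1 and at the second-to-last index is never sampled and the gridded histogram contains only background.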
I am not sure the cons of gridding matter in the end if it is the vmin/vmax that you are worried about. This is what Plot Options actually looks like with the framed data in this notebook.
imviz = Imviz()
imviz.load_data(arr, data_label="Original")
imviz.show()
No gridding (main) | With gridding (this PR) |
---|---|
To repeat what I said on Slack here: As things currently stand on `main`, gridding improves performance and does not seem to significantly affect how vmin/vmax is computed. If the real goal is to have the histogram and vmin/vmax accurately reflect the real (full) data via true random sampling, I think the fix is actually in glue-core and would be quite involved. (Though my understanding of glue internals is very limited, so feel free to correct me.)
This looks good to me so far, just a few questions:
arr = comp.data[y_min:y_max, x_min:x_max]
if self.config == "imviz":
    # Downsample input data to about 400px (as per compass.vue) for performance.
It shouldn't matter that this can change the aspect ratio of the image, right? It's just being used for statistics?
The data is flattened anyway (using `ravel()`) for stats, so no, I don't think aspect ratio is important here.
Makes sense. Another thought: this is basically making the maximum image size roughly 600x600 (which is when round(size/400) becomes greater than 1). If you have a 1000 x 50 array or something like that, the y dimension will be downsampled even though the image area is less than the maximum 600x600. Should the constraint be on area rather than an individual dimension size?
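The threshold arithmetic can be checked with a few lines of plain Python. This is a sketch of the per-axis rule as described in this thread, not a copy of the PR's code; the helper names are hypothetical and the `max(1, ...)` guard is an assumption so that short axes keep a step of 1.

```python
def step_for(size, target=400):
    """Stride for one axis; max(1, ...) is an assumed guard for short axes."""
    return max(1, round(size / target))


def sampled_length(size, target=400):
    """Number of elements kept when slicing one axis with [::step]."""
    return len(range(0, size, step_for(size, target)))


sizes = [50, 599, 600, 1000]
steps = {size: step_for(size) for size in sizes}
kept = {size: sampled_length(size) for size in sizes}
```

Note that Python's built-in `round` rounds halves to the nearest even integer, so both 600 / 400 = 1.5 and 1000 / 400 = 2.5 give a step of 2. An axis of 599 pixels is left alone, 600 pixels drops to 300, and 1000 pixels drops to 500.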
> Should the constraint be on area rather than an individual dimension size?
I am not sure. In the case of 1000 x 50 (`nx` x `ny`), it would become 500 x 50. This means X is sampled every other column, but Y is fully sampled. So if this is a star, imagine taking every other vertical strip off it and then doing stats on them.
Does it really affect the final histogram or vmin/vmax in a meaningful way? Not sure. If you have a science image like this, you can see how the histogram and vmin/vmax behave with and without this patch.
Also, if Tom R is going to do it properly upstream soon, then maybe we shouldn't waste too much time on this. I don't remember if he gave a concrete timeline this morning or not.
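For comparison, an area-based constraint could pick one shared step from the total pixel count. The sketch below is a hypothetical alternative that nobody in this thread implemented; both helper names are invented for illustration, and the per-axis rule is restated only so the two can be compared side by side.

```python
import math


def steps_per_axis(ny, nx, target=400):
    """Per-axis rule discussed in this thread (max(1, ...) is an assumption)."""
    return max(1, round(ny / target)), max(1, round(nx / target))


def step_from_area(ny, nx, target=400):
    """Hypothetical alternative: one shared step derived from the pixel count."""
    step = max(1, round(math.sqrt(ny * nx) / target))
    return step, step


# (ystep, xstep) for the 1000 x 50 (nx x ny) case and for a large square image.
per_axis = steps_per_axis(50, 1000)
by_area = step_from_area(50, 1000)
big_by_area = step_from_area(4000, 4000)
```

With the area rule the 1000 x 50 image is not downsampled at all, since it holds far fewer than 400 x 400 pixels, while a 4000 x 4000 image still gets a step of 10 on both axes and keeps its aspect ratio.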
Good questions though! Do you want to investigate the science image case with 1000 x 50? If not, I can try to do it tomorrow.
Okay, so I did a quick check and am posting my results here. You can decide whether these are good or bad.
I made a 1000 x 50 image using one of the datasets in the ImvizDitheredExample notebook and displayed it:
from astropy import units as u
from astropy.io import fits
from astropy.nddata import NDData
from astropy.utils.data import download_file
from astropy.wcs import WCS
from jdaviz import Imviz
imviz = Imviz()
imviz.show()
acs_47tuc_1 = download_file(
'https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/jbqf03gjq_flc.fits', cache=True)
pf = fits.open(acs_47tuc_1)
a = NDData(pf[1].data[:50, :1000] * u.electron, wcs=WCS(pf[1].header, pf))
imviz.load_data(a, data_label="long_image")
On main | This PR |
---|---|
p.s. The aspect ratio thingy does affect the Compass visualization, though given that we plan to refactor that to use something completely different, I am not too worried about it.
arr = image[image.main_components[0]][::ystep, ::xstep] |
On the test image it doesn't really impact the final histogram, and if this is being changed upstream then I agree not to waste time.
Does this need a test? Some lines are not covered.
I am not sure how to properly test this. Any idea?
Hmmm, I'm not sure. Does this code get called as soon as you open Plot Options? If so, then you could write a test that opens the plugin, which would then run all the code, and then, if the histogram bins or vmin/vmax are accessible, just verify they are what you expected? If it's not possible then that's fine; it's only a few lines of coverage.
The main problem is that I don't know what is expected here, or what degree of tolerance is acceptable. If you have an input and answers, I can probably write something up.
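A test along these lines could compare percentile limits from the gridded data against the full data on a synthetic image. This is only a sketch with NumPy: it does not open the Plot Options plugin, the `gridded` helper restates the rule described in this thread rather than calling jdaviz, and the tolerance `atol=0.2` is a hypothetical value, since the thread does not settle on one.

```python
import numpy as np


def gridded(arr, target=400):
    """Strided downsampling to roughly `target` pixels per axis (sketch)."""
    ystep = max(1, round(arr.shape[0] / target))
    xstep = max(1, round(arr.shape[1] / target))
    return arr[::ystep, ::xstep]


def test_gridded_percentiles_close_to_full():
    rng = np.random.default_rng(seed=42)
    # Synthetic stand-in for a large image: smooth ramp plus Gaussian noise.
    ramp = np.linspace(0.0, 10.0, 2000)[np.newaxis, :]
    image = ramp + rng.normal(scale=1.0, size=(1500, 2000))

    full = np.percentile(image, [2.5, 97.5])
    sub = np.percentile(gridded(image), [2.5, 97.5])

    # Hypothetical tolerance; pick one that matches the science requirement.
    np.testing.assert_allclose(sub, full, atol=0.2)


test_gridded_percentiles_close_to_full()
```

A smooth synthetic image is the friendly case for gridding; an input with thin or periodic structure would be a stricter check.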
Hopefully we can replace this with a more robust upstream solution in the near future, but until then this works for now and is an important performance improvement. Thanks!
Thanks for the thorough reviews. Much appreciated!
…5-on-v3.8.x Backport PR #2735 on branch v3.8.x (BUG: Always downsample histogram for Imviz in Plot Options)
…pacetelescope#2735)" This reverts commit fbef31f.
… (spacetelescope#2771)
* Revert "BUG: Always downsample histogram for Imviz in Plot Options (spacetelescope#2735)" This reverts commit fbef31f.
* Use glue's ability to randomly sample values when computing histograms and image statistics, and remove code to handle NaN values that caused the whole array to be loaded into memory.
* Use order='K' in ravel() to avoid copy
* Avoid calling ravel() since this causes a copy for cutouts
* Add comment to explain percentile values
* Fix issue when array len() match but .shape doesn't
* Adjust reference values for test
* Bumped minimum required version of glue-core and glue-jupyter
* Fix case where data is not a Numpy array
* Add change log
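The two `ravel()` items in the list above rest on how NumPy decides between returning a view and making a copy. The sketch below illustrates that behavior on a tiny array; it is a NumPy demonstration only and does not reproduce the jdaviz or glue code from the follow-up PR.

```python
import numpy as np

full = np.arange(12.0).reshape(3, 4)  # C-contiguous array
transposed = full.T                   # same memory, Fortran-ordered view
cutout = full[:2, 1:3]                # non-contiguous view into `full`

# order="K" reads elements in memory order, so a transposed array can be
# flattened as a view, while the default order="C" has to copy it.
view_with_k = np.shares_memory(transposed.ravel(order="K"), full)
copy_with_c = np.shares_memory(transposed.ravel(order="C"), full)

# A non-contiguous cutout cannot be flattened without a copy in any order.
cutout_shares = np.shares_memory(cutout.ravel(order="K"), full)

# Reductions such as nanpercentile accept N-D input directly, so the
# explicit ravel() call can be skipped altogether.
direct = np.nanpercentile(cutout, 50)
flattened = np.nanpercentile(cutout.ravel(), 50)
```

Skipping `ravel()` changes nothing about the result here; it only avoids the explicit intermediate copy for cutouts.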
Description
This pull request addresses serious performance issues in Imviz when a large image is loaded and someone opens Plot Options: everything waits for the histogram to compute.
Change log entry
`CHANGES.rst`? If you want to avoid merge conflicts, list the proposed change log here for review and add to `CHANGES.rst` before merge. If no, maintainer should add a `no-changelog-entry-needed` label.
Checklist for package maintainer(s)
This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.
`trivial` label.