Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug when chunking large arrays with a subset defined #2302

Merged
merged 9 commits into from
Jun 24, 2022

Conversation

rosteen
Copy link
Contributor

@rosteen rosteen commented May 26, 2022

Description

We were running into a bug for large cubes in Jdaviz, where we couldn't retrieve the collapsed spectrum if the cube was above a certain size (above n_chunk_max in compute_statistic as it turned out). I'm not sure if this is the best fix, but it works to avoid the broadcast error we were encountering on line 1689 where there was an array size mismatch with a RangeSubsetState defined and number of array elements > n_chunk_max.

@codecov
Copy link

codecov bot commented May 26, 2022

Codecov Report

Merging #2302 (27c7d88) into main (bc93828) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2302      +/-   ##
==========================================
- Coverage   88.13%   88.13%   -0.01%     
==========================================
  Files         247      247              
  Lines       23365    23370       +5     
==========================================
+ Hits        20592    20596       +4     
- Misses       2773     2774       +1     
Impacted Files Coverage Δ
glue/core/data.py 91.71% <100.00%> (+0.05%) ⬆️
glue/plugins/dendro_viewer/data_factory.py 97.72% <0.00%> (-2.28%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc93828...27c7d88. Read the comment docs.

@pllim
Copy link
Contributor

pllim commented May 26, 2022

🐱

@pllim
Copy link
Contributor

pllim commented May 26, 2022

Woohoo... looking green!

@pllim
Copy link
Contributor

pllim commented May 26, 2022

py310-test-pyqt515-all failure is not our fault, right?

==> macos OS detected
https://uploader.codecov.io/latest/macos/codecov.SHA256SUM
Error: Codecov: Error validating uploader: Misformed armored text

@rosteen
Copy link
Contributor Author

rosteen commented May 26, 2022

Definitely not the fault of this PR, I ran into that once when trying to pin Codecov uploader version. Likely on their end.

@pllim
Copy link
Contributor

pllim commented May 26, 2022

p.s. I thought this was the astropy conda-forge PR. Sorry for the noise wrt CI status. 😆

@dhomeier
Copy link
Collaborator

What works for SliceSubsetState should work for RangeSubsetState, so if there are no other memory issues to be aware of...

@rosteen
Copy link
Contributor Author

rosteen commented May 27, 2022

What works for SliceSubsetState should work for RangeSubsetState, so if there are no other memory issues to be aware of...

Well, a little further down in the code there's an if case for SliceSubsetState that uses subset_state.to_array, and it didn't seem like RangeSubsetState has to_array, just to_mask, so apparently they aren't completely equivalent. It seemed to work fine dropping into the else case that uses to_mask though.

@dhomeier
Copy link
Collaborator

I was more concerned about the memory footprint if chunking is bypassed for huge arrays, but at the location of the broadcast error you pointed to the chunking seems already to be taken care of in the iterate_chunks loop.
Leaves the question if compute_statistic might anywhere be called directly on a (huge) RangeSubsetState.

@dhomeier
Copy link
Collaborator

It seemed to work fine dropping into the else case that uses to_mask though.

Yes, that would have already worked in the self.size <= n_chunk_max case, so should not be affected by larger array sizes.

@astrofrog
Copy link
Member

I'd like to understand a bit better why this is not working currently - in principe it is possible one would use a RangeSubsetState with a huge array. The reason we special case SliceSubsetState is because it can be applied efficiently to a huge array using the standard to_mask, but this is not the case with RangeSubsetState.

@dhomeier - would you have time to investigate if we can fix this without special-casing RangeSubsetState?

@rosteen
Copy link
Contributor Author

rosteen commented May 27, 2022

I'm pretty sure that one of result[chunk_view[axis_index]] and values (on the line linked in the original comment) has the length of the subset and the other has the length of the full axis, resulting in a mismatch error. Agreed that it's better to resolve that than just bail on chunking for RangeSubsetState.

@dhomeier
Copy link
Collaborator

@dhomeier - would you have time to investigate if we can fix this without special-casing RangeSubsetState?

Sure. Not sure how long it will take; I am not that familiar with that code yet – just with regard to the bugfix release.

@pllim
Copy link
Contributor

pllim commented May 27, 2022

Is it too controversial to merge this as-is to get the bug fix release out and then investigate a better fix as a follow-up PR? A working but horrible performance solution is better than completely broken?

@dhomeier
Copy link
Collaborator

dhomeier commented May 27, 2022

I'm pretty sure that one of result[chunk_view[axis_index]] and values (on the line linked in the original comment) has the length of the subset and the other has the length of the full axis, resulting in a mismatch error. Agreed that it's better to resolve that than just bail on chunking for RangeSubsetState.

Yes, from the exception values apparently has length of the original self.shape[axis_index], while result[chunk_view[axis_index]] has the same divided by n_chunks.
I could also confirm the exception for an InequalitySubsetState, so special-casing RangeSubsetState will not get us very far. I guess this more likely applies to any SubsetState other than SliceSubsetState that is already separately treated.
Looks like specifying subset_state somehow interferes with setting view (which unfortunately is not documented in the API).

A working but horrible performance solution is better than completely broken?

It's not just performance per se, since the function potentially doubles-triples the memory footprint.

If that is manageable in your Jdaviz application, maybe a better workaround for now would be to set n_chunk_max=data.size for those cases?

@pllim
Copy link
Contributor

pllim commented May 27, 2022

a better workaround for now would be to set n_chunk_max=data.size for those cases?

The only call of compute_statistic I can find in Jdaviz is here but Ricky would know better if that is the right place to do it or not.

https://github.com/spacetelescope/jdaviz/blob/70eb5f16aa78e8de68faa3c7b8afbb7096f067ad/jdaviz/app.py#L709

@rosteen rosteen changed the title Add additional check for RangeSubsetState in chunking Fix bug when chunking large arrays with a subset defined Jun 23, 2022
@rosteen
Copy link
Contributor Author

rosteen commented Jun 23, 2022

@dhomeier @astrofrog I did some more investigating and implemented a fix to actually generate the correct result shape rather than skipping the chunking altogether for RangeSubsetStates. Let me know if this looks good to you, and if so @dhomeier if you want to add your additional tests from #2304 to this PR, or make a followup PR with them.

@orifox
Copy link

orifox commented Jun 23, 2022

Close! This is working much better than before! I can scrub through the cube. But when I draw a spectral subset, it doesn't seem to register in the plugin tool subset dropdown.

Screen Shot 2022-06-23 at 11 51 34 AM

@rosteen
Copy link
Contributor Author

rosteen commented Jun 23, 2022

Strange, that works for me. Maybe try updating your Jdaviz install to the latest dev version? I had a PR related to the subset plugin merged recently that might fix this for you.

Screen Shot 2022-06-23 at 12 06 59 PM

I'll investigate the test failure soon.

@orifox
Copy link

orifox commented Jun 23, 2022

Try a subset with 2 separated spectral regions. This might be an unrelated bug.

@rosteen
Copy link
Contributor Author

rosteen commented Jun 23, 2022

@orifox Yes, that's exactly the case my recent PR fixed/improved - again, are you up to date with main? If so this is probably a new bug to track down, if not that will probably solve your issue.

Screen Shot 2022-06-23 at 12 41 01 PM

@orifox
Copy link

orifox commented Jun 23, 2022

I just installed the dev branch an hour ago.

@orifox
Copy link

orifox commented Jun 23, 2022

2.6.1.dev79+gcf22f84d

@rosteen
Copy link
Contributor Author

rosteen commented Jun 23, 2022

Got it, good to know - probably a real bug then.

@pllim
Copy link
Contributor

pllim commented Jun 23, 2022

Does fixing failing test also magically fixes Ori's use case?

@rosteen
Copy link
Contributor Author

rosteen commented Jun 23, 2022

Hmm....I'd rate that as "doubtful, but possible". Worth checking.

@dhomeier
Copy link
Collaborator

It does fix the extended test_compute_statistic_chunks in #2304, but not the added test_compute_statistic_shape_view, so some progress showing up, thanks!

Co-authored-by: Derek Homeier <dhomeie@gwdg.de>
@rosteen
Copy link
Contributor Author

rosteen commented Jun 24, 2022

@dhomeier I added both of your tests here, and they both pass locally. Maybe you had some additional differences in your local environment? I added them here to check that they pass on CI and they look ok - there are some failures but they seem to be unrelated to this PR (and I see some failures on main as well...).

glue/core/data.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@dhomeier dhomeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned out I messed up my own tests; apart from the minor suggestions this looks great, thanks! The PySide2 crashes are known issues and unrelated.

CHANGES.md Outdated Show resolved Hide resolved
glue/core/data.py Outdated Show resolved Hide resolved
Co-authored-by: Derek Homeier <dhomeie@gwdg.de>
@rosteen
Copy link
Contributor Author

rosteen commented Jun 24, 2022

Turned out I messed up my own tests

Ha, it happens! Thanks for the review, I committed your remaining suggestions.

@dhomeier dhomeier merged commit 31327f3 into glue-viz:main Jun 24, 2022
@dhomeier
Copy link
Collaborator

Thanks again for the fix @rosteen!

@pllim
Copy link
Contributor

pllim commented Jun 24, 2022

Yay! @orifox should try again with the latest dev version of glue-core 🤞

@orifox
Copy link

orifox commented Jun 27, 2022

Works beautifully! I have confirmed I can now do all functionality I originally set out to do under this set of issues/tickets. Screenshot below. I sign off on it. If there are other things that come up in the future, we can open new issues.

Screen Shot 2022-06-27 at 7 22 13 AM

@pllim
Copy link
Contributor

pllim commented Jun 27, 2022

@dhomeier , is it possible to do a release with this patch soon? 🙏

@dhomeier
Copy link
Collaborator

@astrofrog what do you think? I'll merge #2235 today; could make a 1.5.0 then.

@astrofrog
Copy link
Member

Sounds good, thanks!

@dhomeier
Copy link
Collaborator

v1.5.0 is on PyPI now; thanks all again for their contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants