Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve efficiency of compute_statistic by minimizing data access #2147

Merged
merged 7 commits into from
May 28, 2020

Conversation

astrofrog
Copy link
Member

This improves the efficiency of compute_statistic, especially in the context of the profile viewer, when subsets are applied, by first finding the minimal bounding box for the selection and then extracting data using this bounding box only. In simple tests, this can improve performance by 30x or more. This works especially well when loading CASA datasets since those are very sensitive to disk access.

This needs tests and a changelog entry, and the runtime errors need to be addressed.

cc @keflavich

@astrofrog astrofrog added this to the v0.16.0 milestone May 21, 2020
subarray_slices = []
for idim in range(mask.ndim):
collapse_axes = tuple(index for index in range(mask.ndim) if index != idim)
valid = mask.any(axis=collapse_axes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step can be arbitrarily expensive, no? It will loop over all of the mask data? Or is mask already sliced down at a previous step?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already access np.any on the mask above so this won't change much, and we typically operate in chunks for big arrays.

@astrofrog
Copy link
Member Author

When we use the subcube approach, we need to pad out the result so the shape matches the result without optimization. I've added a regression test for this, but still need to push up a fix once I have time.

@codecov
Copy link

codecov bot commented May 27, 2020

Codecov Report

Merging #2147 into master will decrease coverage by 0.01%.
The diff coverage is 79.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2147      +/-   ##
==========================================
- Coverage   87.87%   87.86%   -0.02%     
==========================================
  Files         246      246              
  Lines       22596    22708     +112     
==========================================
+ Hits        19857    19952      +95     
- Misses       2739     2756      +17     
Impacted Files Coverage Δ
glue/viewers/matplotlib/qt/toolbar.py 94.52% <60.00%> (-2.59%) ⬇️
glue/core/data.py 90.64% <80.95%> (-0.74%) ⬇️
glue/viewers/matplotlib/viewer.py 93.92% <0.00%> (-1.17%) ⬇️
glue/viewers/matplotlib/state.py 90.18% <0.00%> (-0.66%) ⬇️
glue/viewers/matplotlib/qt/widget.py 81.81% <0.00%> (+3.40%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a746d6f...0515ba1. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants