Improve efficiency of compute_statistic by minimizing data access #2147
Conversation
…by computing minimal sub-cube for mask
glue/core/data.py (outdated)
subarray_slices = []
for idim in range(mask.ndim):
    collapse_axes = tuple(index for index in range(mask.ndim) if index != idim)
    valid = mask.any(axis=collapse_axes)
This step can be arbitrarily expensive, no? It will loop over all of the mask data? Or is mask already sliced down at a previous step?
We already call np.any on the mask above, so this won't change much, and we typically operate in chunks for big arrays.
When we use the sub-cube approach, we need to pad out the result so that its shape matches the result without the optimization. I've added a regression test for this, but I still need to push up a fix once I have time.
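The padding issue described above can be illustrated with a minimal sketch (this is an illustration of the idea, not the actual glue code; the array shapes and the use of NaN padding are assumptions): a statistic collapsed over a sub-cube yields a shorter profile than the full-cube result, so it must be padded back to the full shape.

```python
import numpy as np

# Toy data and a subset mask covering only part of the cube.
data = np.arange(24, dtype=float).reshape(6, 4)
mask = np.zeros((6, 4), dtype=bool)
mask[2:5, 1:3] = True

# Minimal bounding box of the mask along axis 0.
rows = np.where(mask.any(axis=1))[0]
sub = np.where(mask[rows[0]:rows[-1] + 1],
               data[rows[0]:rows[-1] + 1], np.nan)

# Collapsing over axis 1 gives a profile of length 3, not 6.
profile = np.nanmean(sub, axis=1)

# Pad with NaN so the shape matches the unoptimized result.
padded = np.full(data.shape[0], np.nan)
padded[rows[0]:rows[-1] + 1] = profile
print(padded.shape)  # (6,)
```

Without the padding step, downstream code (e.g. the profile viewer) would receive an array whose length depends on the subset, which is what the regression test guards against.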
Codecov Report

@@            Coverage Diff             @@
##           master    #2147      +/-   ##
==========================================
- Coverage   87.87%   87.86%   -0.02%
==========================================
  Files         246      246
  Lines       22596    22708     +112
==========================================
+ Hits        19857    19952      +95
- Misses       2739     2756      +17

Continue to review the full report at Codecov.
This improves the efficiency of compute_statistic, especially in the context of the profile viewer when subsets are applied, by first finding the minimal bounding box for the selection and then extracting data using only this bounding box. In simple tests, this can improve performance by 30x or more. It works especially well when loading CASA datasets, since those are very sensitive to disk access.
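The bounding-box step can be sketched as follows (a minimal self-contained illustration under stated assumptions, not the actual glue implementation; the function name `subcube_slices` is hypothetical). For each dimension, all other axes are collapsed with `any`, and the extent of True values gives the minimal slice along that dimension:

```python
import numpy as np

def subcube_slices(mask):
    # For each dimension, collapse every other axis and find the
    # range of indices containing any selected element.
    slices = []
    for idim in range(mask.ndim):
        collapse_axes = tuple(i for i in range(mask.ndim) if i != idim)
        valid = mask.any(axis=collapse_axes)
        indices = np.where(valid)[0]
        slices.append(slice(indices[0], indices[-1] + 1))
    return tuple(slices)

mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 2:4] = True
print(subcube_slices(mask))  # (slice(1, 3, None), slice(2, 4, None))
```

Indexing the data with these slices (`data[subcube_slices(mask)]`) touches only the bounding box of the selection, which is where the disk-access savings for formats like CASA come from.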
This needs tests and a changelog entry, and the runtime errors need to be addressed.
cc @keflavich