Add dask-friendly EWA resampler class (DaskEWAResampler) #284

djhoese · 2020-06-01T21:29:14Z

See #281 for other details. This is an initial implementation that tries to take more advantage of dask for the EWA resampling algorithm. So far I've been testing with the worst case for this algorithm and comparing it to the existing/old implementation in Satpy, for which the same case is almost a best case. The case is a 10-granule VIIRS SDR run resampling I04 to a target area that includes all input data and has no empty chunks. This is the worst case for this dask-based implementation because:

All source chunks are used.
No target chunks are empty.
We aren't resampling multiple products at the same time which this algorithm could probably benefit from by having dask manage how much memory is being used (number of workers and chunk size).

This is the best case for the old implementation because of the opposite of all of the above. With that said, at the time of writing, this implementation takes a couple seconds longer (18s -> 20s) and about the same amount of memory if not a little more (~4GB-4.5GB).
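To illustrate why those conditions matter, here is a minimal sketch of the kind of check a chunked implementation could use to skip work: a source chunk only needs to be processed if some of its pixels land inside the target grid. The function name, the chunk names, and the flattened coordinate lists are all hypothetical, and the check ignores the extra margin a real EWA footprint would need. It is not the code from this PR.

```python
def chunk_overlaps_target(cols, rows, target_shape):
    """Return True if any pixel of a source chunk lands inside the target grid.

    ``cols`` and ``rows`` are fractional target-grid coordinates for each
    source pixel (the kind of output an ll2cr-style step produces) and
    ``target_shape`` is (num_rows, num_cols). All names are illustrative.
    """
    num_rows, num_cols = target_shape
    return any(
        0 <= c < num_cols and 0 <= r < num_rows
        for c, r in zip(cols, rows)
    )


# Three hypothetical source chunks, flattened to 1D for brevity.
chunks = {
    "chunk-0": ([-5.0, -3.2], [1.0, 2.0]),    # entirely left of the grid
    "chunk-1": ([0.5, 3.7], [1.0, 2.0]),      # inside the grid
    "chunk-2": ([12.0, 15.0], [40.0, 41.0]),  # entirely outside the grid
}
target_shape = (10, 10)

# Only chunks that touch the target would need the second (fornav) stage.
used = [name for name, (cols, rows) in chunks.items()
        if chunk_overlaps_target(cols, rows, target_shape)]
print(used)
```

In the test case described above every source chunk would pass this check, so nothing could be skipped.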

Closes Dask-ify Elliptical Weighted Averaging (EWA) resampling #281
Tests added
Tests passed
Passes git diff origin/master **/*py | flake8 --diff
Fully documented

Optimizations TODO:

Allow persisting and checking source chunks to have them not loaded during the second stage of resampling (fornav)
Cache the above results properly so resampling multiple datasets can reuse this source chunk usage information.
Chain the last stages of resampling instead of providing all inputs at once to reduce memory consumption (I assume)
Pass averaging (accums / weights) to the existing C code to handle edge cases and hopefully improve performance...maybe.
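As a rough illustration of the averaging step mentioned in the last item, the sketch below sums per-chunk weights and weighted accumulations and divides once at the end, filling pixels that received no weight. It uses plain Python lists, and `combine_and_average`, `weight_min`, and `fill_value` are hypothetical names chosen for this example. The existing C code may handle more edge cases than this.

```python
import math


def combine_and_average(partials, weight_min=0.0, fill_value=math.nan):
    """Sum per-chunk (weights, accums) pairs, then divide once at the end.

    Each element of ``partials`` is a (weights, accums) pair of equal-length
    lists covering the same target pixels. Pixels whose total weight is not
    above ``weight_min`` get ``fill_value`` instead of a division by zero.
    """
    num_pixels = len(partials[0][0])
    total_weights = [0.0] * num_pixels
    total_accums = [0.0] * num_pixels
    for weights, accums in partials:
        for i in range(num_pixels):
            total_weights[i] += weights[i]
            total_accums[i] += accums[i]
    return [
        accum / weight if weight > weight_min else fill_value
        for weight, accum in zip(total_weights, total_accums)
    ]


# Two hypothetical source chunks contributing to the same three target pixels.
partials = [
    ([1.0, 0.0, 0.5], [10.0, 0.0, 2.0]),
    ([1.0, 0.0, 1.5], [30.0, 0.0, 6.0]),
]
result = combine_and_average(partials)
print(result)
```

Because the division happens only after all chunks are summed, the per-chunk results can be combined in any order, which is what makes a reduction-style approach possible.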

Other TODOs to make this use best practices:

Use dask's tokenizing to come up with output array names/keys to avoid collisions/conflicts.
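The idea behind token-based names is that the key is derived from the inputs, so identical inputs give the same name and different inputs do not collide. dask provides `dask.base.tokenize` for this. The sketch below uses `hashlib` as a dependency-free stand-in to show the pattern, and `stable_name` and the example arguments are hypothetical.

```python
import hashlib


def stable_name(prefix, *parts):
    """Build a deterministic name from a prefix and the inputs it depends on.

    This is a stand-in for what dask's tokenizing does; it hashes the
    ``repr`` of the inputs, which is only suitable for simple values.
    """
    digest = hashlib.sha256(repr(parts).encode("utf-8")).hexdigest()[:32]
    return f"{prefix}-{digest}"


# Same inputs give the same name; changing any input changes the name.
name_a = stable_name("ewa-fornav", "I04", (1000, 1000), {"rows_per_scan": 32})
name_b = stable_name("ewa-fornav", "I04", (1000, 1000), {"rows_per_scan": 32})
name_c = stable_name("ewa-fornav", "I05", (1000, 1000), {"rows_per_scan": 32})
print(name_a == name_b, name_a == name_c)
```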

coveralls · 2020-06-01T21:45:02Z

Coverage increased (+1.3%) to 92.571% when pulling bf4b07b on djhoese:feature-dask-ewa into 3acb6aa on pytroll:master.

codecov · 2020-06-01T21:45:57Z

Codecov Report

Merging #284 (bf4b07b) into master (3acb6aa) will increase coverage by 1.28%.
The diff coverage is 87.72%.

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
+ Coverage   91.29%   92.58%   +1.28%     
==========================================
  Files          45       48       +3     
  Lines        9505     9826     +321     
==========================================
+ Hits         8678     9097     +419     
+ Misses        827      729      -98

Flag	Coverage Δ
unittests	`92.58% <87.72%> (+1.28%)`	⬆️


Impacted Files	Coverage Δ
pyresample/ewa/__init__.py	`71.42% <66.66%> (+1.78%)`	⬆️
pyresample/resampler.py	`60.34% <71.42%> (+25.56%)`	⬆️
pyresample/ewa/ewa.py	`76.78% <76.78%> (ø)`
pyresample/ewa/_legacy_dask_ewa.py	`84.53% <84.53%> (ø)`
pyresample/geometry.py	`83.91% <89.28%> (+0.09%)`	⬆️
pyresample/ewa/dask_ewa.py	`89.45% <89.45%> (ø)`
pyresample/test/test_dask_ewa.py	`97.56% <97.56%> (ø)`
pyresample/test/test_geometry.py	`97.99% <100.00%> (+<0.01%)`	⬆️
pyresample/test/utils.py	`71.87% <100.00%> (+0.59%)`	⬆️
pyresample/version.py
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3acb6aa...bf4b07b.

ghost · 2021-01-17T15:57:53Z

Congratulations 🎉. DeepCode analyzed your code in 0.366 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

djhoese · 2021-02-01T22:38:33Z

I disagree or can't fix the 2 issues that deepcode is complaining about. This is ready for review.

Note the related satpy PR: pytroll/satpy#1522

pnuu

Just a few inline comments. There are a few functions/methods that could be refactored into smaller units, and maybe given more descriptive names, but as I understand it there will be further optimization, so I'll let them be for now as my time is a bit limited.

mraspaud

Nice effort, thanks for putting this together!

I trust you on the algorithm, so I haven't checked it.
I have some styling comments inline.
What I'm most concerned about is the coverage drop, do you think this is just a false alarm?
Moreover, there are now three classes for ewa resampling. Do they really need to exist? Can't there just be one, given that the results are identical to the others?

mraspaud · 2021-02-02T07:35:25Z

pyresample/geometry.py

@@ -1887,7 +1889,8 @@ def projection_y_coords(self):
    def outer_boundary_corners(self):
        """Return the lon,lat of the outer edges of the corner points."""
        from pyresample.spherical_geometry import Coordinate
-        proj = Proj(**self.proj_dict)
+        proj_def = self.crs_wkt if hasattr(self, 'crs_wkt') else self.proj_dict


Should we have a shortcut method for this? This seems to be used in multiple places

djhoese · 2021-02-02

I think it would be better long term to just say from now on pyresample requires pyproj 2.2+ (I think that's what we need for all of the CRS stuff to work). This would drop the possibility of not having a .crs attribute. Maybe not in this PR though? Maybe we do one pyresample release with this resampler (and some of the other existing PRs) and then force pyproj versions?

mraspaud · 2021-02-02

If you don't want to force the pyproj version in this PR, could you then factorize this line and its duplicates?

djhoese · 2021-02-02

Ok I refactored this into two properties: one for WKT/string and one for CRS/dict. I also added an is_geostationary property to work around one of the required uses of proj_dict.
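For readers following the thread, the fallback pattern under discussion can be sketched as below. This is a hypothetical stand-in class, not pyresample's `AreaDefinition`. The property name `proj_definition` is invented for the example, and the `proj == "geos"` test for `is_geostationary` is an assumption about how such a check might look.

```python
class AreaSketch:
    """Hypothetical stand-in for an area definition; not pyresample's class."""

    def __init__(self, proj_dict, crs_wkt=None):
        self.proj_dict = proj_dict
        # Only newer setups would carry a WKT string (assumption for the demo).
        if crs_wkt is not None:
            self.crs_wkt = crs_wkt

    @property
    def proj_definition(self):
        """Prefer the WKT string and fall back to the PROJ dict."""
        return self.crs_wkt if hasattr(self, "crs_wkt") else self.proj_dict

    @property
    def is_geostationary(self):
        """Answer a common question without callers touching proj_dict."""
        return self.proj_dict.get("proj") == "geos"


old_style = AreaSketch({"proj": "geos", "h": 35785831.0})
new_style = AreaSketch({"proj": "stere"}, crs_wkt='PROJCRS["demo"]')
print(old_style.proj_definition, new_style.proj_definition)
```

Keeping the `hasattr` check in one property means each call site no longer repeats it, and it can be dropped in one place once a minimum pyproj version is enforced.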

djhoese · 2021-02-02T15:45:27Z

@mraspaud I'm still working on refactoring the tests but thought I'd comment on your other concerns before you are done for the day (soon). About the coverage, if you are referring to the loss of ~2%, it looks like this coverage comment was from back in June before I added any tests. I'm not sure why there hasn't been a more recent one and right now GitHub isn't even showing me all the PR checks. I'll investigate more after my next commit.

About the number of resamplers: there are actually only 2, the new one and the legacy one. The other section in the sphinx docs refers to the old-style calling of just the low-level functions (no resampler class). The legacy one was the original workaround to make EWA resampling work with Satpy's switch to xarray/dask. I wanted to keep it around until Kathy at SSEC and I can do more thorough testing of the new algorithm. My initial testing showed the new algorithm had comparable timing but with less memory usage, though not significantly less. I hope to improve performance over time as well. My hope is to completely remove the legacy resampler, but right now I'm not confident enough in the new algorithm to completely throw the legacy one away. I'd be OK not advertising it at all and only letting people know about it when they complain about the performance, but it seemed easier to keep it mentioned now. I should probably add a note about its deprecation in the sphinx docs. Given that I want to remove it eventually, I'd rather not put the work into combining the two classes into one class or even having a single wrapper function/class that uses the correct class based on a keyword (e.g. use_legacy=True). I can probably be convinced otherwise, but I think the two separate resamplers right now will be easiest for satpy and polar2grid integration and testing.

mraspaud · 2021-02-02T18:40:24Z

Sounds good with 2 resamplers

djhoese · 2021-02-02T19:49:14Z

Ok all changes are done. I realize now why the coverage hasn't been updating...we don't have GitHub Actions added to this repository! Azure tests are the only thing running. Hhhmmm I suppose I need to add them. Maybe a separate PR.

djhoese · 2021-02-03T21:36:54Z

NOTE: The failing unstable environment is going to be that way until conda-forge switches to proj 7.2+ for all packages (they pin it to a specific version).

mraspaud

LGTM

djhoese added 2 commits June 1, 2020 12:39

Add initial working dask-friendly EWA functions

5009b28

Add dask-based EWA resampler class (DaskEWAResampler)

e22f22c

djhoese added the enhancement label Jun 1, 2020

djhoese self-assigned this Jun 1, 2020

djhoese added 10 commits June 2, 2020 15:25

Switch dask EWA to use dask reduction function

19134fc

Allow dask EWA to persist ll2cr results

f634825

Add xarray recreation to EWA dask resampler

06779ee

Replace more proj_dict usage in AreaDefinition with crs_wkt

8ae2a05

Add ability to run multi-band datasets through EWA dask resampling

b8b92c6

Fix EWA dask dimensions being incorrect

42beea2

Fix geostationary extents tests not mocking crs properly

6cfd3ed

Fix stickler styling issues

45aba2f

Use partial functions to pass keyword arguments to dask fornav

f61451c

Add maximum_weight_mode support for EWA dask resampler

8129b81

djhoese mentioned this pull request Jun 11, 2020

Using Dask in reprojection corteva/rioxarray#119

Closed

djhoese added 2 commits January 15, 2021 12:54

Merge branch 'master' into feature-dask-ewa

6f47381

# Conflicts: # pyresample/ewa/_fornav.cpp # pyresample/ewa/_ll2cr.c

Add legacy dask-based EWA resampler from Satpy

775d6fd

stickler-ci reviewed Jan 17, 2021

pyresample/ewa/_legacy_dask_ewa.py (outdated)

pyresample/test/test_dask_ewa.py

djhoese added 7 commits January 18, 2021 13:01

Switch to pytest for EWA tests

0fce0de

Switch legacy EWA resampler tests to use parametrize

85aadd4

Update legacy ewa tests to use real ll2cr/fornav calls

d84ef1f

Add tests for new dask ewa code

1d562a8

Remove leftover TODO comment

f27ed5d

Add basic numpy support to DaskEWAResampler

41f3618

Fix integer handling in DaskEWAResampler

38140db

stickler-ci reviewed Jan 25, 2021

pyresample/ewa/dask_ewa.py (outdated)

pyresample/test/test_dask_ewa.py (outdated)

pyresample/test/test_dask_ewa.py (outdated)

Fix stickler issues

e475842

djhoese added 2 commits January 29, 2021 14:03

Fix pyproj warnings related to EWA and AreaDefinition hashing

61c4b3b

Fix dask ewa not calling the right function for reductions

1c15abc

stickler-ci reviewed Jan 29, 2021

pyresample/ewa/dask_ewa.py (outdated)

Fix stickler line too long

fdff3fe

djhoese mentioned this pull request Jan 29, 2021

Switch to 'ewa' and 'ewa_legacy' resamplers from pyresample pytroll/satpy#1522

Merged

4 tasks

Add resampler specific documentation about EWA

9cdc62b

djhoese marked this pull request as ready for review February 1, 2021 22:24

djhoese added 2 commits February 1, 2021 16:31

Fix styling issues

a385bac

Cleanup if branch in add_xy_coords

262017b

djhoese requested review from mraspaud and pnuu February 1, 2021 22:38

pnuu reviewed Feb 2, 2021

docs/source/swath.rst (outdated)

pyresample/ewa/__init__.py

pyresample/ewa/__init__.py

pyresample/ewa/dask_ewa.py (outdated)

mraspaud requested changes Feb 2, 2021

Resolve one set of reviewer comments

4d84edd

djhoese added 2 commits February 2, 2021 10:12

Refactor resampler and ewa tests

4c7f7d8

Refactor dask ewa tests to be more parametrized

7eb28dc

djhoese added 4 commits February 2, 2021 12:46

More dask ewa test refactoring

5e1f163

Fix flake8 issues in ewa

142f6ad

Refactor crs handling in AreaDefinition to be a property

b1aee57

Fix redundant imports in dask ewa tests

6ba4847

Merge branch 'master' into feature-dask-ewa

0348f3c

Remove .coveragerc since setup.cfg was already being used

bf4b07b

djhoese requested a review from mraspaud February 3, 2021 21:55

mraspaud approved these changes Feb 4, 2021

mraspaud merged commit c440d1b into pytroll:master Feb 4, 2021
