Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dask-friendly EWA resampler class (DaskEWAResampler) #284

Merged
merged 39 commits into from
Feb 4, 2021

Conversation

djhoese
Copy link
Member

@djhoese djhoese commented Jun 1, 2020

See #281 for other details. This is an initial implementation that tries to take more advantage of dask for the EWA resampling algorithm. So far I've been testing with the worst case for this algorithm and comparing it to the existing/old implementation in Satpy where this case is almost a best case. That case is a 10 granule VIIRS SDR case of resampling I04 to a target area that includes all input data and has no empty chunks. This is worst case for this dask-based implementation because:

  1. All source chunks are used.
  2. No target chunks are empty.
  3. We aren't resampling multiple products at the same time which this algorithm could probably benefit from by having dask manage how much memory is being used (number of workers and chunk size).

This is best case for the old implementation because of the oppose of all of the above. With that said, at the time of writing, this implementation takes a couple seconds longer (18s -> 20s) and about the same amount of memory if not a little more (~4GB-4.5GB).

Optimizations TODO:

  • Allow persisting and checking source chunks to have them not loaded during the second stage of resampling (fornav)
  • Cache the above results properly so resampling multiple datasets can reuse this source chunk usage information.
  • Chain the last stages of resampling instead of providing all inputs at once to reduce memory consumption (I assume)
  • Pass averaging (accums / weights) to the existing C code to handle edge cases and hopefully improve performance...maybe.

Other TODOs to make this use best practices:

  • Use dask's tokenizing to come up with output array names/keys so avoid collisions/conflicts.

@djhoese djhoese self-assigned this Jun 1, 2020
@coveralls
Copy link

coveralls commented Jun 1, 2020

Coverage Status

Coverage increased (+1.3%) to 92.571% when pulling bf4b07b on djhoese:feature-dask-ewa into 3acb6aa on pytroll:master.

@codecov
Copy link

codecov bot commented Jun 1, 2020

Codecov Report

Merging #284 (bf4b07b) into master (3acb6aa) will increase coverage by 1.28%.
The diff coverage is 87.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #284      +/-   ##
==========================================
+ Coverage   91.29%   92.58%   +1.28%     
==========================================
  Files          45       48       +3     
  Lines        9505     9826     +321     
==========================================
+ Hits         8678     9097     +419     
+ Misses        827      729      -98     
Flag Coverage Δ
unittests 92.58% <87.72%> (+1.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pyresample/ewa/__init__.py 71.42% <66.66%> (+1.78%) ⬆️
pyresample/resampler.py 60.34% <71.42%> (+25.56%) ⬆️
pyresample/ewa/ewa.py 76.78% <76.78%> (ø)
pyresample/ewa/_legacy_dask_ewa.py 84.53% <84.53%> (ø)
pyresample/geometry.py 83.91% <89.28%> (+0.09%) ⬆️
pyresample/ewa/dask_ewa.py 89.45% <89.45%> (ø)
pyresample/test/test_dask_ewa.py 97.56% <97.56%> (ø)
pyresample/test/test_geometry.py 97.99% <100.00%> (+<0.01%) ⬆️
pyresample/test/utils.py 71.87% <100.00%> (+0.59%) ⬆️
pyresample/version.py
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3acb6aa...bf4b07b. Read the comment docs.

# Conflicts:
#	pyresample/ewa/_fornav.cpp
#	pyresample/ewa/_ll2cr.c
pyresample/ewa/_legacy_dask_ewa.py Outdated Show resolved Hide resolved
pyresample/test/test_dask_ewa.py Show resolved Hide resolved
@ghost
Copy link

ghost commented Jan 17, 2021

Congratulations 🎉. DeepCode analyzed your code in 0.366 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

pyresample/ewa/dask_ewa.py Outdated Show resolved Hide resolved
pyresample/test/test_dask_ewa.py Outdated Show resolved Hide resolved
pyresample/test/test_dask_ewa.py Outdated Show resolved Hide resolved
@djhoese djhoese marked this pull request as ready for review February 1, 2021 22:24
@djhoese djhoese requested review from mraspaud and pnuu February 1, 2021 22:38
@djhoese
Copy link
Member Author

djhoese commented Feb 1, 2021

I disagree or can't fix the 2 issues that deepcode is complaining about. This is ready for review.

Note the related satpy PR: pytroll/satpy#1522

Copy link
Member

@pnuu pnuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just few inline comments. There are few functions/methods that could be refactored to smaller units, and maybe have more descriptive naming, but as I understand there will be further optimization to be done, so I'll let them be for now as my time is a bit limited..

docs/source/swath.rst Outdated Show resolved Hide resolved
pyresample/ewa/__init__.py Show resolved Hide resolved
pyresample/ewa/__init__.py Show resolved Hide resolved
pyresample/ewa/dask_ewa.py Outdated Show resolved Hide resolved
Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice effort, thanks for putting this together!

I trust you on the algorithm, so I haven't checked it.
I have some styling comments inline.
What I'm most concerned about is the coverage drop, do you think this is just a false alarm?
Moreover, there are now three classes for ewa resampling. Do they really need to exist? Can't there just be one, given that the results are identical to the others?

@@ -1887,7 +1889,8 @@ def projection_y_coords(self):
def outer_boundary_corners(self):
"""Return the lon,lat of the outer edges of the corner points."""
from pyresample.spherical_geometry import Coordinate
proj = Proj(**self.proj_dict)
proj_def = self.crs_wkt if hasattr(self, 'crs_wkt') else self.proj_dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a shortcut method for this? This seems to be used in multiple places

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better long term to just say from now on pyresample requires pyproj 2.2+ (I think that's what we need for all of the CRS stuff to work). This would drop the possibility of not having a .crs attribute. Maybe not in this PR though? Maybe we do one pyresample release with this resampler (and some of the other existing PRs) and then force pyproj versions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want to force the pyproj version in this PR, could you then factorize this line and it's duplicates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I refactored this into two properties: one for WKT/string and one for CRS/dict. I also added an is_geostationary property to workaround one of the required uses of proj_dict.

pyresample/resampler.py Outdated Show resolved Hide resolved
pyresample/resampler.py Outdated Show resolved Hide resolved
pyresample/resampler.py Show resolved Hide resolved
pyresample/resampler.py Outdated Show resolved Hide resolved
pyresample/test/test_dask_ewa.py Show resolved Hide resolved
pyresample/test/test_dask_ewa.py Outdated Show resolved Hide resolved
@djhoese
Copy link
Member Author

djhoese commented Feb 2, 2021

@mraspaud I'm still working on refactoring the tests but thought I'd comment on your other concerns before you are done for the day (soon). About the coverage, if you are referring to the loss of ~2%, It looks like this coverage comment was back in June before I added any tests. I'm not sure why there hasn't been a more recent one and right now github isn't even showing me all the PR checks. I'll investigate more after my next commit.

About the number of resamplers: there are actually only 2, the new one and the legacy one. The other section in the sphinx docs refers to the old-style calling of just the low-level functions (no resampler class). The legacy one was the original workaround to make EWA resampling work with Satpy's switch to xarray/dask. I wanted to keep it around until Kathy at SSEC and I can do more thorough testing of the new algorithm. My initial testing showed the new algorithm had comparable timing but with less memory usage, but not significantly. I hope to improve performance over time as well. My hope is to completely remove the legacy resampler, but right now I'm not confident enough in the new algorithm to completely throw it away. I'd be OK completely not advertising it and only letting people know about it when they complain about the performance, but it seemed easier to keep it mentioned now. I should probably add a note about its deprecation in the sphinx docs. Given that I want to remove it eventually I'd rather not put the work into combining the two classes into one class or even having a single wrapper function/class that uses the correct class based on a keyword (ex use_legacy=True). I can probably be convinced otherwise, but I think the two separate resamplers right now will be easiest for satpy and polar2grid integration and testing.

@mraspaud
Copy link
Member

mraspaud commented Feb 2, 2021

Sounds good with 2 resamplers

@djhoese
Copy link
Member Author

djhoese commented Feb 2, 2021

Ok all changes are done. I realize now why the coverage hasn't been updating...we don't have github actions added to this repository! Azure tests are the only thing running. Hhhmmm I suppose I need to add them. Maybe a separate PR.

@djhoese
Copy link
Member Author

djhoese commented Feb 3, 2021

NOTE: The failing unstable environment is going to be that way until conda-forge switches to proj 7.2+ for all packages (they pin it to a specific version).

@djhoese djhoese requested a review from mraspaud February 3, 2021 21:55
Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mraspaud mraspaud merged commit c440d1b into pytroll:master Feb 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dask-ify Elliptical Weighted Averaging (EWA) resampling
5 participants