-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dask-friendly EWA resampler class (DaskEWAResampler) #284
Conversation
Codecov Report
@@ Coverage Diff @@
## master #284 +/- ##
==========================================
+ Coverage 91.29% 92.58% +1.28%
==========================================
Files 45 48 +3
Lines 9505 9826 +321
==========================================
+ Hits 8678 9097 +419
+ Misses 827 729 -98
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
# Conflicts: # pyresample/ewa/_fornav.cpp # pyresample/ewa/_ll2cr.c
Congratulations 🎉. DeepCode analyzed your code in 0.366 seconds and we found no issues. Enjoy a moment of no bugs ☀️. 👉 View analysis in DeepCode’s Dashboard | Configure the bot |
I disagree or can't fix the 2 issues that deepcode is complaining about. This is ready for review. Note the related satpy PR: pytroll/satpy#1522 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just few inline comments. There are few functions/methods that could be refactored to smaller units, and maybe have more descriptive naming, but as I understand there will be further optimization to be done, so I'll let them be for now as my time is a bit limited..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice effort, thanks for putting this together!
I trust you on the algorithm, so I haven't checked it.
I have some styling comments inline.
What I'm most concerned about is the coverage drop, do you think this is just a false alarm?
Moreover, there are now three classes for ewa resampling. Do they really need to exist? Can't there just be one, given that the results are identical to the others?
pyresample/geometry.py
Outdated
@@ -1887,7 +1889,8 @@ def projection_y_coords(self): | |||
def outer_boundary_corners(self): | |||
"""Return the lon,lat of the outer edges of the corner points.""" | |||
from pyresample.spherical_geometry import Coordinate | |||
proj = Proj(**self.proj_dict) | |||
proj_def = self.crs_wkt if hasattr(self, 'crs_wkt') else self.proj_dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a shortcut method for this? This seems to be used in multiple places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better long term to just say from now on pyresample requires pyproj 2.2+ (I think that's what we need for all of the CRS stuff to work). This would drop the possibility of not having a .crs
attribute. Maybe not in this PR though? Maybe we do one pyresample release with this resampler (and some of the other existing PRs) and then force pyproj versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't want to force the pyproj version in this PR, could you then factorize this line and it's duplicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I refactored this into two properties: one for WKT/string and one for CRS/dict. I also added an is_geostationary
property to workaround one of the required uses of proj_dict
.
@mraspaud I'm still working on refactoring the tests but thought I'd comment on your other concerns before you are done for the day (soon). About the coverage, if you are referring to the loss of ~2%, It looks like this coverage comment was back in June before I added any tests. I'm not sure why there hasn't been a more recent one and right now github isn't even showing me all the PR checks. I'll investigate more after my next commit. About the number of resamplers: there are actually only 2, the new one and the legacy one. The other section in the sphinx docs refers to the old-style calling of just the low-level functions (no resampler class). The legacy one was the original workaround to make EWA resampling work with Satpy's switch to xarray/dask. I wanted to keep it around until Kathy at SSEC and I can do more thorough testing of the new algorithm. My initial testing showed the new algorithm had comparable timing but with less memory usage, but not significantly. I hope to improve performance over time as well. My hope is to completely remove the legacy resampler, but right now I'm not confident enough in the new algorithm to completely throw it away. I'd be OK completely not advertising it and only letting people know about it when they complain about the performance, but it seemed easier to keep it mentioned now. I should probably add a note about its deprecation in the sphinx docs. Given that I want to remove it eventually I'd rather not put the work into combining the two classes into one class or even having a single wrapper function/class that uses the correct class based on a keyword (ex |
Sounds good with 2 resamplers |
Ok all changes are done. I realize now why the coverage hasn't been updating...we don't have github actions added to this repository! Azure tests are the only thing running. Hhhmmm I suppose I need to add them. Maybe a separate PR. |
NOTE: The failing unstable environment is going to be that way until conda-forge switches to proj 7.2+ for all packages (they pin it to a specific version). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
See #281 for other details. This is an initial implementation that tries to take more advantage of dask for the EWA resampling algorithm. So far I've been testing with the worst case for this algorithm and comparing it to the existing/old implementation in Satpy where this case is almost a best case. That case is a 10 granule VIIRS SDR case of resampling I04 to a target area that includes all input data and has no empty chunks. This is worst case for this dask-based implementation because:
This is best case for the old implementation because of the oppose of all of the above. With that said, at the time of writing, this implementation takes a couple seconds longer (18s -> 20s) and about the same amount of memory if not a little more (~4GB-4.5GB).
git diff origin/master **/*py | flake8 --diff
Optimizations TODO:
Other TODOs to make this use best practices: