Skip to content

Commit

Permalink
Merge pull request #284 from djhoese/feature-dask-ewa
Browse files Browse the repository at this point in the history
Add dask-friendly EWA resampler class (DaskEWAResampler)
  • Loading branch information
mraspaud authored Feb 4, 2021
2 parents 3acb6aa + bf4b07b commit c440d1b
Show file tree
Hide file tree
Showing 17 changed files with 23,895 additions and 1,536 deletions.
2 changes: 0 additions & 2 deletions .coveragerc

This file was deleted.

3 changes: 2 additions & 1 deletion MANIFEST.in
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@ include docs/Makefile
recursive-include docs/source *
include pyresample/test/test_files/*
include LICENSE.txt
include pyresample/ewa/*.h
include pyresample/ewa/_fornav_templates.*
include versioneer.py
include pyresample/version.py
include README.md
recursive-include pyresample *.pyx
146 changes: 84 additions & 62 deletions docs/source/swath.rst
Original file line number Diff line number Diff line change
Expand Up @@ -449,14 +449,16 @@ algorithms that pyresample provides. The KDTree-based algorithms
process each output grid pixel by searching for all "nearby" input
pixels and applying a certain interpolation (nearest neighbor, gaussian, etc).
The EWA algorithm processes each input pixel mapping it to one or more output
pixels. Once each input pixel has been analyzed the intermediate results are
pixels. Once each input pixel has been analyzed, the intermediate results are
averaged to produce the final gridded result.

The EWA algorithm also has limitations on how the input data is structured
The EWA algorithm also has limitations on how the input data are structured
compared to the generic KDTree algorithms. EWA assumes that data in the array
is organized geographically; adjacent data in the array is adjacent data
geographically. The algorithm uses this to configure parameters based on the
size and location of the swath pixels.
size and location of the swath pixels. It also assumes that data are
scan-based, recorded by a orbiting satellite scan by scan, and the user must
provide scan size with the ``rows_per_scan`` option.

The EWA algorithm consists of two
steps: ll2cr and fornav. The algorithm was originally part of the
Expand All @@ -467,73 +469,93 @@ quality images from VIIRS and AVHRR data with the right parameters.

.. note::

This code was originally part of the Polar2Grid project. This
This code was originally part of the CSPP Polar2Grid project. This
documentation and the API documentation for this algorithm may still
use references or concepts from Polar2Grid until everything can
be updated.

Gridding
********

The first step is called 'll2cr' which stands for "longitude/latitude to
column/row". This step maps the pixel location (lon/lat space) into area (grid)
space. Areas in pyresample are defined by a PROJ.4 projection specification.
An area is defined by the following parameters:

- Grid Name
- PROJ.4 String (either lat/lon or metered projection space)
- Width (number of pixels in the X direction)
- Height (number of pixels in the Y direction)
- Cell Width (pixel size in the X direction in grid units)
- Cell Height (pixel size in the Y direction in grid units)
- X Origin (upper-left X coordinate in grid units)
- Y Origin (upper-left Y coordinate in grid units)

Resampling
**********

The second step of EWA remapping is called "fornav", short for
"forward navigation". This EWA algorithm processes one input scan line
at a time. The algorithm weights the effect of an input pixel on an output
pixel based on its location in the scan line and other calculated
coefficients. It can also handle swaths that are not scan based by specifying
`rows_per_scan` as the number of rows in the entire swath.
How the algorithm treats the data can be configured with various
keyword arguments, see the API documentation for more information.
Both steps provide additional information to inform the user how much data
was used in the result. The first returned value of ll2cr tells you how many
of the input swath pixels overlap the grid. The first returned value of fornav
tells you how many grid points have valid data values in them.

Example
*******
Resampler
*********

The :class:`~pyresample.ewa.DaskEWAResampler` is the easiest way to use the
EWA resampling algorithm. Internally this resampler uses the ``dask`` library
to perform all of its operations in parallel. This will typically provide
better performance than any of the below methods, but does require the
``dask`` library to be installed. The below code assumes you have a
``swath_def`` object (:class:`~pyresample.geometry.SwathDefinition`), an
``area_def`` object (:class:`~pyresample.geometry.AreaDefinition`), and
some array data in ``data``. Data can be a numpy array, a dask array, or
an xarray DataArray object.

.. testsetup::

import numpy as np
from pyresample import geometry
import dask.array as da
import xarray as xr
area_def = geometry.AreaDefinition('areaD', 'Europe (3km, HRV, VTC)', 'areaD',
{'a': '6378144.0', 'b': '6356759.0',
'lat_0': '50.00', 'lat_ts': '50.00',
'lon_0': '8.00', 'proj': 'stere'},
800, 800,
[-1370912.72, -909968.64,
1029087.28, 1490031.36])
data = np.fromfunction(lambda y, x: y*x, (50, 10))
dask_data = da.from_array(data, chunks='auto')
xr_data = xr.DataArray(dask_data)
lons = np.fromfunction(lambda y, x: 3 + x, (50, 10))
lats = np.fromfunction(lambda y, x: 75 - y, (50, 10))
swath_def = geometry.SwathDefinition(lons=lons, lats=lats)
dask_lons = da.from_array(lons, chunks='auto')
dask_lats = da.from_array(lats, chunks='auto')
xr_lons = xr.DataArray(dask_lons)
xr_lats = xr.DataArray(dask_lats)
xr_swath_def = geometry.SwathDefinition(lons=xr_lons, lats=xr_lats)

.. testcode::

from pyresample.ewa import DaskEWAResampler
resampler = DaskEWAResampler(swath_def, area_def)

# if the data are scan based, specify how many data rows make up one scan
rows_per_scan = 5
result = resampler.resample(data, rows_per_scan=rows_per_scan)

Legacy Dask Resampler
*********************

.. note::
This resampler is similar to the above, but only works on xarray DataArray
objects backed by dask arrays. Although it uses dask underneath, it doesn't
use it optimally and in most cases will use a lot of memory.

EWA resampling in pyresample is still in an alpha stage. As development
continues, EWA resampling may be called differently.
.. testcode::

.. doctest::
from pyresample.ewa import LegacyDaskEWAResampler
resampler = LegacyDaskEWAResampler(xr_swath_def, area_def)

# if the data are scan based, specify how many data rows make up one scan
rows_per_scan = 5
result = resampler.resample(xr_data, rows_per_scan=rows_per_scan)

Legacy Function Interface
*************************

It is recommended to use the Resampler interfaces described above whenever
possible. However, the low-level ``ll2cr`` and ``fornav`` functions can be
used if desired. These functions will only work on basic numpy arrays and
although fast, they will use a lot of memory for large input arrays or large
target areas.

.. testcode::

from pyresample.ewa import ll2cr, fornav
# ll2cr converts swath longitudes and latitudes to grid columns and rows
swath_points_in_grid, cols, rows = ll2cr(swath_def, area_def)
# if the data are scan based, specify how many data rows make up one scan
rows_per_scan = 5
# fornav resamples the swath data to the gridded area
num_valid_points, gridded_data = fornav(cols, rows, area_def, data, rows_per_scan=rows_per_scan)

>>> import numpy as np
>>> from pyresample.ewa import ll2cr, fornav
>>> area_def = geometry.AreaDefinition('areaD', 'Europe (3km, HRV, VTC)', 'areaD',
... {'a': '6378144.0', 'b': '6356759.0',
... 'lat_0': '50.00', 'lat_ts': '50.00',
... 'lon_0': '8.00', 'proj': 'stere'},
... 800, 800,
... [-1370912.72, -909968.64,
... 1029087.28, 1490031.36])
>>> data = np.fromfunction(lambda y, x: y*x, (50, 10))
>>> lons = np.fromfunction(lambda y, x: 3 + x, (50, 10))
>>> lats = np.fromfunction(lambda y, x: 75 - y, (50, 10))
>>> swath_def = geometry.SwathDefinition(lons=lons, lats=lats)
>>> # ll2cr converts swath longitudes and latitudes to grid columns and rows
>>> swath_points_in_grid, cols, rows = ll2cr(swath_def, area_def)
>>> # if the data is scan based, specify how many data rows make up one scan
>>> rows_per_scan = 5
>>> # fornav resamples the swath data to the gridded area
>>> num_valid_points, gridded_data = fornav(cols, rows, area_def, data, rows_per_scan=rows_per_scan)

pyresample.bucket
-----------------
Expand Down
Loading

0 comments on commit c440d1b

Please sign in to comment.