UPDATE 2022-09-28: This discussion is out-of-date; the new top-level API is centered around a single generate_mosaic function for simple use cases and a Pangeo Forge-inspired recipe framework for more complicated ones.

Over the next two months, OpenMosaic will see more intensive development under grant-sponsored funding. This work will focus on making the package ready to use for research applications. A large part of that will be designing and implementing the API of the first released version (the other part being testing). So, what follows are my current high-level API plans for OpenMosaic after some initial brainstorming. This discussion is placed here mostly for reference's sake; however, if you happen to come across it before implementation occurs, feel free to contribute your thoughts and suggestions!
Existing API
A very provisional prototype that leaves far too much for the user to handle (see https://github.com/jthielen/OpenMosaic/blob/main/examples/process_full_conus.py). While the low-level routines seem decent for now, the user-level API needs to be completely redone.
Mockups of Draft API for Primary Use-Cases
Example 1
"I want a mosaic over a given region at a single time using default data sources as simply as possible"
Example 2
"What if I want that single time mosaic, but for lots of fields and not just composite reflectivity?"
What if I already have my files on disk? And no dask? And lat/lon grid?
xref #10
What if I want a workflow that distributes mosaicing across many times intelligently with dask? And how about a visual progress indicator?
(fields taken from the intermediate data target for SVRIMG for Detailed Morphology)
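A corresponding mockup for the multi-field case: an ordered list of calculation objects supplies the extra fields (the openmosaic.calculations module path and all constructor arguments are placeholders), with a note on the local-files/no-Dask variant; the distributed many-time workflow is sketched under Gridding Functions below.

```python
import openmosaic
from openmosaic.calculations import (  # placeholder module path
    EchoTopHeight,
    LayerMaximum,
    MESH,
    VelocityDerivatives,
)

grid = openmosaic.GridDefinition(
    extent=(-102.0, -94.0, 40.0, 46.0),
    horizontal_resolution=2000.0,
)

# Order matters: later calculations may consume fields produced by earlier ones.
calculations = [
    VelocityDerivatives(dealias=True),                              # radial grid
    MESH(),                                                         # output grid (also computes SHI)
    LayerMaximum(field="reflectivity"),                             # composite reflectivity
    LayerMaximum(field="azimuthal_shear", bottom=0.0, top=3000.0),  # low-level azishear
    EchoTopHeight(threshold=18.0),
]

mosaic = openmosaic.generate_mosaic_with_autoload(
    grid=grid,
    time="2021-07-09T02:00",
    calculations=calculations,
)

# Files already on disk, no Dask, custom lat/lon grid (xref #10): the plain
# generate_mosaic variant would take the file paths and a grid Dataset directly, e.g.
# openmosaic.generate_mosaic(grid=custom_grid, files=level2_paths, calculations=calculations)
```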
Core Concepts of Draft API
Typical Script Outline
Set local files to use or choose autoloading of Level II volumes
Call the mosaicing function that best fits your use case
Apply post-gridding operations and save/use output (if not already done in the packaged function)
Main Components/Objects
"The Grid"
Define the target points on which to map Level II data. Can use a memory-performant (but regular-grid-only) openmosaic.GridDefinition or supply a custom grid as an xarray.Dataset. Canned/default grids (such as those matching other common datasets) may be included in future releases.
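For the custom-grid option, such an xarray.Dataset might look something like the following; the exact coordinate and attribute names OpenMosaic will expect are not settled, so this is illustrative only:

```python
import numpy as np
import xarray as xr

# Illustrative only: a 2 km regular grid on a small projected domain, supplied as a
# plain xarray.Dataset. Coordinate/attribute names are placeholders, not a spec.
x = np.arange(-300_000.0, 300_000.0 + 1.0, 2_000.0)
y = np.arange(-200_000.0, 200_000.0 + 1.0, 2_000.0)
z = np.arange(500.0, 10_500.0, 500.0)

custom_grid = xr.Dataset(
    coords={"z": ("z", z), "y": ("y", y), "x": ("x", x)},
    attrs={
        "projection": "azimuthal_equidistant",
        "origin_latitude": 42.0,
        "origin_longitude": -97.5,
    },
)
```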
"Data Loaders"
We need some way to fetch (and possibly munge and/or cache) remote data such as Level II volumes from AWS S3 or ERA5 reanalyses from NCAR RDA, given input requirements from the gridding operations. Simple use cases can rely on the defaults, but many other use cases will need some degree of custom configuration in how these data are fetched and operated on. These also need to be Dask serializable to enable full parallelization.
Examples of loaders to be available in initial release:
RDA.ERA5Loader (default analysis dataset; in a module since I anticipate more RDA stuff down the road that can share useful things)
S3.LevelIILoader (default Level II dataset; again in a module since I anticipate more things loaded through S3 later on)
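The loader interface itself is not pinned down here, but the Dask-serializability requirement mostly means holding plain configuration and opening connections lazily on the worker; a very rough sketch (class and method names are hypothetical):

```python
from dataclasses import dataclass


# Hypothetical sketch, not the actual loader API: keep only picklable configuration
# on the instance and open remote connections inside the method that runs on a worker.
@dataclass
class LevelIILoaderSketch:
    bucket: str = "noaa-nexrad-level2"

    def list_volumes(self, site, time):
        import s3fs  # imported lazily so the instance stays cheap to serialize

        fs = s3fs.S3FileSystem(anon=True)
        prefix = f"{self.bucket}/{time:%Y/%m/%d}/{site}/"
        return fs.ls(prefix)  # a real loader would go on to select and read volumes
```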
"Calculations"
We need to be able to calculate fields on both the source data's radial grid/volumes and the output cartesian grid. Because of the complexity of presenting these with the simplest user-level options while also handling all the ways they can be used internally, they will be classes rather than functions. They will be able to be applied with or without Dask, and can themselves have data loaders they use.
A key concept for users: the order in which calculations are specified will matter in most cases! For example, later calculations can take previously computed fields as inputs, and dropped fields are no longer available to later calculations.
Examples of calculations to be available in initial release:
Radial Grid
VelocityDerivatives (given their relation, handles both azimuthal shear and radial divergence together; also includes dealiasing)
SpecificDifferentialPhase (KDP)
PyARTFilter (wrap a pyart filter)
Output/Cartesian Grid
MESH (and so also SHI, on which it depends)
LayerMaximum (e.g., for both composite reflectivity and low-/mid-level azishear)
EchoTopHeight
Future calculations that may be supported include:
SL3D
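To make the class-based idea a bit more concrete, here is the kind of structure such a calculation could have, with declared input/output fields and an apply step on the gridded data; none of these attribute or method names are settled:

```python
# Illustrative structure only; attribute and method names are placeholders.
class LayerMaximumSketch:
    requires = ("reflectivity",)            # fields this calculation consumes
    produces = ("composite_reflectivity",)  # fields it adds to the dataset

    def __init__(self, bottom=None, top=None):
        self.bottom = bottom  # layer bounds in meters; None means unbounded
        self.top = top

    def apply(self, dataset):
        field = dataset["reflectivity"]
        z = dataset["z"]
        if self.bottom is not None:
            field = field.where(z >= self.bottom)
        if self.top is not None:
            field = field.where(z <= self.top)
        dataset["composite_reflectivity"] = field.max("z")
        return dataset
```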
Gridding Functions
The core operations of OpenMosaic will be the gridding/mosaicing functions. The Gridder object-based approach of the prior API proved unsatisfactory; instead, different functions for different common workflows will be provided, all controlled by the previously specified objects. The internals are still likely to involve some kind of "task manager" class to assign Dask futures with an understanding of the workflows, but I no longer have plans for that to be part of the user-facing API.
Examples of gridding functions to be available in initial release:
generate_mosaic (mosaic(s) at given time(s) using existing NEXRAD files or PyART radar objects, output in-memory as an xarray.Dataset. Times, if multiple, are looped over.)
generate_mosaic_with_autoload (mosaic(s) at given time(s) using Level II data automatically loaded from a remote source, output in-memory as an xarray.Dataset. Times, if multiple, are looped over.)
write_mosaics_with_autoload (mosaics at given times using automatically loaded Level II data, output directly to disk. Includes an option to specify post-gridding calculations to be applied before writing to disk. Dispatches all tasks with Dask rather than looping over time.)
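And a mockup of the Dask-dispatched, write-to-disk workflow raised in Example 2's follow-up questions; whether the function returns futures (and every keyword below) is an assumption about the draft API:

```python
import pandas as pd
from dask.distributed import Client, progress

import openmosaic

# Mockup only: argument names and the returned futures are assumptions.
client = Client()  # local cluster; could instead connect to an existing scheduler

grid = openmosaic.GridDefinition(
    extent=(-102.0, -94.0, 40.0, 46.0),
    horizontal_resolution=2000.0,
)
times = pd.date_range("2021-07-09 00:00", "2021-07-09 06:00", freq="15min")

futures = openmosaic.write_mosaics_with_autoload(
    grid=grid,
    times=times,
    output_directory="mosaics/",
    client=client,
)
progress(futures)  # the requested visual progress indicator
```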