Redesign NWM Client Subpackage #138
@aaraney @hellkite500 Still working through your feedback, but it seems an interested member of the public uploaded the RouteLink files to a public data repository here: https://www.hydroshare.org/resource/7ce5f87bc1904d0c8f297389be5fa169/
@aaraney OK, I think I've addressed the major concerns. I'd like to hold off on the typing issues until I have a better grasp on
Resolved review thread (outdated) on python/nwm_client_new/src/hydrotools/nwm_client_new/NWMClient.py
Passed all tests. I'm going to merge this in and relegate further updates to separate PRs. This one is already too big.
This is a significant refactor and redesign of `nwm_client`. It implements all existing functionality and is compatible with Google Cloud Platform and generic HTTP servers like NOMADS. The design is so different that I've added it to the repository as a new subpackage under `nwm_client_new`. The plan would be to transition the old `nwm_client` to this one after some further testing "in the field". This new package adopts a more component-based design spread across 5 modules:

- `NWMClient`: Top-level interface responsible for corralling the other four components.
- `NWMFileCatalog`: Interfaces to GCP and HTTP servers used to discover files based on simple queries.
- `FileDownloader`: Asynchronous file downloader.
- `NWMFileProcessor`: Processes raw NetCDF files to datasets and dataframes.
- `ParquetCache`: Implements a parquet version of `HDFCache` to store processed dataframes.

As the tool is based on `dask`, I expect it to scale better than the existing client tool. However, per dask's best practices, the tool assumes most requests will fit in memory and therefore defaults to retrieving `pandas.DataFrame`, with a simple switch to retrieve `dask.dataframe.DataFrame` for larger-than-memory datasets.

Closes #127
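The `FileDownloader` component is described above only as an asynchronous file downloader. As a rough illustration of that pattern, and not the package's actual implementation, here is a minimal sketch using only the standard library. The names `download_all`, `fetch`, `limit`, and `fake_fetch` are hypothetical; a real downloader would presumably pass in a coroutine that performs an HTTP GET.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Dict, Iterable

# Hypothetical signature: a coroutine that takes a URL and returns the file's bytes.
Fetch = Callable[[str], Awaitable[bytes]]


async def download_all(
    urls: Iterable[str],
    fetch: Fetch,
    output_directory: Path,
    limit: int = 4,
) -> Dict[str, Path]:
    """Fetch every URL concurrently, at most `limit` at a time, and save each to disk."""
    output_directory.mkdir(parents=True, exist_ok=True)
    semaphore = asyncio.Semaphore(limit)

    async def download_one(url: str) -> Path:
        # The semaphore bounds how many fetches are in flight at once.
        async with semaphore:
            data = await fetch(url)
        # Name the local file after the last component of the URL.
        destination = output_directory / url.rsplit("/", 1)[-1]
        destination.write_bytes(data)
        return destination

    url_list = list(urls)
    paths = await asyncio.gather(*(download_one(u) for u in url_list))
    return dict(zip(url_list, paths))


async def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP GET so the sketch runs without network access."""
    await asyncio.sleep(0)
    return url.encode("utf-8")
```

Calling `asyncio.run(download_all(urls, fake_fetch, Path("downloads")))` returns a mapping from each URL to the local path it was written to. Injecting `fetch` keeps the concurrency logic independent of any particular HTTP library.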
Example usage
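The sketch below is not the `nwm_client_new` API. It is a hedged, standard-library-only illustration of the component-based design described in this PR: a top-level client that corrals a catalog, a downloader, a processor, and a cache. Every class and method name here (`Client`, `Catalog.list_files`, `Cache.get`, and the `Fake*` stand-ins) is an assumption made for illustration, and plain tuples stand in for dataframe rows.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Record = Tuple[str, float]  # (feature_id, value); stand-in for a dataframe row


class Catalog(Protocol):
    def list_files(self, configuration: str, reference_time: str) -> List[str]: ...


class Downloader(Protocol):
    def get(self, urls: List[str]) -> List[bytes]: ...


class Processor(Protocol):
    def to_records(self, raw_files: List[bytes]) -> List[Record]: ...


class Cache:
    """In-memory stand-in for an on-disk cache: compute on a miss, reuse on a hit."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Record]] = {}

    def get(self, key: str, compute: Callable[[], List[Record]]) -> List[Record]:
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]


class Client:
    """Top-level interface that wires the other four components together."""

    def __init__(self, catalog: Catalog, downloader: Downloader,
                 processor: Processor, cache: Cache) -> None:
        self.catalog = catalog
        self.downloader = downloader
        self.processor = processor
        self.cache = cache

    def get(self, configuration: str, reference_time: str) -> List[Record]:
        key = f"{configuration}/{reference_time}"

        def compute() -> List[Record]:
            # Discover files, download them, then process them into records.
            urls = self.catalog.list_files(configuration, reference_time)
            raw_files = self.downloader.get(urls)
            return self.processor.to_records(raw_files)

        return self.cache.get(key, compute)


class FakeCatalog:
    def list_files(self, configuration: str, reference_time: str) -> List[str]:
        base = f"https://example.com/{configuration}/{reference_time}"
        return [f"{base}/file{i}.nc" for i in range(2)]


class FakeDownloader:
    def __init__(self) -> None:
        self.calls = 0  # counts downloads so cache hits are observable

    def get(self, urls: List[str]) -> List[bytes]:
        self.calls += 1
        return [url.encode("utf-8") for url in urls]


class FakeProcessor:
    def to_records(self, raw_files: List[bytes]) -> List[Record]:
        return [(f"feature-{i}", float(len(raw))) for i, raw in enumerate(raw_files)]


example_client = Client(FakeCatalog(), FakeDownloader(), FakeProcessor(), Cache())
records = example_client.get("short_range", "20220101T00Z")
```

Because each component sits behind a small interface, a GCP-backed or HTTP-backed catalog could in principle be swapped in without touching the client, and a repeated request with the same key is served from the cache instead of being downloaded again.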
Inviting @aaraney @hellkite500 @christophertubbs to review.
Testing
`nwm_client` because many of those tests were redundant.
Notes
`dask`.
Todos
Checklist