Crunching the National Water Model #200
rsignell-usgs started this conversation in Show and tell (Closed)
Replies: 1 comment: "I'm closing old discussions in a repo cleanup effort. Feel free to re-open if needed."
The National Water Model Reanalysis v2.0 is a 26-year simulation of 2.7 million rivers in the US at hourly intervals. The data were delivered to AWS as part of the NOAA Big Data Program, as 227,000+ hourly NetCDF files.
I downloaded (!) and then converted the streamflow files from the reanalysis into a single Zarr dataset with a chunk size of 100 along the time dimension to facilitate the extraction of time series data. I used rechunker, and to deal with potential input data problems I looped through the data in month-long chunks, writing the first month and then appending to the Zarr store at the end of every subsequent month. This way I could correct issues with the input data (missing data and bad time stamps), try again, and on success append the chunk. See the full notebook for details on the conversion.
The result is a single Zarr dataset on AWS that can be used for time series extraction as well as mapping.
Here's proof: a sample analysis notebook using this new Zarr dataset.
In this notebook we use 20 workers on a Qhub Dask Gateway cluster to both extract time series and compute the annual mean river discharge for a specific year, in under 2 minutes of wall-clock time.
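The annual-mean computation reduces over the time dimension for one year. A minimal sketch, assuming a dask-backed array (the post uses a distributed Dask Gateway cluster; here plain local dask chunks and all-ones synthetic data so the answer is known):

```python
# Sketch of an annual mean discharge reduction; the year and data are
# illustrative, not taken from the real dataset.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1994-01-01", periods=365 * 24, freq="h")
flow = xr.DataArray(
    np.ones((len(time), 4)),  # all-ones so the mean is known
    dims=("time", "feature_id"),
    coords={"time": time, "feature_id": np.arange(4)},
    name="streamflow",
).chunk({"time": 24 * 30})  # dask chunks, so the mean runs in parallel

# Select one year by partial-string indexing, then reduce over time
annual_mean = flow.sel(time="1994").mean("time").compute()
print(float(annual_mean[0]))  # 1.0 for the all-ones test data
```

On a real cluster the same expression fans the per-chunk partial means out across workers, which is what makes the 2-minute wall-clock figure plausible for 2.7 million reaches.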