Crunching the National Water Model #200
rsignell-usgs started this conversation in Show and tell (Closed)
Replies: 1 comment: "I'm closing old discussions in a repo cleanup effort. Feel free to re-open if needed."
The National Water Model Reanalysis v2.0 is a 26-year simulation of 2.7 million rivers in the US at hourly intervals. The data were delivered to AWS as part of the NOAA Big Data Program, as 227,000+ hourly NetCDF files.
I downloaded (!) and then converted the streamflow files from the reanalysis into a single Zarr dataset with a chunk size of 100 along the time dimension to facilitate the extraction of time series data. I used rechunker, and to deal with potential input data problems I looped through the data in month-long chunks, writing the first month and then appending to the Zarr store at the end of every subsequent month. This way I could correct issues with the input data (missing data and bad time stamps), try again, and on success append the chunk. See the full notebook for details on the conversion.
The result is a single Zarr dataset on AWS that can be used for time series extraction as well as mapping.
Here's proof: a sample analysis notebook using this new Zarr dataset.
In this notebook we use 20 workers on a Qhub Dask Gateway cluster to both extract time series and compute the annual mean river discharge for a specific year, in under 2 minutes of wall-clock time.
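The annual-mean computation reduces over the time dimension for one year. A minimal sketch, assuming a dask-backed array (the post uses a distributed Dask Gateway cluster; here plain local dask chunks and all-ones synthetic data so the answer is known):

```python
# Sketch of an annual mean discharge reduction; the year and data are
# illustrative, not taken from the real dataset.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1994-01-01", periods=365 * 24, freq="h")
flow = xr.DataArray(
    np.ones((len(time), 4)),  # all-ones so the mean is known
    dims=("time", "feature_id"),
    coords={"time": time, "feature_id": np.arange(4)},
    name="streamflow",
).chunk({"time": 24 * 30})  # dask chunks, so the mean runs in parallel

# Select one year by partial-string indexing, then reduce over time
annual_mean = flow.sel(time="1994").mean("time").compute()
print(float(annual_mean[0]))  # 1.0 for the all-ones test data
```

On a real cluster the same expression fans the per-chunk partial means out across workers, which is what makes the 2-minute wall-clock figure plausible for 2.7 million reaches.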