The Research DataStream is an array of daily NextGen-based hydrologic simulations in the AWS cloud. An exciting aspect of the Research DataStream is that its NextGen configuration is open source and community-editable, so any member of the hydrologic community can contribute to improving streamflow predictions. Because the NextGen forcings, outputs, and configuration are publicly available, regional expertise can be used to incrementally improve streamflow predictions configured with the NextGen Framework. See the related Research DataStream documentation:
- Find daily output data at: https://datastream.ciroh.org/index.html
- Make improvements to NextGen configuration: Find out how you can contribute here!
- Current status and configuration: Read here!
- Infrastructure as Code: See the template AWS architecture here, which users can deploy within their own AWS account to issue and manage AWS server-based jobs.
- The actual Research DataStream deployment, which builds upon the template AWS infrastructure, exists here and is available for reference only.
The software backend of the Research DataStream is DataStreamCLI, a standalone tool that automates collecting and formatting input data for NextGen, orchestrating the NextGen run through NextGen In A Box (NGIAB), and handling outputs. This software allows users to run NextGen in an efficient, relatively painless, and reproducible fashion while providing flexibility and integrations like hfsubset, NextGen In A Box, and TEEHR.
- Installation: Follow the Installation Guide to prepare your environment for DataStreamCLI.
- Guide: Start by running the DataStreamCLI guide! It is an interactive script that will provide a tour of the repo as well as help you form a command with DataStreamCLI.
- Docs: Make sure to review the documentation for:
  - Available NextGen models and automated BMI configuration generation
  - Datastream options
  - Input and output directory structure
  - A usage guide for executing DataStreamCLI effectively
  - A step-by-step breakdown of DataStreamCLI's internal workflow
  - An explanation of the Research DataStream
This example will execute a 24-hour NextGen simulation over the Palisade, Colorado watershed with a CFE, SLOTH, PET, NOM, and t-route configuration, distributed over 4 processes. The forcings come from the National Water Model v3 Retrospective.
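The 24-hour window is passed to DataStreamCLI as a pair of timestamps (the `-s` and `-e` arguments in the command later in this example). As a rough sanity check, the sketch below counts the hourly steps between two such timestamps. It assumes the timestamps follow a `YYYYMMDDHHMM` layout, that both endpoints count as steps, and that GNU `date` (with `-d`) is available. The function names `to_epoch` and `hourly_steps` are hypothetical helpers, not part of DataStreamCLI.

```shell
#!/bin/sh
# Convert a YYYYMMDDHHMM stamp to epoch seconds in UTC.
# Assumption: GNU date, which accepts a "-d" date string.
to_epoch() {
    y=$(printf '%s' "$1" | cut -c1-4)
    m=$(printf '%s' "$1" | cut -c5-6)
    d=$(printf '%s' "$1" | cut -c7-8)
    H=$(printf '%s' "$1" | cut -c9-10)
    M=$(printf '%s' "$1" | cut -c11-12)
    date -u -d "$y-$m-$d $H:$M" +%s
}

# Count hourly steps from start to end, including both endpoints
# (an assumption about how the time window is interpreted).
hourly_steps() {
    start=$(to_epoch "$1")
    end=$(to_epoch "$2")
    echo $(( (end - start) / 3600 + 1 ))
}

# Timestamps taken from the example command in this README.
hourly_steps 202006200100 202006210000
```

With the timestamps used in this example, the count comes out to 24, which matches the 24-hour simulation described above.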
First, obtain a hydrofabric file for the gage you wish to model. Several tools can produce the geopackage. One of them, hfsubset, is maintained by the Office of Water Prediction and is integrated into DataStreamCLI.
For Palisade, Colorado:
hfsubset -w medium_range \
-s nextgen \
-v 2.1.1 \
-l divides,flowlines,network,nexus,forcing-weights,flowpath-attributes,model-attributes \
-o palisade.gpkg \
-t hl "Gages-09106150"
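Before handing the file to DataStreamCLI, it can help to confirm that the download produced a real geopackage rather than an empty file or an error page. GeoPackage files are SQLite databases, so a minimal check is to look for the SQLite header string at the start of the file. The sketch below does only that; the function name `is_sqlite_file` is hypothetical, and passing this check does not guarantee that all requested layers are present.

```shell
#!/bin/sh
# Return success if the file starts with the SQLite header string.
# GeoPackage files are SQLite databases, so a valid .gpkg should pass.
is_sqlite_file() {
    [ -f "$1" ] || return 1
    header=$(head -c 15 "$1" 2>/dev/null)
    [ "$header" = "SQLite format 3" ]
}

# Default to the file name used in this example.
gpkg="${1:-palisade.gpkg}"
if is_sqlite_file "$gpkg"; then
    echo "$gpkg looks like a SQLite/GeoPackage file"
else
    echo "$gpkg is missing or not a SQLite file" >&2
fi
```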
Then feed the hydrofabric file to DataStreamCLI along with a few CLI arguments to define the time domain and NextGen configuration:
./scripts/datastream -s 202006200100 \
-e 202006210000 \
-C NWM_RETRO_V3 \
-d $(pwd)/data/datastream_test \
-g $(pwd)/palisade.gpkg \
-R $(pwd)/configs/ngen/realization_sloth_nom_cfe_pet_troute.json \
-n 4
And that's it! Outputs will exist at $(pwd)/data/datastream_test/ngen-run/outputs
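To confirm that the run actually wrote something, a quick file count over the output directory is often enough. The sketch below assumes the output path given above and makes no assumption about the output file names or formats. `count_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the number of regular files under a directory (0 if it does not exist).
count_files() {
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Default to the output location used in this example.
out_dir="${1:-$(pwd)/data/datastream_test/ngen-run/outputs}"
n=$(count_files "$out_dir")
if [ "$n" -gt 0 ]; then
    echo "Found $n output file(s) under $out_dir"
else
    echo "No output files found under $out_dir" >&2
fi
```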
The entirety of ngen-datastream is distributed under the GNU General Public License v3.0 or later.