This is an EPIC card. As items from this list are addressed, their active cards will be linked.
This is an evolving document. Expect many changes over the next few weeks.
Overall Design / Tasks
Source Data -> Process & Prepare Data -> Publish Services
Learning from previous Ras2Fim processing
Phase 1: Start with example dataset
Step 3: Source Code Analysis
Step 4: Processing, data integration, and loading for Viz.
Step 5: Publish and run various tests; not public, internal use only.
Feedback from leadership
Feedback from end-users
After Phase 1
Scale the amount of data (1/3 of total)
Fine-tune hardware and software
Increase the amount of data (2/3 of total)
Fine-tune hardware and software
Full scale testing
DEV(TI) -> UAT -> PRD
[1] Sensitive Information Locations [WILL NOT be visible in GitHub]
Secret manager
Google Drive
[2] Infrastructure Setup for Development Environment
S3 Bucket(s)
S3 bucket created for development
Folder naming conventions
Optimizing Performance
Measuring Performance
EC2 Instance(s) for design evaluation and testing
Windows
[In progress] Linux for Viz and FIM dev. Multiple EC2s of different sizes for benchmark analysis (a simple S3 throughput benchmark sketch follows at the end of this section).
[In progress] Optimizing Performance (low volume)
[In progress] Measuring Performance (low volume)
Folder structure conventions
Data
Measuring
Optimizing
Software
Software conventions
Install Locations
Code version control (likely in the HydroVIS GitHub; code versioning is not currently in use for HydroVIS)
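For the low-volume performance measurements noted above, a crude read-throughput benchmark run on each candidate EC2 size may be enough to start. The sketch below assumes boto3; the bucket name and object key are placeholders, not real locations.

```python
"""Sketch: crude S3 read-throughput benchmark for comparing EC2 instance sizes.
The bucket and key below are placeholders; run the same script on each candidate instance."""
import time
import boto3

def measure_download(bucket, key, local_path="/tmp/bench.bin"):
    """Download one object and report elapsed seconds and MiB/s."""
    s3 = boto3.client("s3")
    size_bytes = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    start = time.perf_counter()
    s3.download_file(bucket, key, local_path)
    elapsed = time.perf_counter() - start
    return elapsed, (size_bytes / (1024 * 1024)) / elapsed

# elapsed, rate = measure_download("hv-fim-dev", "benchmarks/sample_1gb.bin")  # hypothetical bucket/key
# print(f"{elapsed:.1f} s, {rate:.1f} MiB/s")
```

The same pattern can be repeated for uploads and local disk reads to cover the disk, memory, and network items above.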
[3] Source Data
*** Maybe change: this section is more about tool / data analysis and should perhaps be pulled out to another card?
Heidi Safa has been evaluating and analyzing Ripple data.
Heidi is copying example datasets into the S3 bucket
[in progress] Deciphering which data we need (can we pre-filter some of it? Maybe a subfolder for HV consumption, with the rest kept for debugging?)
Download all Dewberry FIM_30 Ripple model files and folders to the FIM/HV S3 buckets. Note: 485 models available; total size could be over 1 TiB, numbers unconfirmed, i.e. determine volumes.
Build a script to pull all or filtered data from RTX to the HV S3 buckets (a sketch follows at the end of this section). While it is possible to call their buckets remotely, Rob strongly advises against it for multiple reasons, including: permissions, moving to other environments (UAT, PROD across multiple regions), and pre-processing if required.
[in progress] Investigate gaps in data reaches. Note: a full replacement dataset from RTX is coming, including re-adding FIM_10 data that was missed in current releases.
Evaluate an internal versioning system to reflect which version of a dataset we receive. We may not want to automatically tie it to the Ripple public release name of "Ripple 3.0". Maybe use subfolders in our S3 to distinguish differences, e.g. the current FIM_30 version, the replacement FIM_30 dataset coming, and the FIM_60 dataset coming. Internal dataset name/number convention TBD.
Example to start with:
ble_12030106_EastForkTrinity
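A minimal sketch of what the RTX-to-HV pull script above could look like, assuming boto3 and a server-side copy. All bucket names, prefixes, and the suffix filter are hypothetical placeholders; the pre-filter logic would be replaced by whatever is decided above.

```python
"""Sketch: copy selected Ripple model objects from the RTX bucket into an HV bucket.
All bucket names, prefixes, and filters below are hypothetical placeholders."""
import boto3

s3 = boto3.client("s3")

SRC_BUCKET = "rtx-ripple-models"   # placeholder source bucket
DST_BUCKET = "hv-fim-dev"          # placeholder HV dev bucket
SRC_PREFIX = "fim_30/"             # placeholder prefix for the FIM_30 release
KEEP_SUFFIXES = (".gpkg", ".csv", ".json", ".tif")  # placeholder pre-filter

def copy_filtered(src_bucket, src_prefix, dst_bucket, dst_prefix="ripple/fim_30/"):
    """Walk the source prefix and server-side copy only the files we think HV needs."""
    paginator = s3.get_paginator("list_objects_v2")
    copied = 0
    for page in paginator.paginate(Bucket=src_bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(KEEP_SUFFIXES):
                continue  # skip files outside the pre-filter
            dst_key = dst_prefix + key[len(src_prefix):]
            s3.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, dst_key)
            copied += 1
    return copied

if __name__ == "__main__":
    print(f"copied {copy_filtered(SRC_BUCKET, SRC_PREFIX, DST_BUCKET)} objects")
```

Server-side copy avoids routing the data through a local machine, but the cross-account permissions concern noted above still applies.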
[4A] Process Data
Part One: Data flow with small sample (i.e. one or two HUCs)
[in progress] Access to the flows2fim.exe software in Windows/Linux environments. Maybe in the Lambdas?
Strategy for choosing between mip and ble datasets when both are available.
Lookup strategy for picking stage, extent, and depths.
flows2fim.exe controls
flows2fim.exe fim -lib EXTENT
flows2fim.exe fim -lib DEPTH (HOLD: Scope for this project does not include depth tasks).
Evaluate Cross Walk strategies (this is an idea)
Geometry processing and partitioning.
Benchmark
Disk Speeds
Memory Usage
Network Usage
Disk Size
Should this part be a sub-section somewhere or its own card? Not sure. It is mostly a part of the HV integration, but we can use a py command-line twin of it for debugging / developing.
Develop Misc tools
(Rob) "search by HUC" code block for HydroVIS - Create a HUC S3 search code (not a tool) that can take a HUC number in, and pull down from an S3 bucket, just the files and folders required for processing or paths. Can optionally get just s3 paths or download files or both. Needed for HV code, but a variant of it for FIM. Basic py code already exists in ras2fim and can be ported and adjusted. It has a S3 wildcard system in it.
(Rob) "search by HUC" for FIM: tool for standard command line use for FIM debugging / Testing. Same as HV system in logic, using the ras2fim S3 wildcard search system.
Part Two: Data flow upscaling
Lessons learnt from part one
Dynamic Cross Walk to handle a ~75x increase over the previous ras2fim data volume? TBD
Apply upscaling to resources
Performance and processing analysis, and scale with more HUCs.
Windows
Relying on ESRI tools for processing (TBD)
Ubuntu
Using QGIS and open source for processing (TBD)
[In Discussion] Convert Raster datasets to Polygon datasets (see the conversion sketch after this list)
Benchmark
Disk Speeds
Memory Usage
Network Usage
Disk Size
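If the raster-to-polygon conversion under discussion above goes the open-source route, the core step could look like the sketch below using rasterio and shapely rather than QGIS directly. The input path and the assumption that inundated cells carry a value of 1 are placeholders.

```python
"""Sketch: convert a FIM extent raster to a single (multi)polygon with rasterio + shapely.
Assumes inundated cells have a value of 1; adjust to the real raster encoding."""
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape
from shapely.ops import unary_union

def extent_raster_to_polygon(tif_path):
    """Vectorize all inundated cells and dissolve them into one geometry."""
    with rasterio.open(tif_path) as src:
        band = src.read(1).astype("uint8")  # shapes() needs an integer/float32 dtype; wet = 1 assumed
        mask = band == 1
        polys = [
            shape(geom)
            for geom, value in shapes(band, mask=mask, transform=src.transform)
            if value == 1
        ]
    return unary_union(polys)

# poly = extent_raster_to_polygon("ble_12030106_EastForkTrinity_extent.tif")  # hypothetical file
```

Benchmarking this step against the ESRI path on the same HUCs would feed the disk, memory, and network numbers above.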
[4B] Loading Static data
Ripple dataset at 1-3 releases per year.
Get performance ready for large volumes of data uploaded and available for dynamic processing by HV, interacting with HAND data. FIM_60 next fall?
Make a FIM-to-HV deployment tool. It can look through multiple source Ripple model folders and pull out just the folders/files that need to be sent to the HV deployment bucket for automated processing. TBD... might not be needed, depending on pre-filtering or additional processing between Ripple and HV integration.
[5] Publish Data
Part One: Testing workflows to process data for publication
Lambda tests??
(Note: the items below are TBD as part of the integration design process evaluations)
Windows
Relying on ESRI tools for processing
Ubuntu
Using QGIS and open source for processing
Convert Raster datasets to Polygon datasets
Benchmark
Disk Speeds
Memory Usage
Network Usage
Disk Size
Note: We are talking to RTX about pre-building CSVs with vectors in them so we don't have to convert the TIFs to extents. TBD (a loading sketch follows below).
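If RTX does pre-build CSVs with vector geometries as noted above, loading them could be as simple as the sketch below. The column names, WKT encoding, and CRS are assumptions until the format is agreed.

```python
"""Sketch: load a hypothetical pre-built extents CSV (geometry stored as WKT) into a GeoDataFrame.
Column names ('geometry_wkt') and CRS are assumptions pending the agreed RTX format."""
import geopandas as gpd
import pandas as pd
from shapely import wkt

def load_extents_csv(csv_path, crs="EPSG:4326"):
    """Parse WKT strings into shapely geometries and wrap the table as a GeoDataFrame."""
    df = pd.read_csv(csv_path)
    df["geometry"] = df["geometry_wkt"].apply(wkt.loads)
    return gpd.GeoDataFrame(df.drop(columns=["geometry_wkt"]), geometry="geometry", crs=crs)

# gdf = load_extents_csv("ble_12030106_extents.csv")  # hypothetical file
```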
Part Two: Testing with small samples (i.e. one or two HUCs)
Lessons learnt from part one
Part Three: Scaling all available Ripple data.
Lessons learnt from part two
Integration for Ripple data to HV changes
This could cover all steps required to run the system. Now that we have an idea of what we have in flows2fim.exe and its inputs, and we know we want to do some Lambda steps, we are getting an idea of possible HV integration steps. Some of the HV steps are already pre-determined, such as morphing ras2fim code into Ripple code.
Ripple Boundary Service
The current system has a separate service called ras2fim Boundaries. It is simply the WBD HUC8 boundaries for each HUC8 that has some ras2fim (and soon Ripple) data in it.
i.e. current ras2fim v2 example:
1066: Ripple Task Details: Ripple Boundary Service: building out the Ripple Boundary service (replacing the ras2fim boundary service)
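A minimal sketch of how the replacement Ripple boundary layer could be assembled, assuming GeoPandas, a local WBD HUC8 layer, and a list of HUC8s known to have Ripple data. All paths, layer names, and field names are placeholders, not the actual HydroVIS service build.

```python
"""Sketch: build the Ripple boundary layer by filtering WBD HUC8 polygons
down to the HUC8s that actually have Ripple data. Paths and field names are assumptions."""
import geopandas as gpd

def build_ripple_boundaries(wbd_path, hucs_with_data, huc_field="HUC8"):
    """Return only the WBD HUC8 polygons whose HUC8 code has Ripple data."""
    wbd = gpd.read_file(wbd_path)
    return wbd[wbd[huc_field].isin(set(hucs_with_data))].copy()

# boundaries = build_ripple_boundaries("WBD_National.gpkg", ["12030106"])  # hypothetical inputs
# boundaries.to_file("ripple_boundaries.gpkg", driver="GPKG")
```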