Viz-Pipeline Step Function Issues #1096

Open
nickchadwick-noaa opened this issue Feb 18, 2025 · 0 comments

nickchadwick-noaa commented Feb 18, 2025

This is a collection of Step Function failures from the last large weather event. They exposed holes in our workflow that need to be revisited. Subtasks should be created for these issues and delegated to the team.

Log Write Failure

Issue

Step Functions occasionally fail to write logs to CloudWatch due to internal networking disruptions.

Solution

Add Step Function State Retry mechanisms to all log-writing states so that the whole pipeline doesn't fail just because of rare connectivity issues.
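
As a rough sketch of the retry idea above, the snippet below appends a `Retry` block to selected states in an Amazon States Language definition held as a Python dict. The state name `WriteLog`, the error names listed in `ErrorEquals`, and the interval, attempt, and backoff values are all assumptions for illustration. The actual log-writing states and the errors they raise would need to be taken from the real state machine definitions and the failed executions.

```python
import copy
import json

# Assumed retry policy: the error names and timing values are illustrative only.
LOG_WRITE_RETRY = {
    "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
    "IntervalSeconds": 5,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}


def add_retry(definition, state_names, retry=LOG_WRITE_RETRY):
    """Return a copy of an ASL definition with `retry` added to the named states."""
    updated = copy.deepcopy(definition)
    for name in state_names:
        state = updated["States"][name]
        retriers = state.setdefault("Retry", [])
        # Skip the append if an identical retrier is already present.
        if retry not in retriers:
            retriers.append(copy.deepcopy(retry))
    return updated


# Hypothetical one-state definition standing in for a real pipeline.
example_definition = {
    "StartAt": "WriteLog",
    "States": {
        "WriteLog": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "End": True,
        }
    },
}

patched = add_retry(example_definition, ["WriteLog"])
print(json.dumps(patched, indent=2))
```

The helper copies the definition rather than mutating it, so the original can still be diffed against the patched version before deploying.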

Example Errors

Image

Image

Image


DB Deadlocks

Issue

Multiple pipelines try to update the same (hand_id, rc_stage_ft) tuples at the same time, which causes database deadlocks.

Solution

Add a Step Function state retry, or Python retry logic that catches the DeadlockDetected exception.

Example Errors

srf_18hr_max_inundation
Image

ana_past_14day_max_inundation
Image


DB Connection Issue

Issue

The Lambda fails to connect to the Ingest RDS instance for multiple database tables in multiple service pipelines. This happens specifically when the rds-viz query accesses rds-ingest via a foreign data wrapper.

Solution

TBD

Example Errors

Image


Python Preprocess - 10GB

Issue

The Lambda timed out on the ppp_mrf_mem1 pipeline because of the number and size of the files being downloaded and processed.

Solution

This Lambda needs to be rethought to accommodate very large weather events, possibly by chunking the list of files to be downloaded and running multiple Lambda invocations.
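
The chunking idea could look roughly like the sketch below, which splits a file list into fixed-size chunks and builds one event payload per chunk. The function names, the payload keys (`files`, `chunk_index`, `chunk_count`), the example file names, and the chunk size are all assumptions. The real preprocessing Lambda's event format is not shown in this issue.

```python
def chunk_files(files, chunk_size):
    """Split `files` into consecutive lists of at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]


def build_payloads(files, chunk_size, base_event=None):
    """Build one event payload per chunk, copying any shared event fields."""
    chunks = chunk_files(files, chunk_size)
    return [
        {
            **(base_event or {}),
            "files": chunk,
            "chunk_index": index,
            "chunk_count": len(chunks),
        }
        for index, chunk in enumerate(chunks)
    ]


# Hypothetical file names and pipeline field, for illustration only.
example_files = ["f001.nc", "f002.nc", "f003.nc", "f004.nc", "f005.nc"]
payloads = build_payloads(example_files, 2, base_event={"pipeline": "ppp_mrf_mem1"})
```

Each payload could then drive a separate invocation, for example through a Step Functions Map state. Chunking by file count alone does not bound the download size, so a size-aware split may be needed if individual files vary widely.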

@nickchadwick-noaa nickchadwick-noaa added the enhancement New feature or request label Feb 18, 2025
@nickchadwick-noaa nickchadwick-noaa added this to the V2.2.0 milestone Feb 18, 2025