
[BUG]: Health Status Transition Issue: Services Stuck in "notready" State #2916

@GavinZhu-GMI


Describe the Bug

Problem Description

Current Issue: Dynamo services running in Global Health Mode never transition automatically from "notready" to "ready". Production deployments fail as a result unless a manual workaround is applied.


Current Workaround

# Required workaround in every deployment
env:
  - name: DYN_SYSTEM_STARTING_HEALTH_STATUS
    value: "ready"  # Forces immediate ready state

Steps to Reproduce

  1. Deploy a Dynamo service without explicit endpoint health configuration
  2. Check health endpoint: curl http://service:9090/health
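A minimal polling sketch for step 2, for anyone who wants to confirm that the status never changes rather than checking once with curl. The URL comes from the step above. The "ready" response shape (HTTP 200 with {"status":"ready"}) is an assumption mirrored from the "notready" body reported in this issue, and the helper names `fetch_health`, `is_ready`, and `wait_until_ready` are hypothetical, not part of Dynamo.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Endpoint from the repro steps; replace host/port for your deployment.
HEALTH_URL = "http://service:9090/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> Tuple[int, str]:
    """Return (HTTP status code, body). Returns (0, "") if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the 503 response still carries a body.
        return err.code, err.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0, ""


def is_ready(code: int, body: str) -> bool:
    """Assumed ready shape: HTTP 200 and a JSON object with status == "ready"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return code == 200 and isinstance(payload, dict) and payload.get("status") == "ready"


def wait_until_ready(
    fetch: Callable[[], Tuple[int, str]] = fetch_health,
    attempts: int = 30,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll up to `attempts` times; return True as soon as the service is ready."""
    for attempt in range(1, attempts + 1):
        code, body = fetch()
        print(f"attempt {attempt}: HTTP {code} {body.strip()}")
        if is_ready(code, body):
            return True
        if attempt < attempts:
            sleep(interval)
    return False
```

Calling `wait_until_ready()` against an affected deployment should print the same 503 / "notready" pair on every attempt and return False. With the workaround applied it should return True, provided the assumed ready response shape holds.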

Expected Behavior

Expected: The service transitions to "ready" once initialization completes successfully.

Actual Behavior

Actual: The service keeps returning {"status":"notready"} with HTTP 503 indefinitely.

Environment

Two-node prefill/decode (PD) disaggregated deployment on H200 GPUs.

Additional Context

No response

Screenshots

No response

Metadata

Labels

bug (Something isn't working)
