feat: health check changes based on endpoint served #1996
Conversation
Can we collect all the Health logic into a struct with methods on it?
I'd like to be able to:

- call Health::mark_as_ready() manually,
- have a default health state - could be NotReady or Ready
  - a default of Ready makes mark_as_ready() a no-op
- set a list of (component.endpoint) that need to be live, which could auto-trigger readiness.
Sounds good - so thinking of an explicit set instead of implicit - @grahamking - reasonable. I can start changes to make more of a trait.

Here's what I'm thinking:
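(The snippet originally posted here isn't preserved in this capture. Below is a minimal sketch consistent with the requests above and with the `SystemHealth` struct this PR describes - the exact names and signatures are assumptions, not the code as merged.)

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HealthStatus {
    Ready,
    NotReady,
}

pub struct SystemHealth {
    system_status: HealthStatus,
    endpoint_health: HashMap<String, HealthStatus>,
    /// Endpoints that must all be Ready for the system to report healthy.
    use_endpoint_health_status: Vec<String>,
}

impl SystemHealth {
    pub fn new(starting: HealthStatus, use_endpoint_health_status: Vec<String>) -> Self {
        Self {
            system_status: starting,
            endpoint_health: HashMap::new(),
            use_endpoint_health_status,
        }
    }

    /// A no-op when the configured starting status is already Ready.
    pub fn mark_as_ready(&mut self) {
        self.system_status = HealthStatus::Ready;
    }

    pub fn set_endpoint_health_status(&mut self, endpoint: String, status: HealthStatus) {
        self.endpoint_health.insert(endpoint, status);
    }

    /// Overall health is gated on the endpoint list when one is configured
    /// (the explicit set discussed above); otherwise it falls back to the
    /// system-wide status, so mark_as_ready() still matters.
    pub fn get_health_status(&self) -> (bool, HashMap<String, HealthStatus>) {
        let healthy = if self.use_endpoint_health_status.is_empty() {
            self.system_status == HealthStatus::Ready
        } else {
            self.use_endpoint_health_status
                .iter()
                .all(|e| self.endpoint_health.get(e) == Some(&HealthStatus::Ready))
        };
        (healthy, self.endpoint_health.clone())
    }
}
```

The explicit list keeps the gating opt-in: a process that serves no endpoints still controls its own readiness manually.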
Walkthrough

The changes introduce a comprehensive health status tracking system to the runtime. A new SystemHealth struct tracks overall and per-endpoint health, initialized from the starting_health_status and use_endpoint_health_status configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Config as RuntimeConfig
    participant Dist as DistributedRuntime
    participant Health as SystemHealth
    participant Http as HTTP Server
    participant Endpoint as PushEndpoint

    Config->>Dist: Provide starting_health_status, use_endpoint_health_status
    Dist->>Health: Initialize SystemHealth with config
    Dist->>Http: Share Arc<Mutex<SystemHealth>>
    Endpoint->>Health: Set endpoint health to Ready at start
    Http->>Health: Query get_health_status()
    Health-->>Http: Return overall and endpoint health
    Endpoint->>Health: Set endpoint health to NotReady on shutdown
```
Estimated code review effort: 3 (~45 minutes)
Actionable comments posted: 0
🧹 Nitpick comments (5)
lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (1)
55-58: LGTM! Correct health status lifecycle management.

The health status updates at startup (Ready) and shutdown (NotReady) are properly placed and follow the expected endpoint lifecycle pattern.

Minor optimization suggestion: Consider using `&endpoint_name` instead of cloning in both health update calls to avoid unnecessary string allocations.

```diff
 system_health
     .lock()
     .await
-    .set_endpoint_health_status(endpoint_name.clone(), HealthStatus::Ready);
+    .set_endpoint_health_status(&endpoint_name, HealthStatus::Ready);
```

And similarly for the shutdown call (assuming the `SystemHealth` method signature accepts `&str`).

Also applies to: 114-117
lib/runtime/src/config.rs (1)
127-136: Improve formatting consistency in Display implementation.

The new fields are added to the display output, but the formatting is inconsistent with existing fields. Consider adding proper separators for better readability.

Apply this diff to improve formatting consistency:

```diff
-        write!(f, "system_enabled={}", self.system_enabled)?;
-        write!(
-            f,
-            "use_endpoint_health_status={:?}",
-            self.use_endpoint_health_status
-        )?;
-        write!(
-            f,
-            "starting_health_status={:?}",
-            self.starting_health_status
-        )?;
+        write!(f, "system_enabled={}, ", self.system_enabled)?;
+        write!(f, "starting_health_status={:?}, ", self.starting_health_status)?;
+        write!(
+            f,
+            "use_endpoint_health_status={:?}",
+            self.use_endpoint_health_status
+        )?;
```

lib/runtime/src/lib.rs (1)
79-140: Well-designed health tracking system with minor improvements needed.

The `SystemHealth` implementation provides a solid foundation for tracking system and endpoint health. The logic correctly handles both system-wide and per-endpoint health tracking.

Consider these improvements for better consistency and robustness:

```diff
 endpoint.clone(),
 if *ready == HealthStatus::Ready {
     "ready".to_string()
 } else {
-    "notready".to_string()
+    "not_ready".to_string()
 },
```

Also consider adding error handling for cases where an endpoint in `use_endpoint_health_status` doesn't exist in `endpoint_health`:

```diff
 if !self.use_endpoint_health_status.is_empty() {
     healthy = self.use_endpoint_health_status.iter().all(|endpoint| {
         self.endpoint_health
             .get(endpoint)
-            .map_or(false, |status| *status == HealthStatus::Ready)
+            .map_or_else(|| {
+                tracing::warn!("Endpoint {} not found in health map", endpoint);
+                false
+            }, |status| *status == HealthStatus::Ready)
     });
```

lib/runtime/src/http_server.rs (2)
151-173: Excellent enhancement to health endpoint functionality.

The refactored health handler provides much more useful information with:

- Structured JSON response including status, uptime, and per-endpoint health
- Appropriate HTTP status codes (200 for healthy, 503 for unhealthy)
- Proper tracing instrumentation for debugging

Consider using debug level instead of trace for the response logging to make it more accessible during normal debugging:

```diff
-    tracing::trace!("Response {}", response.to_string());
+    tracing::debug!("Health response: {}", response);
```
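For readers skimming the review, a minimal sketch of the response shape being described - assuming an axum-style handler; the field names and hard-coded values here are illustrative, not the PR's actual code:

```rust
use axum::{http::StatusCode, response::IntoResponse, Json};
use serde_json::json;

// Illustrative only: the real handler would pull these values from
// SystemHealth::get_health_status() behind the shared Arc<Mutex<_>>.
async fn health_handler() -> impl IntoResponse {
    let healthy = true; // assumed: overall status from SystemHealth
    let body = json!({
        "status": if healthy { "healthy" } else { "unhealthy" },
        "uptime": 42,                         // seconds since start (assumed field name)
        "endpoints": { "generate": "ready" }, // per-endpoint health map
    });
    let code = if healthy {
        StatusCode::OK
    } else {
        StatusCode::SERVICE_UNAVAILABLE
    };
    (code, Json(body))
}
```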
288-364: Comprehensive test coverage with opportunities for improvement.

The parameterized test effectively covers both healthy and unhealthy scenarios across multiple endpoints. The use of `rstest` for parameterization and `temp_env::async_with_vars` for environment isolation is excellent.

Consider these improvements for better maintainability:

- Replace println! with proper test logging:

```diff
-    println!("[test] Waiting for server to start...");
+    tracing::info!("Waiting for server to start...");
```

- Reduce sleep duration for faster tests:

```diff
-    sleep(std::time::Duration::from_millis(1000)).await;
+    sleep(std::time::Duration::from_millis(100)).await;
```

- Consider extracting server setup into a test helper function to reduce code duplication and improve readability (see the sketch below).
The test logic is solid and provides excellent coverage of the health endpoint functionality.
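As a reference for that suggestion, a self-contained sketch of the `rstest` + `temp_env::async_with_vars` pattern the test relies on - the cases and assertion are illustrative, not the PR's actual test:

```rust
use rstest::rstest;

#[rstest]
#[case::healthy("ready", true)]
#[case::unhealthy("notready", false)]
#[tokio::test]
async fn starting_health_status_env(#[case] raw: &str, #[case] expect_ready: bool) {
    // temp_env sets the variable for the duration of the async block and
    // restores it afterwards, keeping parallel tests from interfering.
    temp_env::async_with_vars(
        [("DYN_SYSTEM_STARTING_HEALTH_STATUS", Some(raw))],
        async move {
            let v = std::env::var("DYN_SYSTEM_STARTING_HEALTH_STATUS").unwrap();
            assert_eq!(v == "ready", expect_ready);
        },
    )
    .await;
}
```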
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (8)
- lib/runtime/Cargo.toml (1 hunks)
- lib/runtime/src/component.rs (1 hunks)
- lib/runtime/src/component/endpoint.rs (1 hunks)
- lib/runtime/src/config.rs (8 hunks)
- lib/runtime/src/distributed.rs (3 hunks)
- lib/runtime/src/http_server.rs (4 hunks)
- lib/runtime/src/lib.rs (3 hunks)
- lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (7)
lib/runtime/Cargo.toml (2)
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
lib/runtime/src/component/endpoint.rs (2)
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.
lib/runtime/src/lib.rs (2)
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
lib/runtime/src/distributed.rs (4)
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: PeaBrane
PR: #1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
lib/runtime/src/http_server.rs (2)
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: PeaBrane
PR: #1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
lib/runtime/src/config.rs (1)
Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (6)
Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.
Learnt from: PeaBrane
PR: #1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: PeaBrane
PR: #1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
Learnt from: jthomson04
PR: #1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the wait_for_key_count function correctly uses exact equality (==) instead of greater-than-or-equal (>=) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
lib/runtime/Cargo.toml (1)
72-72: LGTM! Dev dependency feature addition is appropriate.

The addition of the `async_closure` feature to the `temp-env` crate aligns with the health status testing enhancements mentioned in the AI summary.

lib/runtime/src/component.rs (1)
32-34: LGTM! Clean import addition for health status support.

The refactoring to multi-line imports improves readability and the addition of the `HealthStatus` import supports the new health tracking functionality.

lib/runtime/src/component/endpoint.rs (1)
113-117: LGTM! Proper integration of health tracking into endpoint startup.

The addition of endpoint name and system health parameters to the `push_endpoint.start()` call correctly integrates the new health tracking system with endpoint lifecycle management.

lib/runtime/src/distributed.rs (1)
68-86: LGTM! Clean SystemHealth initialization and integration.

The SystemHealth initialization properly uses configuration values and is correctly wrapped in Arc<Mutex<_>> for thread-safe sharing across the distributed runtime. The early config retrieval avoids redundant calls later in the function.
lib/runtime/src/pipeline/network/ingress/push_endpoint.rs (1)
44-49: LGTM! Proper health status lifecycle integration.

The method signature update correctly accepts the necessary parameters for health tracking integration.
lib/runtime/src/config.rs (6)
51-56: LGTM - Well-designed enum for health status.

The `HealthStatus` enum is properly implemented with appropriate derives and serde configuration for lowercase serialization, making it user-friendly in configuration files.
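For illustration, the described enum would look roughly like this - the derive list is an assumption; the lowercase rename is what makes "notready" the config spelling:

```rust
use serde::{Deserialize, Serialize};

// rename_all = "lowercase" maps NotReady -> "notready", matching the
// values used in env vars and the health response.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum HealthStatus {
    Ready,
    NotReady,
}
```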
98-112: LGTM - Well-structured configuration fields.

The new health-related configuration fields are properly defined with:
- Clear documentation explaining their purpose
- Sensible defaults (NotReady for starting status, empty vector for endpoints)
- Correct builder attributes for serialization control
- Consistent naming with existing fields
170-171: LGTM - Correct environment variable mapping.

The new environment variable mappings follow the established pattern and correctly map to the corresponding struct fields.
208-209: LGTM - Consistent default values.

The `single_threaded()` method correctly initializes the new health-related fields with appropriate defaults.
235-236: LGTM - Consistent default values.

The `Default` implementation correctly initializes the new health-related fields with the same defaults as other constructors.
413-433: Confirm JSON array parsing for Vec<String> in environment variables.

I didn't find any existing examples of loading a JSON-formatted array from an environment variable into a `Vec<String>` in our config code. Please verify that figment's Env provider (used by `RuntimeConfig::from_settings()`) indeed supports deserializing a JSON array string (e.g. `"[\"ready\"]"`) into `Vec<String>`. If it does not, consider one of the following:

- Switch to a comma-delimited string (e.g. `"ready,other"`) and split it manually or via a custom caster, as sketched below.
- Add custom parsing logic to handle JSON arrays for `use_endpoint_health_status`.

Tagging for manual verification to ensure this test will actually pass in CI.
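If the comma-delimited route is taken, the parsing is trivial. A hypothetical helper (`parse_endpoint_list` does not exist in the codebase; the endpoint names are made up):

```rust
// Hypothetical fallback parser for DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS,
// for the case where JSON-array deserialization from an env var is unsupported.
fn parse_endpoint_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}

fn main() {
    assert_eq!(
        parse_endpoint_list("generate, embed"),
        vec!["generate".to_string(), "embed".to_string()]
    );
}
```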
lib/runtime/src/lib.rs (1)
170-171: LGTM - Appropriate thread-safe integration.

The `system_health` field is correctly added to `DistributedRuntime` with `Arc<Mutex<SystemHealth>>` for safe concurrent access across async contexts.

lib/runtime/src/http_server.rs (1)
16-16: LGTM - Necessary imports for enhanced health functionality.

The new imports support the structured JSON health response and health status tracking functionality.
Also applies to: 21-22
Just a question regarding what looks to be a duplicate endpoint_name, or a redundant endpoint_name being passed to the PushEndpoint.
Overview:
Adds DYN_SYSTEM_STARTING_HEALTH_STATUS to set the starting health status for a dynamo process
Adds DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS to set a list of endpoints that must be healthy to indicate overall health.
Adds a struct with the ability to set the overall health status, set per-endpoint health status, and get the current health status.
Endpoints are automatically set to Ready when served and to the starting health status when initialized.
Example:
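(The original command isn't preserved in this capture. A hypothetical invocation - the two health variables are from this PR, while `DYN_SYSTEM_ENABLED`, the endpoint name, and the worker command are assumptions:)

```sh
# Start NotReady; report ready only once the "generate" endpoint is served.
DYN_SYSTEM_ENABLED=true \
DYN_SYSTEM_STARTING_HEALTH_STATUS=notready \
DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS='["generate"]' \
  <worker command>
```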
In another shell
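(Also hypothetical - the system server's port and route depend on configuration:)

```sh
# Expect 503 with "notready" before the endpoint is up, 200 with "ready" after.
curl -i http://localhost:8081/health
```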
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)