feat: Deploy SLA planner to Kubernetes #2135
## Walkthrough

This update removes all support for local deployment and the `LocalConnector` from the planner, refactors the SLA planner and Prometheus components to async worker functions, unifies backend naming from "vllm_v1" to "vllm", updates deployment manifests and documentation, and improves configuration, scaling, and testing for Kubernetes-based SLA planner deployments.

## Changes
## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Prometheus
    participant Planner
    participant K8s as Kubernetes API
    participant Workers
    User->>Frontend: Send streaming request
    Frontend->>Prometheus: Expose /metrics endpoint
    Prometheus->>Planner: Provide metrics (TTFT, ITL, ISL, OSL)
    Planner->>K8s: Adjust worker replicas (Prefill/Decode)
    Planner->>Workers: Scale up/down via K8s
    User-->>Frontend: Receive streamed response
```

```mermaid
sequenceDiagram
    participant CLI
    participant Profiler
    participant Config
    participant User
    CLI->>Profiler: Run profile_sla.py --backend vllm
    Profiler->>Config: Use "vllm" backend config
    Profiler->>Profiler: Run profiling, find best TP sizes
    Profiler->>User: Instruct to update config with best TP sizes
```
## Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90 minutes

### Possibly related PRs
Actionable comments posted: 8
🔭 Outside diff range comments (1)
benchmarks/profiler/README.md (1)
**157-167**: Remove duplicate section.

Lines 157-167 appear to be an exact duplicate of lines 146-156, both explaining how to change the `DYN_LOG` level with identical YAML examples. Apply this diff to remove the duplicate section:

````diff
-To change `DYN_LOG` level, edit the yaml file by adding
-
-```yaml
-...
-spec:
-  envs:
-    - name: DYN_LOG
-      value: "debug" # or other log levels
-  ...
-```
````
🧹 Nitpick comments (9)
benchmarks/profiler/README.md (1)
**1-1**: Consider the file structure and naming consistency.

The file starts with a simple reference to external documentation but then contains extensive inline documentation. This creates confusion about whether this is a pointer file or comprehensive documentation.
Consider either:
- Making this a true pointer file with just the reference, or
- Updating the opening line to reflect that this file contains the actual documentation
docs/architecture/sla_planner.md (1)
**11-11**: Clear deprecation notice and migration guidance.

The updated warning effectively communicates the deprecation of bare metal deployment and directs users to Kubernetes deployment.
Optional: Consider hyphenating "bare metal" as "bare-metal" when used as a compound adjective modifying "deployment".
docs/guides/dynamo_deploy/sla_planner_deployment.md (1)
**1-117**: Excellent comprehensive deployment guide.

This new deployment guide provides thorough, well-structured documentation covering all aspects of SLA planner deployment from prerequisites through troubleshooting. The step-by-step approach, architecture overview, and practical examples make this highly valuable for users.
Minor improvement needed: The fenced code block at lines 46-52 should specify a language for better rendering:
-``` +```text vllm-disagg-planner-frontend-* 1/1 Running vllm-disagg-planner-prometheus-* 1/1 Running vllm-disagg-planner-planner-* 1/1 Running vllm-disagg-planner-backend-* 1/1 Running vllm-disagg-planner-prefill-* 1/1 Runningcomponents/planner/src/dynamo/planner/prometheus.py (1)
43-46: Consider cleaning up temporary configuration file.The temporary file created with
delete=Falseis never explicitly deleted. Consider adding cleanup in the finally block or using a context manager.- temp_file = tempfile.NamedTemporaryFile(mode="w", suffix=".yml", delete=False) - yaml.dump(config, temp_file) - temp_file.close() - config_path = temp_file.name + import os + temp_file = tempfile.NamedTemporaryFile(mode="w", suffix=".yml", delete=False) + config_path = temp_file.name + try: + yaml.dump(config, temp_file) + temp_file.close()And add cleanup after the process loop:
except asyncio.CancelledError: logger.info("Shutting down Prometheus...") process.terminate() process.wait() + os.unlink(config_path) raise + finally: + if os.path.exists(config_path): + os.unlink(config_path)components/planner/src/dynamo/planner/planner_sla.py (2)
**28-30**: Address the TODO: remove the hardcoded startup delay.

The 30-second delay is a workaround that should be replaced with proper readiness checks. Consider implementing a retry mechanism that waits for dependent components to be ready instead of using a fixed delay.
Would you like me to generate a solution that implements proper component readiness checks or open an issue to track this technical debt?
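A retry mechanism along those lines could be as simple as polling an async readiness check until it succeeds. This is a hypothetical sketch, not the project's code: the helper name and defaults are illustrative, and in practice `check` would wrap something like the Kubernetes connector's deployment-readiness query.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_ready(
    check: Callable[[], Awaitable[bool]],
    max_attempts: int = 30,
    delay_seconds: float = 1.0,
) -> bool:
    """Poll an async readiness check instead of sleeping a fixed 30 seconds.

    Returns True as soon as the check passes, False if it never does
    within max_attempts polls.
    """
    for _ in range(max_attempts):
        if await check():
            return True
        await asyncio.sleep(delay_seconds)
    return False
```

This starts the planner as soon as dependencies are actually up, rather than always paying the worst-case delay.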
**46-52**: Document the purpose of the mock endpoint.

The mock endpoint appears to be a placeholder. Consider adding a more descriptive comment explaining why this endpoint exists and whether it's intended to be replaced with actual functionality.

```diff
 async def generate(request: RequestType):
-    """Dummy endpoint to satisfy that each component has an endpoint"""
+    """Mock endpoint required by the Dynamo runtime framework.
+
+    Each component must expose at least one endpoint. This placeholder
+    endpoint satisfies that requirement while the actual planning logic
+    is handled by the start_sla_planner function.
+    """
     yield "mock endpoint"
```

components/planner/src/dynamo/planner/defaults.py (1)
**57-59**: Validate environment variable values.

The port and namespace values from environment variables are used without validation. Invalid values could cause runtime issues.
Add basic validation:
```diff
 class SLAPlannerDefaults(BasePlannerDefaults):
     port = os.environ.get("DYNAMO_PORT", "8000")
+    if not port.isdigit() or not (1 <= int(port) <= 65535):
+        raise ValueError(f"Invalid DYNAMO_PORT value: {port}. Must be a valid port number (1-65535).")
+
     namespace = os.environ.get("DYNAMO_NAMESPACE", "vllm-disagg-planner")
+    if not namespace or not namespace.replace("-", "").replace("_", "").isalnum():
+        raise ValueError(f"Invalid DYNAMO_NAMESPACE value: {namespace}. Must be a valid Kubernetes namespace name.")
+
     prometheus_endpoint = _get_default_prometheus_endpoint(port, namespace)
```

components/planner/src/dynamo/planner/kube.py (1)
**142-156**: Consider returning False instead of raising an exception for a missing deployment.

The `is_deployment_ready` method raises an exception when the deployment is not found, which is inconsistent with its boolean return type. This could cause unexpected errors for callers expecting a simple True/False response.

```diff
 async def is_deployment_ready(self, graph_deployment_name: str) -> bool:
     """Check if a graph deployment is ready"""
     graph_deployment = self._get_graph_deployment_from_name(graph_deployment_name)
     if not graph_deployment:
-        raise ValueError(f"Graph deployment {graph_deployment_name} not found")
+        # Deployment not found means it's not ready
+        return False
     conditions = graph_deployment.get("status", {}).get("conditions", [])
     ready_condition = next(
         (c for c in conditions if c.get("type") == "Ready"), None
     )
     return ready_condition is not None and ready_condition.get("status") == "True"
```

Alternatively, if you want to distinguish between "not found" and "not ready", consider creating a separate method like `deployment_exists()`.

components/planner/test/kube.py (1)
**148-150**: Consider extracting test timeout constants.

The test timeout values (`max_attempts=2`, `delay_seconds=0.1`) are repeated across multiple tests. Consider extracting these as constants for better maintainability.

```diff
+# Test constants for faster execution
+TEST_MAX_ATTEMPTS = 2
+TEST_DELAY_SECONDS = 0.1
+
 @pytest.mark.asyncio
 async def test_wait_for_graph_deployment_ready_success(k8s_api, mock_custom_api):
     """Test wait_for_graph_deployment_ready when deployment becomes ready"""
     # ... existing code ...
     await k8s_api.wait_for_graph_deployment_ready(
-        "test-deployment", max_attempts=2, delay_seconds=0.1
+        "test-deployment", max_attempts=TEST_MAX_ATTEMPTS, delay_seconds=TEST_DELAY_SECONDS
     )
```

Also applies to: 176-177, 189-191
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- benchmarks/profiler/README.md (1 hunks)
- benchmarks/profiler/profile_sla.py (1 hunks)
- benchmarks/profiler/utils/config.py (7 hunks)
- benchmarks/profiler/utils/dynamo_deployment.py (2 hunks)
- components/backends/vllm/README.md (3 hunks)
- components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
- components/backends/vllm/src/dynamo/vllm/args.py (5 hunks)
- components/planner/README.md (1 hunks)
- components/planner/src/dynamo/planner/\_\_init\_\_.py (0 hunks)
- components/planner/src/dynamo/planner/defaults.py (3 hunks)
- components/planner/src/dynamo/planner/kube.py (4 hunks)
- components/planner/src/dynamo/planner/kubernetes_connector.py (4 hunks)
- components/planner/src/dynamo/planner/local_connector.py (0 hunks)
- components/planner/src/dynamo/planner/planner_sla.py (2 hunks)
- components/planner/src/dynamo/planner/prometheus.py (1 hunks)
- components/planner/src/dynamo/planner/utils/planner_core.py (5 hunks)
- components/planner/src/dynamo/planner/utils/prometheus.py (2 hunks)
- components/planner/test/kube.py (4 hunks)
- components/planner/test/kubernetes_connector.py (1 hunks)
- docs/architecture/pre_deployment_profiling.md (1 hunks)
- docs/architecture/sla_planner.md (2 hunks)
- docs/guides/dynamo_deploy/sla_planner_deployment.md (1 hunks)
💤 Files with no reviewable changes (2)
- `components/planner/src/dynamo/planner/__init__.py`
- components/planner/src/dynamo/planner/local_connector.py
🧰 Additional context used
🧠 Learnings (9)
components/planner/README.md (1)
Learnt from: biswapanda
PR: #1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
components/backends/vllm/README.md (1)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
docs/guides/dynamo_deploy/sla_planner_deployment.md (2)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: nnshah1
PR: #2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
benchmarks/profiler/utils/config.py (2)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the resources section under ServiceArgs specifies resources per individual worker, not total resources for all workers. So workers: 8 with gpu: '1' means 8 workers × 1 GPU each = 8 GPUs total.
Learnt from: krishung5
PR: #1388
File: examples/multimodal/utils/model.py:47-53
Timestamp: 2025-06-09T17:52:06.761Z
Learning: Different Vision-Language Models (VLMs) have different hidden dimensions, so using a single hardcoded fallback value of 4096 in get_vision_embeddings_size() is problematic and can cause tensor shape mismatches. The current code uses getattr(config, "hidden_size", 4096) where 4096 is a fallback default based on LLaVA models, but other VLMs like Qwen2-VL may have different dimensions.
components/planner/src/dynamo/planner/utils/planner_core.py (1)
Learnt from: biswapanda
PR: #1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
components/planner/src/dynamo/planner/planner_sla.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
components/planner/src/dynamo/planner/prometheus.py (2)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.
Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.
components/backends/vllm/src/dynamo/vllm/args.py (2)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.
components/backends/vllm/deploy/disagg_planner.yaml (3)
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: nnshah1
PR: #2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the resources section under ServiceArgs specifies resources per individual worker, not total resources for all workers. So workers: 8 with gpu: '1' means 8 workers × 1 GPU each = 8 GPUs total.
🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_deploy/sla_planner_deployment.md
46-46: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 LanguageTool
docs/architecture/sla_planner.md
[uncategorized] ~11-~11: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ts disaggregated setup. > [!WARNING] > Bare metal deployment with local connector is depr...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (31)
components/planner/README.md (1)
**18-18**: LGTM! Documentation consolidation aligns with PR objectives.

The simplification from detailed inline documentation to an external reference is consistent with the PR's focus on Kubernetes-native deployment and documentation consolidation.
components/planner/test/kubernetes_connector.py (1)
**66-66**: Namespace attribute consistency verified in KubernetesConnector.

All occurrences of the old `namespace` attribute have been replaced with `dynamo_namespace` throughout the codebase, including the CLI parser, connector implementation, helper methods, and tests. No references to `kubernetes_connector.namespace` remain.

docs/architecture/pre_deployment_profiling.md (1)
**32-32**: Excellent addition! Clear instruction for applying profiling results.

The added instruction to update the config file with the optimal TP sizes obtained from profiling is essential for users to properly configure their deployments. This bridges the gap between profiling results and actual configuration.
components/backends/vllm/README.md (2)
**115-115**: Excellent addition of SLA Planner documentation references.

The addition of `disagg_planner.yaml` and links to SLA Planner documentation provides users with clear guidance on the new deployment option and comprehensive documentation.
**128-129**: Good addition of pre-deployment profiling prerequisite.

Adding the pre-deployment profiling prerequisite with clear documentation reference helps users understand the complete workflow for SLA Planner deployment.
benchmarks/profiler/profile_sla.py (1)
**592-594**: LGTM! Backend naming unification looks good.

The changes correctly update the backend naming from `"vllm_v1"` to `"vllm"`, which aligns with the broader effort to unify backend naming conventions across the codebase.

benchmarks/profiler/utils/config.py (3)
**83-83**: Consistent metadata naming update.

The metadata name change from `"vllm-v1-agg"` to `"vllm-agg"` is consistent with the backend naming unification.

**92-180**: LGTM! Worker component naming updates are consistent.

All references to `WORKER_COMPONENT_NAMES["vllm_v1"]` have been correctly updated to `WORKER_COMPONENT_NAMES["vllm"]`, maintaining functionality while aligning with the unified backend naming convention.

**235-235**: CONFIG_MODIFIERS key update aligns with naming changes.

The key change from `"vllm_v1"` to `"vllm"` in the CONFIG_MODIFIERS dictionary is consistent with the backend naming unification.

docs/architecture/sla_planner.md (2)
**109-115**: Excellent deployment guidance with clear quick start.

The addition of the detailed deployment guide reference and quick start command provides users with clear, actionable instructions for deploying the SLA planner.

**117-118**: Important frontend metrics requirement clearly communicated.

The note about frontend metrics requirements is well-placed and informative, helping users understand the system dependencies.
components/planner/src/dynamo/planner/utils/prometheus.py (5)
**32-38**: Excellent defensive programming for empty query results.

The addition of empty result checking prevents potential errors when no requests have been made yet, returning 0 as a sensible default for inter-token latency metrics.

**45-51**: Consistent empty result handling for TTFT metrics.

The same defensive pattern is correctly applied to time-to-first-token metrics, maintaining consistency across the API.

**58-64**: Robust handling for request duration metrics.

The empty result check for request duration follows the established pattern, improving the overall reliability of the Prometheus client.

**85-91**: Appropriate default handling for input sequence metrics.

The defensive programming pattern is consistently applied to input sequence token metrics, ensuring graceful handling of scenarios with no available data.

**98-104**: Complete defensive pattern for output sequence metrics.

The final method follows the same robust pattern, completing the consistent empty result handling across all Prometheus query methods.
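The defensive pattern praised in these comments boils down to checking the query result before indexing into it. A generic sketch follows; the helper name is hypothetical and not part of the reviewed client, but the sample shape matches Prometheus instant-vector query results.

```python
def safe_query_value(result: list, default: float = 0.0) -> float:
    """Extract the first sample value from a Prometheus instant-vector
    query result, returning `default` when no data exists yet (e.g. when
    no requests have been served and the metric has never been observed).
    """
    if not result:
        return default
    # Each instant-vector sample looks like:
    #   {"metric": {...labels...}, "value": [<unix_ts>, "<value_as_string>"]}
    return float(result[0]["value"][1])
```

Centralizing the check in one helper keeps the five metric methods from repeating the same `if not result` guard.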
benchmarks/profiler/utils/dynamo_deployment.py (1)
**20-20**: Type annotation improvements look good.

The changes improve type precision by using explicit typing imports and adding appropriate type ignore comments for untyped modules.
Also applies to: 22-22, 65-65, 67-67
components/backends/vllm/src/dynamo/vllm/args.py (3)
**82-88**: Dynamic namespace configuration looks good.

The use of the `DYNAMO_NAMESPACE` environment variable with a default fallback provides good flexibility for Kubernetes deployments.
**135-142**: IP resolution with fallback is well implemented.

Good error handling for hostname resolution failures. Using IP addresses instead of hostnames in ETCD keys will be more reliable in Kubernetes environments where DNS might not be immediately available.
**254-282**: Robust IP address resolution implementation.

The `get_host_ip()` function has excellent error handling with multiple fallback mechanisms:
- Attempts hostname resolution
- Tests if the resolved IP can be bound
- Falls back to localhost with clear warnings
This will prevent issues in various deployment scenarios.
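The fallback chain listed above can be sketched as follows. This is an approximation of the described behavior for illustration, not the actual `args.py` implementation:

```python
import socket


def get_host_ip(default: str = "127.0.0.1") -> str:
    """Resolve this host's IPv4 address, verifying it is bindable;
    fall back to localhost when resolution or binding fails."""
    try:
        # Step 1: attempt hostname resolution
        ip = socket.gethostbyname(socket.gethostname())
        # Step 2: verify the resolved IP can actually be bound on this host
        # (port 0 asks the OS for any free port)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((ip, 0))
        return ip
    except (socket.gaierror, OSError):
        # Step 3: DNS not ready yet (common early in pod startup) or the
        # resolved address is unbindable -> fall back to localhost; the
        # real implementation logs a clear warning here
        return default
```

The bind test matters in Kubernetes: a pod's hostname can resolve to an address the container cannot actually use, and catching that early avoids advertising a dead endpoint in ETCD.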
components/planner/src/dynamo/planner/kubernetes_connector.py (2)
**28-30**: Good separation of namespace concerns.

Separating the Dynamo namespace from the Kubernetes namespace provides better flexibility for multi-namespace deployments.

**92-99**: Good addition of readiness check before scaling.

Checking deployment readiness before applying scaling changes prevents operations on unstable deployments. The warning log is helpful for debugging.
components/planner/src/dynamo/planner/prometheus.py (2)
**30-38**: Clean async worker implementation.

Good refactoring from a class-based to an async worker pattern using the `@dynamo_worker` decorator. This aligns well with the overall architecture changes.

**64-74**: Good process lifecycle management.

The subprocess handling with proper cancellation and termination is well implemented. The continuous monitoring of process health is a good practice.
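Related to this lifecycle handling, the temporary-config cleanup suggested in the nitpicks above can be isolated behind a `try`/`finally` so the file is removed even when the worker is cancelled. A minimal sketch, with illustrative helper names and the config passed in already serialized (the real module serializes a dict with `yaml.dump`):

```python
import os
import tempfile


def write_temp_config(content: str) -> str:
    """Write a serialized Prometheus config (e.g. YAML text) to a temp file.

    delete=False keeps the file alive for the subprocess to read;
    the caller is responsible for removing it afterwards.
    """
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".yml", delete=False
    ) as temp_file:
        temp_file.write(content)
        return temp_file.name


def run_with_config(content: str) -> bool:
    """Stand-in for launching Prometheus against the config file.

    The finally block guarantees the temp file is removed even if the
    body raises or the surrounding task is cancelled.
    """
    config_path = write_temp_config(content)
    try:
        return os.path.exists(config_path)  # placeholder for the real work
    finally:
        if os.path.exists(config_path):
            os.unlink(config_path)
```

The same shape works in the async worker: put the subprocess loop in the `try` and the `os.unlink` in the `finally`.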
components/planner/src/dynamo/planner/utils/planner_core.py (4)
**55-55**: Good centralization of configuration.

Using `SLAPlannerDefaults` for namespace and Prometheus endpoint configuration aligns with the environment-driven approach.

Also applies to: 63-65
**171-180**: Excellent handling of no-traffic scenarios.

The NaN checks prevent invalid scaling decisions when there's no traffic. This is a critical improvement for production stability.
**239-246**: Good defensive programming for the correction factor.

The division-by-zero protection with fallback to the default ITL value prevents crashes and ensures the planner continues operating even with incomplete metrics.
**294-298**: Efficient batch scaling implementation.

Using `set_component_replicas` for batch updates is more efficient than individual component operations and reduces the chance of partial updates.

components/planner/src/dynamo/planner/kube.py (1)
**157-163**: Verify the extended timeout is appropriate.

The `max_attempts` value has been increased from 60 to 180, resulting in a 30-minute default timeout. This seems excessive for most deployment scenarios.
Please verify that 30 minutes is the intended timeout. Consider:
- Making this configurable via environment variable
- Using exponential backoff instead of fixed intervals
- Documenting why such a long timeout is necessary
Would you like me to generate an implementation with configurable timeout and exponential backoff?
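For illustration, a configurable timeout combined with exponential backoff might look like the sketch below. The environment variable name and defaults are hypothetical, not existing project configuration:

```python
import asyncio
import os
from typing import Awaitable, Callable, Optional


async def wait_with_backoff(
    check: Callable[[], Awaitable[bool]],
    max_attempts: Optional[int] = None,
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
) -> bool:
    """Poll `check`, doubling the delay each attempt up to max_delay.

    Backoff keeps the first few polls fast for quick deployments while
    reducing API-server load during a long model-loading startup.
    """
    if max_attempts is None:
        # Hypothetical env var letting operators tune the timeout
        max_attempts = int(os.environ.get("PLANNER_READY_MAX_ATTEMPTS", "60"))
    delay = initial_delay
    for _ in range(max_attempts):
        if await check():
            return True
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)
    return False
```

With 60 attempts capped at 60-second intervals this still covers roughly a half-hour worst case, but ready deployments are detected within seconds.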
components/planner/test/kube.py (2)
**42-58**: Well-structured test additions for namespace support.

The tests properly cover the new namespace initialization functionality with both custom and default namespace scenarios.

**77-129**: Comprehensive test coverage for is_deployment_ready.

Excellent test coverage for the new `is_deployment_ready` method, including ready, not ready, and not found scenarios. The tests properly mock the internal method and validate all edge cases.
Overview:
This MR provides a complete Kubernetes deployment solution for the SLA-based planner system, enabling automatic worker scaling based on real-time metrics collected via Prometheus. It refactors and replaces the original SLA planner approach with a K8s-native architecture that supports streaming request telemetry and dynamic resource adjustments.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Documentation
Refactor
Tests