feat: update in-cluster benchmark job and yaml

hhzhang16 · hhzhang16 · commit 8f19a4d7cb9b · 2025-09-19T10:42:21.000-07:00
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
diff --git a/benchmarks/incluster/README.md b/benchmarks/incluster/README.md
@@ -29,70 +29,64 @@ The in-cluster benchmarking solution:
 ## Prerequisites
 
 1. **Kubernetes cluster** with NVIDIA GPUs and Dynamo namespace setup (see [Dynamo Cloud/Platform docs](../../docs/guides/dynamo_deploy/README.md))
-2. **dynamo-pvc** PersistentVolumeClaim configured (see [deploy/utils README](../../deploy/utils/README.md))
-3. **Service account** (`dynamo-sa`) with appropriate permissions (see [deploy/utils README](../../deploy/utils/README.md))
-4. **Docker image** containing the Dynamo benchmarking tools
+2. **Storage and service account**: a PersistentVolumeClaim (`dynamo-pvc`) and a service account (`dynamo-sa`) configured with appropriate permissions (see [deploy/utils README](../../deploy/utils/README.md))
+3. **Docker image** containing the Dynamo benchmarking tools
 
 ## Quick Start
 
 ### Step 1: Deploy Your DynamoGraphDeployment
 Deploy your DynamoGraphDeployment using the [deployment documentation](../../components/backends/). Ensure it has a frontend service exposed.
 
 ### Step 2: Deploy and Run Benchmark Job
+
+**Option A: Set environment variables (recommended for multiple commands)**
 ```bash
-# Deploy the benchmark job with your namespace
-NAMESPACE=your-namespace envsubst < benchmark_job.yaml | kubectl apply -f -
+# Set environment variables for your deployment
+export NAMESPACE=benchmarking
+export MODEL_NAME=Qwen/Qwen3-0.6B
+export INPUT_NAME=qwen-vllm-agg
+export SERVICE_URL=vllm-agg-frontend:8000
+export DOCKER_IMAGE=nvcr.io/nvidian/dynamo-dev/vllm-runtime:dyn-973.0
+
+# Deploy the benchmark job
+envsubst < benchmark_job.yaml | kubectl apply -f -
 
 # Monitor the job
-kubectl logs -f job/dynamo-benchmark -n your-namespace
+kubectl logs -f job/dynamo-benchmark -n $NAMESPACE
 
 # Check job status
-kubectl get jobs -n your-namespace
+kubectl get jobs -n $NAMESPACE
+```
+
+**Option B: One-liner deployment**
+```bash
+NAMESPACE=benchmarking MODEL_NAME=Qwen/Qwen3-0.6B INPUT_NAME=qwen-vllm-agg SERVICE_URL=vllm-agg-frontend:8000 DOCKER_IMAGE=nvcr.io/nvidian/dynamo-dev/vllm-runtime:dyn-973.0 envsubst < benchmark_job.yaml | kubectl apply -f -
 ```
 
 ### Step 3: Retrieve Results
 ```bash
 # Download results from PVC (recommended)
 python3 -m deploy.utils.download_pvc_results \
-  --namespace your-namespace \
+  --namespace $NAMESPACE \
   --output-dir ./benchmark_results \
   --folder /data/results \
   --no-config
 
 # Alternative: Copy results directly (requires pod name)
-kubectl cp <pod-name>:/data/results ./benchmark_results -n your-namespace
+kubectl cp <pod-name>:/data/results ./benchmark_results -n $NAMESPACE
 ```
 
 ## Configuration
 
-The job manifest uses these default parameters:
-- **Model**: `Qwen/Qwen3-0.6B`
-- **Input sequence length**: 2000 tokens
-- **Output sequence length**: 256 tokens
-- **Input**: `dsr1=${NAMESPACE}-dsr1-frontend:8000` (internal service URL)
-
-### Customizing the Job Manifest
-
-Edit `benchmark_job.yaml` to modify:
-
-```yaml
-# Change model
-args:
-  - --model
-  - "meta-llama/Meta-Llama-3-8B"
-
-# Change sequence lengths
-args:
-  - --isl
-  - "1500"
-  - --osl
-  - "200"
-
-# Change input service
-args:
-  - --input
-  - my-service=${NAMESPACE}-my-service:8000
-```
+The benchmark job is configured through environment variables, which `envsubst` substitutes into `benchmark_job.yaml` at deploy time:
+
+### Required Environment Variables
+
+- **NAMESPACE**: Kubernetes namespace where the benchmark will run
+- **MODEL_NAME**: Hugging Face model identifier (e.g., `Qwen/Qwen3-0.6B`)
+- **INPUT_NAME**: Label for the benchmark input, passed as the name in `--input <name>=<url>` (e.g., `qwen-vllm-agg`)
+- **SERVICE_URL**: Internal service URL for the DynamoGraphDeployment frontend, including the port (e.g., `vllm-agg-frontend:8000`)
+- **DOCKER_IMAGE**: Docker image containing the Dynamo benchmarking tools
 
 ## Understanding Your Results
 
@@ -118,26 +112,26 @@ Results are stored in `/data/results` and follow the same structure as local ben
 
 ### Check Job Status
 ```bash
-kubectl get jobs -n <namespace>
-kubectl describe job dynamo-benchmark -n <namespace>
+kubectl get jobs -n $NAMESPACE
+kubectl describe job dynamo-benchmark -n $NAMESPACE
 ```
 
 ### View Logs
 ```bash
 # Follow logs in real-time
-kubectl logs -f job/dynamo-benchmark -n <namespace>
+kubectl logs -f job/dynamo-benchmark -n $NAMESPACE
 
 # Get logs from specific container
-kubectl logs job/dynamo-benchmark -c benchmark-runner -n <namespace>
+kubectl logs job/dynamo-benchmark -c benchmark-runner -n $NAMESPACE
 ```
 
 ### Debug Failed Jobs
 ```bash
 # Check pod status
-kubectl get pods -n <namespace> -l job-name=dynamo-benchmark
+kubectl get pods -n $NAMESPACE -l job-name=dynamo-benchmark
 
 # Describe failed pod
-kubectl describe pod <pod-name> -n <namespace>
+kubectl describe pod <pod-name> -n $NAMESPACE
 ```
 
 ## Comparison with Local Benchmarking
@@ -171,11 +165,14 @@ The in-cluster approach is recommended for:
 
 ```bash
 # Check PVC status
-kubectl get pvc dynamo-pvc -n <namespace>
+kubectl get pvc dynamo-pvc -n $NAMESPACE
 
 # Verify service account
-kubectl get sa dynamo-sa -n <namespace>
+kubectl get sa dynamo-sa -n $NAMESPACE
 
 # Check service endpoints
-kubectl get svc -n <namespace>
+kubectl get svc -n $NAMESPACE
+
+# Verify the frontend service exists (SERVICE_URL includes a port, so strip it first)
+kubectl get svc "${SERVICE_URL%%:*}" -n $NAMESPACE
 ```
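Review note: `envsubst` replaces any unset variable with an empty string, so a missing `DOCKER_IMAGE` or `MODEL_NAME` would produce a manifest with blank fields rather than an error. A minimal preflight sketch follows, assuming a POSIX shell. The helper names `missing_vars` and `service_name` are hypothetical and not part of the repository, and the variable list simply mirrors the README's "Required Environment Variables" section.

```shell
# Hypothetical preflight helpers (not part of the repository).

# Variables the README lists as required for benchmark_job.yaml.
REQUIRED_VARS="NAMESPACE MODEL_NAME INPUT_NAME SERVICE_URL DOCKER_IMAGE"

# Print the name of each required variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
missing_vars() {
  _status=0
  for _name in $REQUIRED_VARS; do
    # Indirect lookup: read the value of the variable named by $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "$_name"
      _status=1
    fi
  done
  return "$_status"
}

# Strip an optional ":port" suffix, leaving only the Service name.
service_name() {
  printf '%s\n' "${1%%:*}"
}
```

One possible use is to gate the deploy on the check, for example `missing_vars && envsubst < benchmark_job.yaml | kubectl apply -f -`, and to look up the frontend with `kubectl get svc "$(service_name "$SERVICE_URL")" -n "$NAMESPACE"`.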
diff --git a/benchmarks/incluster/benchmark_job.yaml b/benchmarks/incluster/benchmark_job.yaml
@@ -14,7 +14,7 @@ spec:
       - name: docker-imagepullsecret
       containers:
       - name: benchmark-runner
-        image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.0
+        image: ${DOCKER_IMAGE}
         resources:
           requests:
             cpu: "4"
@@ -35,7 +35,7 @@ spec:
         command: ["python3", "-m", "benchmarks.utils.benchmark"]
         args:
           - --model
-          - deepseek-ai/DeepSeek-R1
+          - ${MODEL_NAME}
           - --isl
           - "2000"
           - --std
@@ -45,7 +45,7 @@ spec:
           - --output-dir
           - /data/results
           - --input
-          - dsr1=${NAMESPACE}-sgl-dsr1-8gpu-frontend:8000
+          - ${INPUT_NAME}=${SERVICE_URL}
         volumeMounts:
           - name: data-volume
             mountPath: /data
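
Review note: now that the manifest is a template, the README's list of required variables can drift from what the YAML actually references. The sketch below lists the distinct `${VAR}` placeholders in a template read from stdin, so the two can be compared. The function name `template_vars` is hypothetical, and the sketch assumes a `grep` that supports `-E` and `-o` (GNU and BSD grep both do).

```shell
# Hypothetical helper (not part of the repository).
# List the distinct ${VAR} placeholders found in a template on stdin,
# one per line, sorted.
template_vars() {
  grep -Eo '\$\{[A-Za-z_][A-Za-z0-9_]*\}' | sort -u
}
```

Running `template_vars < benchmark_job.yaml` prints each placeholder once. Any name that appears there but not in the README's required list would be silently replaced with an empty string by `envsubst`.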