
Commit 844f881

docs: Bring back some missed release/0.4.0 doc changes, fix broken links, add lychee link checker github action (#2482)
1 parent 41f095c commit 844f881

File tree: 45 files changed (+202 / -83 lines changed)

New file — GitHub Actions workflow "Docs link check" (61 additions, 0 deletions):

```yaml
name: Docs link check

on:
  push:
    branches:
      - main
  pull_request:

permissions:
  contents: read

jobs:
  lychee:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      # Cache lychee results (e.g. to avoid hitting rate limits)
      # https://lychee.cli.rs/github_action_recipes/caching/
      - name: Restore lychee cache
        uses: actions/cache@v4
        with:
          path: .lycheecache
          key: cache-lychee-${{ github.sha }}
          restore-keys: cache-lychee-

      # https://github.com/lycheeverse/lychee/issues/1487
      - name: Install CA Certificates for lychee
        run: |
          sudo apt-get install ca-certificates

      - name: Install lychee
        run: |
          set -euo pipefail
          mkdir -p "$HOME/.local/bin"
          cd "$RUNNER_TEMP"
          # TODO: Lychee v0.19.1 doesn't support regex in --exclude-path, so use nightly
          # release until there is a released version containing regex support.
          curl -sSL -o lychee.tar.gz \
            https://github.com/lycheeverse/lychee/releases/download/nightly/lychee-x86_64-unknown-linux-gnu.tar.gz
          tar -xzf lychee.tar.gz
          BIN_PATH=$(find . -maxdepth 2 -type f -name lychee | head -n1)
          install -m 0755 "$BIN_PATH" "$HOME/.local/bin/lychee"
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
          lychee --version

      - name: Check documentation links with lychee
        env:
          # Set GITHUB_TOKEN to avoid github rate limits on URL checks
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          set -euo pipefail
          # Run lychee against all files in repo
          lychee \
            --cache \
            --no-progress \
            --exclude-path "ATTRIBUTIONS.*" \
            --accept "200..=299, 403, 429" \
            --exclude-all-private --exclude 0.0.0.0 \
            .
```
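For local debugging before pushing, the same check can be approximated with a lychee binary on your PATH. A minimal sketch mirroring the workflow's flags (per the workflow's own TODO, released lychee v0.19.1 does not handle regex in `--exclude-path`, so the `ATTRIBUTIONS.*` pattern may require a nightly build; setting GITHUB_TOKEN is optional):

```bash
# Sketch: reproduce the CI link check locally from the repository root.
# Assumes lychee is already installed (e.g. from the nightly tarball used above).
export GITHUB_TOKEN=<personal-access-token>   # optional, avoids GitHub rate limits
lychee \
  --cache \
  --no-progress \
  --exclude-path "ATTRIBUTIONS.*" \
  --accept "200..=299, 403, 429" \
  --exclude-all-private --exclude 0.0.0.0 \
  .
```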

README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -183,7 +183,7 @@ Run the backend/worker like this:
 python -m dynamo.sglang.worker --help
 ```
 
-You can pass any sglang flags directly to this worker, see https://docs.sglang.ai/backend/server_arguments.html . See there to use multiple GPUs.
+You can pass any sglang flags directly to this worker, see https://docs.sglang.ai/advanced_features/server_arguments.html . See there to use multiple GPUs.
 
 ## TensorRT-LLM
````
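The sentence above describes forwarding SGLang server arguments straight through to the worker. A hypothetical sketch of that pattern (the model and flag values are illustrative only; `--model-path` and `--tp-size` are SGLang server arguments documented at the linked page, not flags introduced by this commit):

```bash
# Hypothetical example: any SGLang server argument can be appended to the worker command.
python -m dynamo.sglang.worker \
  --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
  --tp-size 2
```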

benchmarks/llm/README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -12,4 +12,4 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-[../../examples/llm/benchmarks/README.md](../../examples/llm/benchmarks/README.md)
+Coming soon.
```

components/README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -77,4 +77,4 @@ To get started with Dynamo components:
 4. **Run deployment scripts** from the engine's launch directory
 5. **Monitor performance** using the metrics component
 
-For detailed instructions, see the README files in each component directory and the main [Dynamo documentation](../../docs/).
+For detailed instructions, see the README files in each component directory and the main [Dynamo documentation](../docs/).
```

components/backends/sglang/README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -52,7 +52,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 ## Quick Start
 
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. See our different [architectures](../llm/README.md#deployment-architectures) for a high level overview of each pattern and the architecture diagram for each.
+Below we provide a guide that lets you run all of our common deployment patterns on a single node.
 
 ### Start NATS and ETCD in the background
```

components/backends/sglang/deploy/README.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -74,7 +74,7 @@ extraPodSpec:
 
 Before using these templates, ensure you have:
 
-1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../docs/guides/dynamo_deploy/dynamo_cloud.md)
+1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
 2. **Kubernetes cluster with GPU support**
 3. **Container registry access** for SGLang runtime images
 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -159,4 +159,4 @@ Common issues and solutions:
 3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
 4. **Out of memory**: Increase memory limits or reduce model batch size
 
-For additional support, refer to the [deployment troubleshooting guide](../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
+For additional support, refer to the [deployment guide](../../../../docs/guides/dynamo_deploy/quickstart.md).
```
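Prerequisite 4 above references a `hf-token-secret` consumed via `envFromSecret`. A hedged sketch of creating it (the key name `HF_TOKEN` is an assumption; confirm against the deployment templates):

```bash
# Sketch: create the HuggingFace token secret referenced by the deploy templates.
# The HF_TOKEN key name is assumed, not taken from this diff.
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-huggingface-token>
```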

components/backends/sglang/docs/dsr1-wideep-h100.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -5,7 +5,7 @@ SPDX-License-Identifier: Apache-2.0
 
 # Running DeepSeek-R1 Disaggregated with WideEP on H100s
 
-Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
+Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
 
 ## Instructions
```

components/backends/sglang/slurm_jobs/README.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -1,10 +1,10 @@
 # Example: Deploy Multi-node SGLang with Dynamo on SLURM
 
-This folder implements the example of [SGLang DeepSeek-R1 Disaggregated with WideEP](../dsr1-wideep.md) on a SLURM cluster.
+This folder implements the example of [SGLang DeepSeek-R1 Disaggregated with WideEP](../docs/dsr1-wideep-h100.md) on a SLURM cluster.
 
 ## Overview
 
-The scripts in this folder set up multiple cluster nodes to run the [SGLang DeepSeek-R1 Disaggregated with WideEP](../dsr1-wideep.md) example, with separate nodes handling prefill and decode.
+The scripts in this folder set up multiple cluster nodes to run the [SGLang DeepSeek-R1 Disaggregated with WideEP](../docs/dsr1-wideep-h100.md) example, with separate nodes handling prefill and decode.
 The node setup is done using Python job submission scripts with Jinja2 templates for flexible configuration. The setup also includes GPU utilization monitoring capabilities to track performance during benchmarks.
 
 ## Scripts
@@ -57,7 +57,7 @@ For simplicity of the example, we will make some assumptions about your SLURM cl
    If your cluster supports similar container based plugins, you may be able to
    modify the template to use that instead.
 3. We assume you have already built a recent Dynamo+SGLang container image as
-   described [here](../dsr1-wideep.md#instructions).
+   described [here](../docs/dsr1-wideep-h100.md#instructions).
    This is the image that can be passed to the `--container-image` argument in later steps.
 
 ## Usage
```
## Usage

components/backends/trtllm/README.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -193,7 +193,7 @@ For complete Kubernetes deployment instructions, configurations, and troubleshoo
 
 ### Client
 
-See [client](../llm/README.md#client) section to learn how to send request to the deployment.
+See [client](../sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment.
 
 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
 
@@ -218,7 +218,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh
 
 ## KV Cache Transfer in Disaggregated Serving
 
-Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-tranfer.md).
+Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-transfer.md).
 
 
 ## Request Migration
@@ -233,7 +233,7 @@ This allows a request to be migrated up to 3 times before failing. See the [Requ
 
 ## Client
 
-See [client](../llm/README.md#client) section to learn how to send request to the deployment.
+See [client](../sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment.
 
 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
```
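The client sections linked above amount to sending an OpenAI-compatible request to the node running the frontend. A rough sketch (port `8000`, the `/v1/chat/completions` path, and the model name are assumptions, not values specified in this diff):

```bash
# Hypothetical test request against the node running `python3 -m dynamo.frontend <args>`.
# Port and path are assumed defaults; adjust to your deployment.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<served-model-name>", "messages": [{"role": "user", "content": "Hello"}]}'
```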

components/backends/trtllm/deploy/README.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -211,7 +211,7 @@ envs:
 
 ## Testing the Deployment
 
-Send a test request to verify your deployment. See the [client section](../../../../components/backends/llm/README.md#client) for detailed instructions.
+Send a test request to verify your deployment. See the [client section](../../../../components/backends/vllm/README.md#client) for detailed instructions.
 
 **Note:** For multi-node deployments, target the node running `python3 -m dynamo.frontend <args>`.
 
@@ -241,7 +241,7 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving
 - **UCX** (default): Standard method for KV cache transfer
 - **NIXL** (experimental): Alternative transfer method
 
-For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).
 
 ## Request Migration
```

0 commit comments
