Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 10 additions & 10 deletions components/backends/trtllm/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,12 +49,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | TensorRT-LLM | Notes |
|---------|--------------|-------|
| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | 🚧 | Planned |
| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |
| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | 🚧 | Planned |
| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | Planned |
| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | Planned |

### Large Scale P/D and WideEP Features

Expand Down Expand Up @@ -180,14 +180,14 @@ Below we provide a selected list of advanced examples. Please open up an issue i

### Multinode Deployment

For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../docs/components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../docs/components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.
For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.

### Speculative Decoding
- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../docs/components/backends/trtllm/llama4_plus_eagle.md)**
- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../components/backends/trtllm/llama4_plus_eagle.md)**

### Kubernetes Deployment

For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../docs/components/backends/trtllm/deploy/README.md)
For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../components/backends/trtllm/deploy/README.md)

### Client

Expand Down Expand Up @@ -216,7 +216,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh

## KV Cache Transfer in Disaggregated Serving

Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../docs/components/backends/trtllm/kv-cache-tranfer.md).
Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../components/backends/trtllm/kv-cache-tranfer.md).

## Request Migration

Expand Down
1 change: 0 additions & 1 deletion docs/examples/runtime/hello_world/README.md

This file was deleted.

119 changes: 119 additions & 0 deletions docs/examples/runtime/hello_world/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Hello World Example

This is the simplest Dynamo example demonstrating a basic service using Dynamo's distributed runtime. It showcases the fundamental concepts of creating endpoints and workers in the Dynamo runtime system.

## Architecture

```text
Client (dynamo_worker)
┌─────────────┐
│ Backend │ Dynamo endpoint (/generate)
└─────────────┘
```

## Components

- **Backend**: A Dynamo service with an endpoint that receives text input and streams back greetings for each comma-separated word
- **Client**: A Dynamo worker that connects to and sends requests to the backend service, then prints out the response

## Implementation Details

The example demonstrates:

- **Endpoint Definition**: Using the `@dynamo_endpoint` decorator to create streaming endpoints
- **Worker Setup**: Using the `@dynamo_worker()` decorator to create distributed runtime workers
- **Service Creation**: Creating services and endpoints using the distributed runtime API
- **Streaming Responses**: Yielding data for real-time streaming
- **Client Integration**: Connecting to services and processing streams
- **Logging**: Basic logging configuration with `configure_dynamo_logging`

## Getting Started

## Prerequisites

Before running this example, ensure you have the following services running:

- **etcd**: A distributed key-value store used for service discovery and metadata storage
- **NATS**: A high-performance message broker for inter-component communication

You can start these services using Docker Compose:

```bash
# clone the dynamo repository if necessary
# git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo
docker compose -f deploy/docker-compose.yml up -d
```

### Running the Example

First, start the backend service:
```bash
cd examples/runtime/hello_world
python hello_world.py
```

Second, in a separate terminal, run the client:
```bash
cd examples/runtime/hello_world
python client.py
```

The client will connect to the backend service and print the streaming results.

### Expected Output

When running the client, you should see streaming output like:
```text
Hello world!
Hello sun!
Hello moon!
Hello star!
```

## Code Structure

### Backend Service (`hello_world.py`)

- **`content_generator`**: A dynamo endpoint that processes text input and yields greetings
- **`worker`**: A dynamo worker that sets up the service, creates the endpoint, and serves it

### Client (`client.py`)

- **`worker`**: A dynamo worker that connects to the backend service and processes the streaming response

## Deployment to Kubernetes

Follow the [Quickstart Guide](../../../guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
Then deploy to kubernetes using

```bash
export NAMESPACE=<your-namespace>
cd dynamo
kubectl apply -f examples/runtime/hello_world/deploy/hello_world.yaml -n ${NAMESPACE}
```

to delete your deployment:

```bash
kubectl delete dynamographdeployment hello-world -n ${NAMESPACE}
```
1 change: 0 additions & 1 deletion docs/hidden_toctree.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,6 @@
components/backends/trtllm/llama4_plus_eagle.md
components/backends/trtllm/multinode-examples.md
components/backends/trtllm/kv-cache-tranfer.md
components/backends/vllm/deepseek-r1.md
components/backends/vllm/deploy/README.md
components/backends/vllm/multi-node.md

1 change: 0 additions & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,6 @@ The examples below assume you build the latest image yourself from source. If us
Writing Python Workers in Dynamo <guides/backend.md>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
Configuring Metrics for Observability <guides/metrics.md>

.. toctree::
:hidden:
Expand Down
119 changes: 0 additions & 119 deletions examples/runtime/hello_world/README.md

This file was deleted.

1 change: 1 addition & 0 deletions examples/runtime/hello_world/README.md
Loading