Skip to content

Commit bd71f35

Browse files
committed
feat(frontend): python frontend.py alternative to dynamo-run
No need to build or install the `dynamo-run` Rust binary. This will have the same performance as `dynamo-run`, it uses the Rust library. In time we could split http server, pre-processor and router into separate bindings, and call them here, allow easy customization.
1 parent d975761 commit bd71f35

File tree

2 files changed

+63
-0
lines changed

2 files changed

+63
-0
lines changed

components/frontend/README

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
Dynamo ingress / frontend node.
2+
3+
This runs an OpenAI compliant HTTP server, a pre-processor, and a router in a single process. Engines / workers are auto-discovered when they call `register_llm`.
4+
5+
Requires `etcd` and `nats-server -js`.
6+

components/frontend/main.py

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2+
# SPDX-License-Identifier: Apache-2.0
3+
4+
# Start a frontend node. This runs:
5+
# - OpenAI HTTP server.
6+
# - Auto-discovery: Watches etcd for engine/worker registration (via `register_llm`).
7+
# - Pre-processor: Prompt templating and tokenization.
8+
# - Router, defaulting to round-robin (TODO: Add flags to enable KV routing).
9+
10+
import argparse
11+
import asyncio
12+
13+
import uvloop
14+
15+
from dynamo.llm import EngineType, EntrypointArgs, make_engine, run_input
16+
from dynamo.runtime import DistributedRuntime
17+
18+
19+
def parse_args():
20+
parser = argparse.ArgumentParser(
21+
description="Dynamo Frontend: HTTP+Pre-processor+Router",
22+
formatter_class=argparse.RawTextHelpFormatter, # To preserve multi-line help formatting
23+
)
24+
parser.add_argument(
25+
"--kv-cache-block-size", type=int, help="KV cache block size (u32)."
26+
)
27+
parser.add_argument(
28+
"--http-port", type=int, default=8080, help="HTTP port for the engine (u16)."
29+
)
30+
flags = parser.parse_args()
31+
32+
kwargs = {}
33+
if flags.http_port is not None:
34+
kwargs["http_port"] = flags.http_port
35+
if flags.kv_cache_block_size is not None:
36+
kwargs["kv_cache_block_size"] = flags.kv_cache_block_size
37+
38+
return kwargs
39+
40+
41+
async def main():
42+
runtime = DistributedRuntime(asyncio.get_running_loop(), False)
43+
flags = parse_args()
44+
45+
# out=dyn
46+
e = EntrypointArgs(EngineType.Dynamic, **flags)
47+
engine = await make_engine(runtime, e)
48+
49+
# in=http
50+
try:
51+
await run_input(runtime, "http", engine)
52+
except asyncio.exceptions.CancelledError:
53+
pass
54+
55+
56+
if __name__ == "__main__":
57+
uvloop.run(main())

0 commit comments

Comments
 (0)