Closed
Labels
bug: Something isn't working
Description
Describe the Bug
On latest main, worker startup fails with an error from the NATS service.
I have seen it work occasionally, so this is likely a race condition.
Steps to Reproduce
Clear etcd. Restart nats. Run this:
python -m dynamo.sglang --model-path Qwen/Qwen3-0.6B
This gives the following output:
2025-12-04T14:55:38.090064Z INFO main.init: Registering model with endpoint types: chat,completions
2025-12-04T14:55:38.090222Z DEBUG dynamo_runtime::local_endpoint_registry: Registering local endpoint: generate
2025-12-04T14:55:38.090253Z DEBUG dynamo_runtime::component::endpoint: Registered engine for endpoint 'generate' in local registry
2025-12-04T14:55:38.090457Z INFO register._get_runtime_config: Got total KV blocks from scheduler: 750229 (max_total_tokens=750229, page_size=1)
2025-12-04T14:55:38.090712Z DEBUG dynamo_runtime::component::endpoint: Starting endpoint: dynamo/backend/generate
2025-12-04T14:55:38.098016Z DEBUG dynamo_llm::local_model: Registering MDC at path: dynamo/backend/generate/694d9adb9d05322d
2025-12-04T14:55:38.098130Z DEBUG dynamo_runtime::discovery::kv_store: KVStoreDiscovery::register: Registering base model instance_id=7587891215212032557, namespace=dynamo, component=backend, endpoint=generate, key=dynamo/backend/generate/694d9adb9d05322d
2025-12-04T14:55:38.098188Z DEBUG dynamo_runtime::discovery::kv_store: KVStoreDiscovery::register: Serialized instance to 1584 bytes for key=dynamo/backend/generate/694d9adb9d05322d
2025-12-04T14:55:38.098196Z DEBUG dynamo_runtime::discovery::kv_store: KVStoreDiscovery::register: Getting/creating bucket=v1/mdc for key=dynamo/backend/generate/694d9adb9d05322d
2025-12-04T14:55:38.098212Z DEBUG dynamo_runtime::discovery::kv_store: KVStoreDiscovery::register: Inserting into bucket=v1/mdc, key=dynamo/backend/generate/694d9adb9d05322d
2025-12-04T14:55:38.098543Z INFO dynamo_llm::kv_router::publisher: Registered KvStats Prometheus metrics
2025-12-04T14:55:38.098540Z INFO dynamo_runtime::component::endpoint: Endpoint starting with request plane mode: nats
2025-12-04T14:55:38.098661Z DEBUG dynamo_runtime::component::endpoint: Registering endpoint health check target endpoint_name=generate
2025-12-04T14:55:38.098802Z DEBUG dynamo_runtime::component::endpoint: Registering endpoint 'generate' with graceful shutdown tracker
2025-12-04T14:55:38.098846Z DEBUG dynamo_runtime::utils::graceful_shutdown: Endpoint registered, total active: 0 -> 1
2025-12-04T14:55:38.098897Z INFO dynamo_runtime::pipeline::network::manager: Creating NATS request plane server
2025-12-04T14:55:38.098991Z INFO dynamo_runtime::component::endpoint: Registering endpoint with request plane server endpoint=generate transport="nats"
2025-12-04T14:55:38.099046Z INFO dynamo_runtime::pipeline::network::ingress::nats_server: NatsMultiplexedServer::register_endpoint called endpoint_name=generate namespace=dynamo component=backend instance_id=7587891215212032557
2025-12-04T14:55:38.099106Z DEBUG dynamo_runtime::pipeline::network::ingress::nats_server: Looking up service group in registry service_name_raw=dynamo_backend service_name=dynamo_backend
2025-12-04T14:55:38.099543Z ERROR main.init: Failed to serve endpoints: Service 'dynamo_backend' not found in registry
2025-12-04T14:55:38.099592Z INFO main.init: Metrics task succesfully cancelled
2025-12-04T14:55:38.100186Z INFO dynamo_runtime::distributed: Added NATS service dynamo_backend
2025-12-04T14:55:38.107282Z INFO decode_handler.cleanup: Engine shutdown
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/grahamk/src/dynamo/main/components/src/dynamo/sglang/__main__.py", line 7, in <module>
main()
File "/home/grahamk/src/dynamo/main/components/src/dynamo/sglang/main.py", line 530, in main
uvloop.run(worker())
File "/home/grahamk/venv/sglang-0.5.4/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/grahamk/venv/sglang-0.5.4/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
return await main
^^^^^^^^^^
File "/home/grahamk/src/dynamo/main/components/src/dynamo/sglang/main.py", line 97, in worker
await init(runtime, config)
File "/home/grahamk/src/dynamo/main/components/src/dynamo/sglang/main.py", line 175, in init
await asyncio.gather(
Exception: Service 'dynamo_backend' not found in registry
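The timestamps in the log above are consistent with the race: the "not found in registry" error is logged shortly before "Added NATS service dynamo_backend". A minimal sketch to measure that gap, using the two log lines copied verbatim from the output (the helper names are hypothetical and only for illustration):

```python
from datetime import datetime, timedelta

# The two relevant lines, copied verbatim from the output above.
LOG_LINES = [
    "2025-12-04T14:55:38.099543Z ERROR main.init: Failed to serve endpoints: "
    "Service 'dynamo_backend' not found in registry",
    "2025-12-04T14:55:38.100186Z INFO dynamo_runtime::distributed: "
    "Added NATS service dynamo_backend",
]


def timestamp(line: str) -> datetime:
    """Parse the leading UTC timestamp of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def gap_between_failure_and_registration(lines: list[str]) -> timedelta:
    """Return how long after the lookup failure the service was added."""
    failed = next(l for l in lines if "not found in registry" in l)
    added = next(l for l in lines if "Added NATS service" in l)
    return timestamp(added) - timestamp(failed)


if __name__ == "__main__":
    gap = gap_between_failure_and_registration(LOG_LINES)
    print(f"service added {gap.microseconds} microseconds after the failed lookup")
```

A positive gap means the lookup ran before the service was added. Log timestamps from concurrent tasks are not a strict ordering guarantee, so this is supporting evidence rather than proof.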
It's definitely NATS:
--store-kv etcd --request-plane tcp: works
--store-kv file --request-plane nats: fails
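To illustrate the kind of race suspected here, the following is a minimal asyncio sketch, not Dynamo's actual code. The `Registry` class and its methods are hypothetical stand-ins: one task adds the service after a simulated network round trip, while endpoint registration looks the service up immediately. Waiting for the service to appear before the lookup removes the race in this toy model.

```python
import asyncio


class Registry:
    """Hypothetical service registry; a stand-in, not Dynamo's real type."""

    def __init__(self) -> None:
        self._services: set[str] = set()
        self._changed = asyncio.Condition()

    async def add_service(self, name: str, delay: float = 0.01) -> None:
        # Simulate the round trip needed to create the service.
        await asyncio.sleep(delay)
        async with self._changed:
            self._services.add(name)
            self._changed.notify_all()

    def lookup(self, name: str) -> str:
        # Immediate lookup: fails if the service has not been added yet.
        if name not in self._services:
            raise LookupError(f"Service '{name}' not found in registry")
        return name

    async def wait_for(self, name: str, timeout: float = 1.0) -> str:
        # Block until the service is present, or time out.
        async with self._changed:
            await asyncio.wait_for(
                self._changed.wait_for(lambda: name in self._services), timeout
            )
        return name


async def racy_startup() -> str:
    """Look up the service without waiting for it to be added."""
    registry = Registry()
    adding = asyncio.create_task(registry.add_service("dynamo_backend"))
    try:
        return registry.lookup("dynamo_backend")
    except LookupError as exc:
        return str(exc)
    finally:
        await adding


async def ordered_startup() -> str:
    """Wait for the service to be added before using it."""
    registry = Registry()
    adding = asyncio.create_task(registry.add_service("dynamo_backend"))
    name = await registry.wait_for("dynamo_backend")
    await adding
    return name


if __name__ == "__main__":
    print(asyncio.run(racy_startup()))
    print(asyncio.run(ordered_startup()))
```

In this sketch the racy variant always fails, because the lookup runs before the adding task is scheduled. In the real worker the outcome would depend on timing, which would match the occasional successful start.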