docs/dynamo_glossary.md: 3 additions & 4 deletions (+3 -4)
@@ -11,16 +11,12 @@
 ## D
 **Decode Phase** - The second phase of LLM inference that generates output tokens one at a time.
 
-**depends()** - A Dynamo function that creates dependencies between services, enabling automatic client generation and service discovery.
-
 **Disaggregated Serving** - Dynamo's core architecture that separates prefill and decode phases into specialized engines to maximize GPU throughput and improve performance.
 
 **Distributed Runtime** - Dynamo's Rust-based core system that manages service discovery, communication, and component lifecycle across distributed clusters.
 
 **Dynamo** - NVIDIA's high-performance distributed inference framework for Large Language Models (LLMs) and generative AI models, designed for multinode environments with disaggregated serving and cache-aware routing.
 
-**Dynamo Artifact** - A packaged archive containing an inference graph and its dependencies, created using `dynamo build`. It's the containerized, deployable version of a Graph.
-
 **Dynamo Cloud** - A Kubernetes platform providing managed deployment experience for Dynamo inference graphs.
 
 ## E
@@ -80,5 +76,8 @@
 ## V
 **vLLM** - High-throughput LLM serving engine with Ray distributed support and PagedAttention.
 
+## W
+**Wide Expert Parallelism (WideEP)** - Mixture-of-Experts deployment strategy that spreads experts across many GPUs (e.g., 64-way EP) so each GPU hosts only a few experts.
+
 ## X
 **xPyD (x Prefill y Decode)** - Dynamo notation describing disaggregated serving configurations where x prefill workers serve y decode workers. Dynamo supports runtime-reconfigurable xPyD.
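
The xPyD entry above describes a notation rather than an API, so a small illustration may help readers of the glossary. The sketch below assumes a configuration is written as a label such as `2P4D` (2 prefill workers, 4 decode workers). The `XPyD` type and the `parse_xpyd` helper are hypothetical names introduced for this example and are not part of Dynamo.

```python
import re
from typing import NamedTuple


class XPyD(NamedTuple):
    """Hypothetical container for an xPyD configuration (not a Dynamo type)."""
    prefill_workers: int
    decode_workers: int


# Assumed label shape: <x>P<y>D, e.g. "2P4D". Case-insensitive.
_XPYD_PATTERN = re.compile(r"^(\d+)P(\d+)D$", re.IGNORECASE)


def parse_xpyd(label: str) -> XPyD:
    """Parse a label like '2P4D' into prefill and decode worker counts."""
    match = _XPYD_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not an xPyD label: {label!r}")
    prefill, decode = int(match.group(1)), int(match.group(2))
    if prefill < 1 or decode < 1:
        # A disaggregated setup needs at least one worker of each kind.
        raise ValueError(f"worker counts must be positive: {label!r}")
    return XPyD(prefill, decode)


if __name__ == "__main__":
    config = parse_xpyd("2P4D")
    print(config.prefill_workers, config.decode_workers)
```

Here `parse_xpyd("2P4D")` yields 2 prefill workers and 4 decode workers. Keeping the two counts as separate fields reflects the "runtime-reconfigurable" wording in the entry, since either count can change independently.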