Commit 52915af

[docs][serve][llm] added touch ups (#58406)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
1 parent 693c021 commit 52915af

1 file changed: +1 -1 lines changed

doc/source/serve/llm/architecture/serving-patterns/prefill-decode.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 (serve-llm-architecture-prefill-decode)=
 # Prefill-decode disaggregation
 
-Prefill-decode (PD) disaggregation is a serving pattern that separates the prefill phase (processing input prompts) from the decode phase (generating tokens). This pattern optimizes resource utilization by scaling each phase independently based on its specific requirements.
+Prefill-decode (PD) disaggregation is a serving pattern that separates the prefill phase (processing input prompts) from the decode phase (generating tokens). This pattern was first pioneered in [DistServe](https://hao-ai-lab.github.io/blogs/distserve/) and optimizes resource utilization by scaling each phase independently based on its specific requirements.
 
 ## Architecture overview
 