Commit f41c84d

kouroshHakha authored and YoussefEssDS committed
[docs][serve][llm] added touch ups (ray-project#58406)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
1 parent 63567a3 commit f41c84d

1 file changed, +1 -1 lines changed
doc/source/serve/llm/architecture/serving-patterns/prefill-decode.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 (serve-llm-architecture-prefill-decode)=
 # Prefill-decode disaggregation

-Prefill-decode (PD) disaggregation is a serving pattern that separates the prefill phase (processing input prompts) from the decode phase (generating tokens). This pattern optimizes resource utilization by scaling each phase independently based on its specific requirements.
+Prefill-decode (PD) disaggregation is a serving pattern that separates the prefill phase (processing input prompts) from the decode phase (generating tokens). This pattern was first pioneered in [DistServe](https://hao-ai-lab.github.io/blogs/distserve/) and optimizes resource utilization by scaling each phase independently based on its specific requirements.

 ## Architecture overview
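The changed sentence says that PD disaggregation lets each phase scale independently. The following is a minimal, library-agnostic sketch of that idea. It does not use the Ray Serve API. `PhaseLoad`, `replicas_needed`, and every throughput number are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PhaseLoad:
    """Hypothetical load description for one serving phase."""
    tokens_per_second: float   # offered load for this phase (assumed)
    replica_capacity: float    # tokens/s one replica sustains (assumed)


def replicas_needed(load: PhaseLoad) -> int:
    """Size a phase from its own load only, with at least one replica."""
    return max(1, math.ceil(load.tokens_per_second / load.replica_capacity))


# Illustrative numbers: prefill processes many prompt tokens per replica,
# while decode produces output tokens at a much lower rate per replica.
prefill = PhaseLoad(tokens_per_second=40_000, replica_capacity=16_000)
decode = PhaseLoad(tokens_per_second=6_000, replica_capacity=1_000)

# Each pool is sized separately; neither count depends on the other phase.
prefill_replicas = replicas_needed(prefill)
decode_replicas = replicas_needed(decode)

print(f"prefill replicas: {prefill_replicas}, decode replicas: {decode_replicas}")
```

With these assumed numbers the two pools end up with different sizes, which a single combined deployment could not express: it would have to provision both phases at the same replica count.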