From 1117b1952cbe5a279f2eb86bd5f0c5b17c903ca4 Mon Sep 17 00:00:00 2001
From: Kourosh Hakhamaneshi
Date: Tue, 4 Nov 2025 21:08:09 -0800
Subject: [PATCH] wip

Signed-off-by: Kourosh Hakhamaneshi
---
 .../serve/llm/architecture/serving-patterns/prefill-decode.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/serve/llm/architecture/serving-patterns/prefill-decode.md b/doc/source/serve/llm/architecture/serving-patterns/prefill-decode.md
index 49a51523be24..e426badb80d3 100644
--- a/doc/source/serve/llm/architecture/serving-patterns/prefill-decode.md
+++ b/doc/source/serve/llm/architecture/serving-patterns/prefill-decode.md
@@ -1,7 +1,7 @@
 (serve-llm-architecture-prefill-decode)=
 # Prefill-decode disaggregation
 
-Prefill-decode (PD) disaggregation is a serving pattern that separates the prefill phase (processing input prompts) from the decode phase (generating tokens). This pattern optimizes resource utilization by scaling each phase independently based on its specific requirements.
+Prefill-decode (PD) disaggregation is a serving pattern that separates the prefill phase (processing input prompts) from the decode phase (generating tokens). This pattern was first pioneered in [DistServe](https://hao-ai-lab.github.io/blogs/distserve/) and optimizes resource utilization by scaling each phase independently based on its specific requirements.
 
 ## Architecture overview