From 1ed7d6ee0eeb2d709640279b815fd197f544f8be Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 23 May 2025 17:01:00 +0000
Subject: [PATCH 1/7] [Doc] Fix broken links and unlinked docs

Signed-off-by: DarkLight1337
---
 docs/contributing/model/tests.md              | 4 ++--
 docs/features/spec_decode.md                  | 2 +-
 docs/models/supported_models.md               | 6 +++---
 docs/serving/openai_compatible_server.md      | 2 +-
 docs/{ => serving}/seed_parameter_behavior.md | 0
 5 files changed, 7 insertions(+), 7 deletions(-)
 rename docs/{ => serving}/seed_parameter_behavior.md (100%)

diff --git a/docs/contributing/model/tests.md b/docs/contributing/model/tests.md
index 26880986181d..e538e36855a3 100644
--- a/docs/contributing/model/tests.md
+++ b/docs/contributing/model/tests.md
@@ -33,14 +33,14 @@ These tests compare the model outputs of vLLM against [HF Transformers](https://

 #### Generative models

-For [generative models][generative-models], there are two levels of correctness tests, as defined in :
+For [generative models](../../models/generative_models.md), there are two levels of correctness tests, as defined in :

 - Exact correctness (`check_outputs_equal`): The text outputted by vLLM should exactly match the text outputted by HF.
 - Logprobs similarity (`check_logprobs_close`): The logprobs outputted by vLLM should be in the top-k logprobs outputted by HF, and vice versa.

 #### Pooling models

-For [pooling models][pooling-models], we simply check the cosine similarity, as defined in .
+For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in .

 [](){ #mm-processing-tests }

diff --git a/docs/features/spec_decode.md b/docs/features/spec_decode.md
index dce87c27896c..ee871823b078 100644
--- a/docs/features/spec_decode.md
+++ b/docs/features/spec_decode.md
@@ -170,7 +170,7 @@ A variety of speculative models of this type are available on HF hub:
 ## Speculating using EAGLE based draft models

 The following code configures vLLM to use speculative decoding where proposals are generated by
-an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here]().
+an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https://arxiv.org/pdf/2401.15077) based draft model. A more detailed example for offline mode, including how to extract request level acceptance rate, can be found [here](gh-file:examples/offline_inference/eagle.py).

 ```python
 from vllm import LLM, SamplingParams

diff --git a/docs/models/supported_models.md b/docs/models/supported_models.md
index 416fe42fcb79..5a402ee88c61 100644
--- a/docs/models/supported_models.md
+++ b/docs/models/supported_models.md
@@ -3,7 +3,7 @@ title: Supported Models
 ---
 [](){ #supported-models }

-vLLM supports [generative](generative-models) and [pooling](pooling-models) models across various tasks.
+vLLM supports [generative](./generative_models.md) and [pooling](./pooling_models.md) models across various tasks.
 If a model supports more than one task, you can set the task via the `--task` argument.

 For each task, we list the model architectures that have been implemented in vLLM.
@@ -376,7 +376,7 @@ Specified using `--task generate`.

 ### Pooling Models

-See [this page](pooling-models) for more information on how to use pooling models.
+See [this page](./pooling_models.md) for more information on how to use pooling models.

 !!! warning
     Since some model architectures support both generative and pooling tasks,
@@ -628,7 +628,7 @@ Specified using `--task generate`.

 ### Pooling Models

-See [this page](pooling-models) for more information on how to use pooling models.
+See [this page](./pooling_models.md) for more information on how to use pooling models.

 !!! warning
     Since some model architectures support both generative and pooling tasks,

diff --git a/docs/serving/openai_compatible_server.md b/docs/serving/openai_compatible_server.md
index 27cb9310c516..012bddf3d9c9 100644
--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
@@ -5,7 +5,7 @@ title: OpenAI-Compatible Server

 vLLM provides an HTTP server that implements OpenAI's [Completions API](https://platform.openai.com/docs/api-reference/completions), [Chat API](https://platform.openai.com/docs/api-reference/chat), and more! This functionality lets you serve models and interact with them using an HTTP client.

-In your terminal, you can [install](../getting_started/installation.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)
+In your terminal, you can [install](../getting_started/installation/README.md) vLLM, then start the server with the [`vllm serve`][serve-args] command. (You can also use our [Docker][deployment-docker] image.)

 ```bash
 vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123

diff --git a/docs/seed_parameter_behavior.md b/docs/serving/seed_parameter_behavior.md
similarity index 100%
rename from docs/seed_parameter_behavior.md
rename to docs/serving/seed_parameter_behavior.md

From 20bc09dda6bb4fd1dab578c76f7b3bd614aab2b5 Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 23 May 2025 17:01:45 +0000
Subject: [PATCH 2/7] Rename

Signed-off-by: DarkLight1337
---
 docs/serving/seed_parameter_behavior.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/serving/seed_parameter_behavior.md b/docs/serving/seed_parameter_behavior.md
index ff17525cf8e2..301847292b83 100644
--- a/docs/serving/seed_parameter_behavior.md
+++ b/docs/serving/seed_parameter_behavior.md
@@ -1,4 +1,4 @@
-# Seed Parameter Behavior in vLLM
+# Seed Parameter Behavior

 ## Overview

From 8a9697034489168bca6fff7feaaf4ee44db0c76a Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 23 May 2025 17:02:33 +0000
Subject: [PATCH 3/7] Update title

Signed-off-by: DarkLight1337
---
 docs/.nav.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/.nav.yml b/docs/.nav.yml
index c410b6b8223b..1c28823ca1eb 100644
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -38,7 +38,7 @@ nav:
   - contributing/overview.md
   - glob: contributing/*
     flatten_single_child_sections: true
-  - contributing/model
+  - Model Implementation: contributing/model
 - Design Documents:
   - V0: design
   - V1: design/v1

From eb83501a376a3efe0b211fb0f35be0160039f9fd Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 23 May 2025 17:11:35 +0000
Subject: [PATCH 4/7] Fix link

Signed-off-by: DarkLight1337
---
 docs/contributing/model/tests.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/contributing/model/tests.md b/docs/contributing/model/tests.md
index e538e36855a3..67f8eda61dc5 100644
--- a/docs/contributing/model/tests.md
+++ b/docs/contributing/model/tests.md
@@ -40,7 +40,7 @@ For [generative models](../../models/generative_models.md), there are two levels

 #### Pooling models

-For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in .
+For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in .

 [](){ #mm-processing-tests }

From a0f458c397a20ce27325eef585dd404de99c4ef5 Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 23 May 2025 17:13:32 +0000
Subject: [PATCH 5/7] Update

Signed-off-by: DarkLight1337
---
 docs/.nav.yml | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/.nav.yml b/docs/.nav.yml
index 1c28823ca1eb..c9a10abe3e40 100644
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -9,8 +9,12 @@ nav:
   - getting_started/examples/offline_inference
   - getting_started/examples/online_serving
   - getting_started/examples/other
-  - Roadmap: https://roadmap.vllm.ai
-  - Releases: https://github.com/vllm-project/vllm/releases
+  - User Guide: serving/offline_inference.md
+  - Developer Guide: contributing/overview.md
+  - API Reference: api/README.md
+  - News:
+    - Roadmap: https://roadmap.vllm.ai
+    - Releases: https://github.com/vllm-project/vllm/releases
 - User Guide:
   - Inference and Serving:
     - serving/offline_inference.md

From b2c697bb691bb51a78e151110c7dd0b44433004d Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 23 May 2025 17:15:09 +0000
Subject: [PATCH 6/7] Rename

Signed-off-by: DarkLight1337
---
 docs/.nav.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/.nav.yml b/docs/.nav.yml
index c9a10abe3e40..427733081c76 100644
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -12,7 +12,7 @@ nav:
   - User Guide: serving/offline_inference.md
   - Developer Guide: contributing/overview.md
   - API Reference: api/README.md
-  - News:
+  - Timeline:
     - Roadmap: https://roadmap.vllm.ai
     - Releases: https://github.com/vllm-project/vllm/releases
 - User Guide:

From 0f900ce0cf42473758590a23ad8c69d5c48b2efb Mon Sep 17 00:00:00 2001
From: DarkLight1337
Date: Fri, 23 May 2025 17:22:00 +0000
Subject: [PATCH 7/7] Add a layer of nesting

Signed-off-by: DarkLight1337
---
 docs/.nav.yml | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/.nav.yml b/docs/.nav.yml
index 427733081c76..e2b0ed560700 100644
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -9,9 +9,10 @@ nav:
   - getting_started/examples/offline_inference
   - getting_started/examples/online_serving
   - getting_started/examples/other
-  - User Guide: serving/offline_inference.md
-  - Developer Guide: contributing/overview.md
-  - API Reference: api/README.md
+  - Quick Links:
+    - User Guide: serving/offline_inference.md
+    - Developer Guide: contributing/overview.md
+    - API Reference: api/README.md
   - Timeline:
     - Roadmap: https://roadmap.vllm.ai
     - Releases: https://github.com/vllm-project/vllm/releases
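
For reviewers: the spec_decode.md hunk in PATCH 1/7 points to gh-file:examples/offline_inference/eagle.py for the full offline EAGLE example. As a quick orientation, here is a minimal sketch of that kind of configuration; the draft-model repository and the `speculative_config` keys below are illustrative assumptions, and the linked example file remains the authoritative version, including the request-level acceptance-rate extraction it mentions.

```python
# Minimal sketch of offline EAGLE speculative decoding in vLLM.
# The draft-model repo and config keys are assumptions for illustration;
# see examples/offline_inference/eagle.py for the authoritative example.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    speculative_config={
        "method": "eagle",
        "model": "yuhuili/EAGLE-LLaMA3-Instruct-8B",  # EAGLE draft head (assumed)
        "num_speculative_tokens": 5,  # draft tokens proposed per step
    },
)

outputs = llm.generate(
    ["The future of AI is"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```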
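Similarly, the `vllm serve` invocation shown in the openai_compatible_server.md hunk can be exercised with any OpenAI-compatible client. A minimal sketch, assuming the server from that hunk is running locally on vLLM's default port 8000 and the `openai` Python package is installed; the model name and API key mirror the values in the diff:

```python
# Query the server started by:
#   vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
# Assumes the server is reachable on the default port 8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="token-abc123",               # matches --api-key in the hunk above
)

completion = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```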