Commit 483ea64

[Docs] Replace all explicit anchors with real links (#27087)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
1 parent e20eba7 commit 483ea64

28 files changed: +54 -139 lines changed

.markdownlint.yaml

Lines changed: 0 additions & 1 deletion
@@ -4,7 +4,6 @@ MD013: false
 MD024:
   siblings_only: true
 MD033: false
-MD042: false
 MD045: false
 MD046: false
 MD051: false

docs/api/README.md

Lines changed: 1 addition & 7 deletions
@@ -20,8 +20,6 @@ API documentation for vLLM's configuration classes.
 - [vllm.config.CompilationConfig][]
 - [vllm.config.VllmConfig][]

-[](){ #offline-inference-api }
-
 ## Offline Inference

 LLM Class.
@@ -45,18 +43,14 @@ Engine classes for offline and online inference.

 Inference parameters for vLLM APIs.

-[](){ #sampling-params }
-
 - [vllm.SamplingParams][]
 - [vllm.PoolingParams][]

-[](){ #multi-modality }
-
 ## Multi-Modality

 vLLM provides experimental support for multi-modal models through the [vllm.multimodal][] package.

-Multi-modal inputs can be passed alongside text and token prompts to [supported models][supported-mm-models]
+Multi-modal inputs can be passed alongside text and token prompts to [supported models](../models/supported_models.md#list-of-multimodal-language-models)
 via the `multi_modal_data` field in [vllm.inputs.PromptType][].

 Looking to add your own multi-modal model? Please follow the instructions listed [here](../contributing/model/multimodal.md).
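
The page being edited indexes the offline-inference and multi-modal APIs; below is a minimal sketch of how they fit together. The model names, chat-style prompt, and image path are illustrative assumptions, not part of this commit.

```python
# A minimal sketch of the offline-inference APIs this page indexes.
# Model names, the chat-style prompt, and the image path are assumed examples.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# Text-only prompts.
for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)

# Multi-modal inputs ride along in the `multi_modal_data` field of the prompt
# (requires a multi-modal model, e.g. a LLaVA checkpoint):
#
#   from PIL import Image
#   mm_llm = LLM(model="llava-hf/llava-1.5-7b-hf")
#   mm_llm.generate({
#       "prompt": "USER: <image>\nWhat is in this image?\nASSISTANT:",
#       "multi_modal_data": {"image": Image.open("example.jpg")},
#   })
```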

docs/configuration/README.md

Lines changed: 1 addition & 1 deletion
@@ -4,6 +4,6 @@ This section lists the most common options for running vLLM.

 There are three main levels of configuration, from highest priority to lowest priority:

-- [Request parameters][completions-api] and [input arguments][sampling-params]
+- [Request parameters](../serving/openai_compatible_server.md#completions-api) and [input arguments](../api/README.md#inference-parameters)
 - [Engine arguments](./engine_args.md)
 - [Environment variables](./env_vars.md)
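
A rough sketch of where each of those three levels shows up in code; the environment variable and argument values are assumed examples, not recommendations.

```python
# Sketch of the three configuration levels; values are assumed examples.
import os

# Environment variable (lowest priority); must be set before vLLM is imported.
os.environ["VLLM_LOGGING_LEVEL"] = "INFO"

from vllm import LLM, SamplingParams

# Engine arguments (middle priority): fixed for the lifetime of the engine.
llm = LLM(model="facebook/opt-125m", max_model_len=2048, gpu_memory_utilization=0.8)

# Request parameters / input arguments (highest priority): set per request.
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16, temperature=0.0))
print(outputs[0].outputs[0].text)
```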

docs/configuration/optimization.md

Lines changed: 0 additions & 2 deletions
@@ -27,8 +27,6 @@ You can monitor the number of preemption requests through Prometheus metrics exp

 In vLLM V1, the default preemption mode is `RECOMPUTE` rather than `SWAP`, as recomputation has lower overhead in the V1 architecture.

-[](){ #chunked-prefill }
-
 ## Chunked Prefill

 Chunked prefill allows vLLM to process large prefills in smaller chunks and batch them together with decode requests. This feature helps improve both throughput and latency by better balancing compute-bound (prefill) and memory-bound (decode) operations.
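
For context, a sketch of the related engine arguments. Chunked prefill is already the default in V1, so these knobs are shown purely for illustration; the model name and token budget are assumed.

```python
# Sketch only: chunked prefill is on by default in V1, so these knobs are
# shown purely for illustration. Model name and token budget are assumed.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_chunked_prefill=True,
    max_num_batched_tokens=2048,  # cap on tokens scheduled per engine step
)
```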

docs/contributing/benchmarks.md

Lines changed: 2 additions & 6 deletions
@@ -7,8 +7,8 @@ toc_depth: 4
 vLLM provides comprehensive benchmarking tools for performance testing and evaluation:

 - **[Benchmark CLI]**: `vllm bench` CLI tools and specialized benchmark scripts for interactive performance testing
-- **[Performance benchmarks][performance-benchmarks]**: Automated CI benchmarks for development
-- **[Nightly benchmarks][nightly-benchmarks]**: Comparative benchmarks against alternatives
+- **[Performance benchmarks](#performance-benchmarks)**: Automated CI benchmarks for development
+- **[Nightly benchmarks](#nightly-benchmarks)**: Comparative benchmarks against alternatives

 [Benchmark CLI]: #benchmark-cli

@@ -924,8 +924,6 @@ throughput numbers correctly is also adjusted.

 </details>

-[](){ #performance-benchmarks }
-
 ## Performance Benchmarks

 The performance benchmarks are used for development to confirm whether new changes improve performance under various workloads. They are triggered on every commit with both the `perf-benchmarks` and `ready` labels, and when a PR is merged into vLLM.
@@ -988,8 +986,6 @@ The benchmarking currently runs on a predefined set of models configured in the

 All continuous benchmarking results are automatically published to the public [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm).

-[](){ #nightly-benchmarks }
-
 ## Nightly Benchmarks

 These compare vLLM's performance against alternatives (`tgi`, `trt-llm`, and `lmdeploy`) when there are major updates of vLLM (e.g., bumping up to a new version). They are primarily intended for consumers to evaluate when to choose vLLM over other options and are triggered on every commit with both the `perf-benchmarks` and `nightly-benchmarks` labels.
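
As a pointer for the Benchmark CLI mentioned above, here is a sketch of driving it from a script. The `vllm bench latency` subcommand and its flags are assumptions based on the current CLI and may differ between vLLM versions; check `vllm bench --help` first.

```python
# Sketch of invoking the benchmark CLI from a script. The subcommand and flags
# are assumptions based on the current CLI and may differ between versions.
import subprocess

subprocess.run(
    [
        "vllm", "bench", "latency",
        "--model", "facebook/opt-125m",
        "--input-len", "32",
        "--output-len", "128",
        "--num-iters", "5",
    ],
    check=True,
)
```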

docs/contributing/model/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Summary

 !!! important
-    Many decoder language models can now be automatically loaded using the [Transformers backend][transformers-backend] without having to implement them in vLLM. See if `vllm serve <model>` works first!
+    Many decoder language models can now be automatically loaded using the [Transformers backend](../../models/supported_models.md#transformers) without having to implement them in vLLM. See if `vllm serve <model>` works first!

 vLLM models are specialized [PyTorch](https://pytorch.org/) models that take advantage of various [features](../../features/README.md#compatibility-matrix) to optimize their performance.

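A sketch of that "try it first" check from the offline API, assuming the `model_impl` engine argument that forces the Transformers backend; the model name below is hypothetical.

```python
# Sketch of checking a model against the Transformers backend before writing a
# native vLLM implementation. `model_impl` is assumed to force that backend
# (roughly `vllm serve <model> --model-impl transformers`); model name is hypothetical.
from vllm import LLM

llm = LLM(model="my-org/my-new-decoder", model_impl="transformers")
print(llm.generate(["Hello"])[0].outputs[0].text)
```
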
docs/contributing/model/registration.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ This page provides detailed instructions on how to do so.

 ## Built-in models

-To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source][build-from-source].
+To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [build it from source](../../getting_started/installation/gpu.md#build-wheel-from-source).
 This gives you the ability to modify the codebase and test your model.

 After you have implemented your model (see [tutorial](basic.md)), put it into the [vllm/model_executor/models](../../../vllm/model_executor/models) directory.

docs/contributing/model/tests.md

Lines changed: 0 additions & 2 deletions
@@ -39,8 +39,6 @@ For [generative models](../../models/generative_models.md), there are two levels

 For [pooling models](../../models/pooling_models.md), we simply check the cosine similarity, as defined in [tests/models/utils.py](../../../tests/models/utils.py).

-[](){ #mm-processing-tests }
-
 ### Multi-modal processing

 #### Common tests
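
Not the actual helper in `tests/models/utils.py`, just a sketch of the cosine-similarity check described above; the 0.99 tolerance is an assumption.

```python
# Sketch of a cosine-similarity check for pooling-model embeddings with NumPy.
# Not the actual helper in tests/models/utils.py; tolerance is an assumption.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vllm_embedding = np.array([0.10, 0.30, 0.50])
hf_embedding = np.array([0.10, 0.29, 0.52])
assert cosine_similarity(vllm_embedding, hf_embedding) > 0.99
```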

docs/deployment/docker.md

Lines changed: 0 additions & 4 deletions
@@ -1,7 +1,5 @@
 # Using Docker

-[](){ #deployment-docker-pre-built-image }
-
 ## Use vLLM's Official Docker Image

 vLLM offers an official Docker image for deployment.
@@ -62,8 +60,6 @@ You can add any other [engine-args](../configuration/engine_args.md) you need af
 RUN uv pip install --system git+https://github.com/huggingface/transformers.git
 ```

-[](){ #deployment-docker-build-image-from-source }
-
 ## Building vLLM's Docker Image from Source

 You can build and run vLLM from source via the provided [docker/Dockerfile](../../docker/Dockerfile). To build vLLM:

docs/deployment/frameworks/anyscale.md

Lines changed: 0 additions & 2 deletions
@@ -1,7 +1,5 @@
 # Anyscale

-[](){ #deployment-anyscale }
-
 [Anyscale](https://www.anyscale.com) is a managed, multi-cloud platform developed by the creators of Ray.

 Anyscale automates the entire lifecycle of Ray clusters in your AWS, GCP, or Azure account, delivering the flexibility of open-source Ray
