Merged

115 commits
da2d299
temporary gitignore
hmellor May 12, 2025
6c861ca
first commit
hmellor May 12, 2025
af3b60d
Add missing requirement
hmellor May 13, 2025
12ec865
Handle list tables
hmellor May 13, 2025
58ee036
Show all examples
hmellor May 13, 2025
3f3572d
Change structure to look more like final structure
hmellor May 14, 2025
c9ee202
Update gitignore
hmellor May 14, 2025
4de42f1
Don't blindly copy all
hmellor May 14, 2025
40fdcec
Remove index pages which add nothing
hmellor May 14, 2025
8047a0a
Handle image blocks
hmellor May 14, 2025
4b4e177
Only transpile files with extensions
hmellor May 14, 2025
f85f172
Remove unneeded TOCs from index
hmellor May 14, 2025
19047ba
Remove another unneeded index
hmellor May 14, 2025
2509639
We don't do toctrees anymore
hmellor May 14, 2025
0a7c43d
Generate examples using mkdocs format
hmellor May 14, 2025
53f055f
Handle TOC
hmellor May 14, 2025
812bd8c
More nav improvements
hmellor May 14, 2025
32fab12
Mark image as handled
hmellor May 14, 2025
1df088a
Handle GitHub URL schemes
hmellor May 14, 2025
2188496
Fix RunLLM widget
hmellor May 14, 2025
42af380
Update `.readthedocs.yaml`
hmellor May 14, 2025
5fd0c33
Adjust `.nav.yml`
hmellor May 14, 2025
a4a125a
Enable emoji for GitHub links
hmellor May 14, 2025
65a3cc9
Tweak
hmellor May 14, 2025
1ded939
Use transpile docs as a hook
hmellor May 14, 2025
dd16e08
Merge branch 'main' into mkdocs
hmellor May 14, 2025
9e84fe2
Add missing requirement for transpiling
hmellor May 14, 2025
f46e2d5
Remove mystery indent
hmellor May 14, 2025
7a460ce
Fix styling for feature tables
hmellor May 14, 2025
50e4896
Fix front-matter titles
hmellor May 14, 2025
d6b9635
Make installation index a readme
hmellor May 14, 2025
215366c
Update themes
hmellor May 15, 2025
77c2554
Ignore template and inc in docs
hmellor May 15, 2025
ec284e7
Organise nav for getting started
hmellor May 15, 2025
f28c9f6
Use dataclass for blocks and handle includes
hmellor May 15, 2025
0961b55
Update snippet tags in docs
hmellor May 15, 2025
09819d3
Handle `alt` from images
hmellor May 15, 2025
5385eb6
Update installation readme
hmellor May 15, 2025
1c3c4ad
Remove custom CSS that's no longer used
hmellor May 15, 2025
ceb799f
Add warning for include with extra attrs
hmellor May 15, 2025
cb03d40
Finish literalinclude
hmellor May 15, 2025
bc70c9d
Support math and code blocks
hmellor May 15, 2025
11cf31a
Fix includes
hmellor May 15, 2025
1a7a691
Rename all `index` files to `README`
hmellor May 15, 2025
7c5539c
Handle toctrees
hmellor May 15, 2025
7a2ead2
Clean up diff
hmellor May 15, 2025
d388a24
Cleanup diff 2
hmellor May 15, 2025
a47f141
Remove unnecessary leading `./`
hmellor May 15, 2025
230d0cd
Document new docs building
hmellor May 15, 2025
192a551
Remove index stuff from `gen_examples.py`
hmellor May 15, 2025
f351fe2
Formatting
hmellor May 16, 2025
7a274d9
Add API reference
hmellor May 16, 2025
85d45a4
Remove inline MyST code syntax
hmellor May 16, 2025
02f3a65
Fix API ref readme
hmellor May 16, 2025
a44232a
Fix code cross references
hmellor May 16, 2025
974c4ab
Fix admonitions in docstrings
hmellor May 16, 2025
0f92d8f
Comment out unhandled blocks
hmellor May 16, 2025
302556d
Fix figure in LLMEngine.step
hmellor May 16, 2025
77e6d8e
Remove argparse from CLI ref (tell user to run --help)
hmellor May 16, 2025
4c16f60
Remove unnecessary section in top level readme
hmellor May 16, 2025
a5a45cc
Fix mkdocs anchor syntax
hmellor May 16, 2025
23f6e46
Merge branch 'main' into mkdocs
hmellor May 16, 2025
d643c4a
Fix engine args page
hmellor May 16, 2025
d8a4e90
Tweak links
hmellor May 16, 2025
dfa3c30
Add latest warning to announcement banner
hmellor May 16, 2025
68209e0
Fix LMCache capitalisation
hmellor May 17, 2025
f184c0c
Fix announcement bar
hmellor May 17, 2025
3643ce0
Fix on startup hook
hmellor May 17, 2025
7740e35
Enable some search features
hmellor May 17, 2025
0406086
Improve headings in API ref
hmellor May 17, 2025
6387b4c
Transpile twice, once to find all the explicit anchors, and once to u…
hmellor May 19, 2025
6429083
Improve API ref
hmellor May 19, 2025
916e087
Let transpiler handle these links
hmellor May 19, 2025
a85e0d0
Merge branch 'main' into mkdocs
hmellor May 19, 2025
cb9fcaa
Reduce search rank of API ref
hmellor May 19, 2025
49b2069
Reduce repetition in API docs
hmellor May 19, 2025
ec40428
Simplify `.nav.yml`
hmellor May 19, 2025
2b736ab
Workaround no longer needed
hmellor May 19, 2025
dc13d59
Restructure `.nav.yml`
hmellor May 19, 2025
b80c71c
Revert "Workaround no longer needed"
hmellor May 19, 2025
300cb81
Fix absolute image paths
hmellor May 19, 2025
5ce330a
Fix blog nav
hmellor May 20, 2025
99cba62
Fix URL scheme titles
hmellor May 20, 2025
f6853e0
Fix straggling `project:` links
hmellor May 20, 2025
bb469d2
Transpiler improvement
hmellor May 20, 2025
30eafd5
Merge branch 'main' into mkdocs
hmellor May 20, 2025
b35bdcd
Fix confusing headings in API ref
hmellor May 20, 2025
9e4196b
Tidy extra mkdocs files
hmellor May 20, 2025
ab3abed
Make API ref nav slightly better
hmellor May 20, 2025
6ff269f
Merge branch 'main' into mkdocs
hmellor May 20, 2025
22ee168
Commit transpile output
hmellor May 20, 2025
680562f
Remove transpile hook from config
hmellor May 20, 2025
7c2ce70
Fix gitignore for examples
hmellor May 20, 2025
4502dfb
Fix some whitespace from transpile
hmellor May 20, 2025
331664c
Fix url schemes
hmellor May 20, 2025
8a44a66
Make pre-commit happy
hmellor May 20, 2025
8d8cf0f
update title for home page in nav
hmellor May 20, 2025
596e07e
Fix double newline
hmellor May 20, 2025
dd66626
Merge branch 'main' into mkdocs
hmellor May 20, 2025
b4c2e75
Tabulate not needed now that we're not transpiling
hmellor May 20, 2025
0543aaa
Merge branch 'main' into mkdocs
hmellor May 21, 2025
c33a510
Review comments
hmellor May 21, 2025
07351e2
Fix pre-commit
hmellor May 21, 2025
94e88af
Update `Documentation Build`
hmellor May 21, 2025
1f81b6e
Merge branch 'main' into mkdocs
hmellor May 21, 2025
a67d1cc
Add FalconH1 back to supported models list
hmellor May 21, 2025
105370c
Revert change to Dockerfile
hmellor May 21, 2025
56923ad
Fix typo
hmellor May 21, 2025
982184b
Merge branch 'main' into mkdocs
hmellor May 22, 2025
29f7267
Docs build needs the examples too
hmellor May 22, 2025
08fa15c
Make pre-commit happy
hmellor May 22, 2025
06d9b72
Merge branch 'main' into mkdocs
hmellor May 22, 2025
c57a89d
Merge branch 'main' into mkdocs
hmellor May 22, 2025
fe24554
Merge branch 'main' into mkdocs
hmellor May 23, 2025
7e8c725
Merge branch 'main' into mkdocs
hmellor May 23, 2025
9 changes: 4 additions & 5 deletions .buildkite/test-pipeline.yaml
```diff
@@ -33,14 +33,13 @@ steps:
 
 - label: Documentation Build # 2min
   mirror_hardwares: [amdexperimental]
-  working_dir: "/vllm-workspace/test_docs/docs"
+  working_dir: "/vllm-workspace/test_docs"
   fast_check: true
   no_gpu: True
   commands:
-  - pip install -r ../../requirements/docs.txt
-  - SPHINXOPTS="-W" make html
-  # Check API reference (if it fails, you may have missing mock imports)
-  - grep "sig sig-object py" build/html/api/vllm/vllm.sampling_params.html
+  - pip install -r ../requirements/docs.txt
+  # TODO: add `--strict` once warnings in docstrings are fixed
+  - mkdocs build
 
 - label: Async Engine, Inputs, Utils, Worker Test # 24min
   mirror_hardwares: [amdexperimental]
```
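Taken together, the updated step is roughly equivalent to the following shell session (a sketch of the commands above, not the literal Buildkite invocation):

```bash
# Working directory comes from the step definition above;
# the requirements path is relative to it.
cd /vllm-workspace/test_docs
pip install -r ../requirements/docs.txt

# TODO from the pipeline: add --strict once docstring warnings are fixed
mkdocs build
```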
6 changes: 1 addition & 5 deletions .gitignore
```diff
@@ -77,11 +77,6 @@ instance/
 # Scrapy stuff:
 .scrapy
 
-# Sphinx documentation
-docs/_build/
-docs/source/getting_started/examples/
-docs/source/api/vllm
-
 # PyBuilder
 .pybuilder/
 target/
@@ -151,6 +146,7 @@ venv.bak/
 
 # mkdocs documentation
 /site
+docs/getting_started/examples
 
 # mypy
 .mypy_cache/
```
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
```diff
@@ -39,6 +39,7 @@ repos:
   rev: v0.9.29
   hooks:
   - id: pymarkdown
+    exclude: '.*\.inc\.md'
     args: [fix]
 - repo: https://github.com/rhysd/actionlint
   rev: v1.7.7
```
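The new `exclude` keeps pymarkdown from linting the `*.inc.md` snippet files that other docs pages include. A quick sanity check of the pattern (the file paths below are illustrative, not taken from this PR):

```bash
# Matches the exclude regex -> skipped by the hook
echo 'docs/getting_started/installation/setup.inc.md' | grep -E '.*\.inc\.md'

# No match -> still linted
echo 'docs/getting_started/quickstart.md' | grep -E '.*\.inc\.md' || echo 'not excluded'
```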
8 changes: 2 additions & 6 deletions .readthedocs.yaml
```diff
@@ -8,12 +8,8 @@ build:
   tools:
     python: "3.12"
 
-sphinx:
-  configuration: docs/source/conf.py
-  fail_on_warning: true
-
-# If using Sphinx, optionally build your docs in additional formats such as PDF
-formats: []
+mkdocs:
+  configuration: mkdocs.yaml
 
 # Optionally declare the Python requirements required to build your docs
 python:
```
2 changes: 2 additions & 0 deletions docker/Dockerfile
```diff
@@ -329,7 +329,9 @@ COPY vllm/v1 /usr/local/lib/python3.12/dist-packages/vllm/v1
 # will not be imported by other tests
 RUN mkdir test_docs
 RUN mv docs test_docs/
+RUN cp -r examples test_docs/
 RUN mv vllm test_docs/
+RUN mv mkdocs.yaml test_docs/
 #################### TEST IMAGE ####################
 
 #################### OPENAI API SERVER ####################
```
51 changes: 51 additions & 0 deletions docs/.nav.yml
@@ -0,0 +1,51 @@

```yaml
nav:
  - Home:
    - vLLM: README.md
  - Getting Started:
    - getting_started/quickstart.md
    - getting_started/installation
    - Examples:
      - LMCache: getting_started/examples/lmcache
      - getting_started/examples/offline_inference
      - getting_started/examples/online_serving
      - getting_started/examples/other
    - Roadmap: https://roadmap.vllm.ai
    - Releases: https://github.com/vllm-project/vllm/releases
  - User Guide:
    - Inference and Serving:
      - serving/offline_inference.md
      - serving/openai_compatible_server.md
      - serving/*
      - serving/integrations
    - Training: training
    - Deployment:
      - deployment/*
      - deployment/frameworks
      - deployment/integrations
    - Performance: performance
    - Models:
      - models/supported_models.md
      - models/generative_models.md
      - models/pooling_models.md
      - models/extensions
    - Features:
      - features/compatibility_matrix.md
      - features/*
      - features/quantization
    - Other:
      - getting_started/*
  - Developer Guide:
    - contributing/overview.md
    - glob: contributing/*
      flatten_single_child_sections: true
    - contributing/model
  - Design Documents:
    - V0: design
    - V1: design/v1
  - API Reference:
    - api/README.md
    - glob: api/vllm/*
      preserve_directory_names: true
  - Community:
    - community/*
    - vLLM Blog: https://blog.vllm.ai
```
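This `.nav.yml` is read by the navigation plugin configured in `mkdocs.yaml` (the specific plugin is not shown in this diff; mkdocs-awesome-nav uses this file name). Wildcard entries such as `serving/*` pull in the remaining pages of a directory. One way to catch nav entries that point at missing pages (a sketch; see the CI TODO above about enabling strict mode):

```bash
# Broken nav references surface as warnings; --strict promotes them to errors.
mkdocs build --strict
```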
25 changes: 0 additions & 25 deletions docs/Makefile

This file was deleted.

93 changes: 50 additions & 43 deletions docs/README.md
Contributor commented:

> Can we preserve some instructions on how to contribute and test documentation changes after this migration to MkDocs?

Member (author) replied:

> Yeah, information about building the docs has moved to https://docs.vllm.ai/en/latest/contributing/overview.html#building-the-docs
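For local testing under the new setup, the workflow is roughly the following (a sketch assuming the repository root as the working directory; `mkdocs build` and `mkdocs serve` are standard MkDocs commands, and `requirements/docs.txt` is the dependency file used by this PR's CI step):

```bash
# Install the MkDocs-based docs toolchain
pip install -r requirements/docs.txt

# One-off build; writes the static site to ./site (gitignored in this PR)
mkdocs build

# Live-reloading preview at http://127.0.0.1:8000
mkdocs serve
```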

@@ -1,43 +1,50 @@

Removed (the old Sphinx build instructions):
# vLLM documents

## Build the docs

- Make sure in `docs` directory

```bash
cd docs
```

- Install the dependencies:

```bash
pip install -r ../requirements/docs.txt
```

- Clean the previous build (optional but recommended):

```bash
make clean
```

- Generate the HTML documentation:

```bash
make html
```

## Open the docs with your browser

- Serve the documentation locally:

```bash
python -m http.server -d build/html/
```

This will start a local server at http://localhost:8000. You can now open your browser and view the documentation.

If port 8000 is already in use, you can specify a different port, for example:

```bash
python -m http.server 3000 -d build/html/
```
Added (the new MkDocs landing page):

# Welcome to vLLM

<figure markdown="span">
![](./assets/logos/vllm-logo-text-light.png){ align="center" alt="vLLM" class="no-scaled-link" width="60%" }
</figure>

<p style="text-align:center">
<strong>Easy, fast, and cheap LLM serving for everyone
</strong>
</p>

<p style="text-align:center">
<script async defer src="https://buttons.github.io/buttons.js"></script>
<a class="github-button" href="https://github.com/vllm-project/vllm" data-show-count="true" data-size="large" aria-label="Star">Star</a>
<a class="github-button" href="https://github.com/vllm-project/vllm/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
<a class="github-button" href="https://github.com/vllm-project/vllm/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
</p>

vLLM is a fast and easy-to-use library for LLM inference and serving.

Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with [**PagedAttention**](https://blog.vllm.ai/2023/06/20/vllm.html)
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
- Quantization: [GPTQ](https://arxiv.org/abs/2210.17323), [AWQ](https://arxiv.org/abs/2306.00978), INT4, INT8, and FP8
- Optimized CUDA kernels, including integration with FlashAttention and FlashInfer.
- Speculative decoding
- Chunked prefill

vLLM is flexible and easy to use with:

- Seamless integration with popular HuggingFace models
- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more
- Tensor parallelism and pipeline parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server
- Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, Gaudi® accelerators and GPUs, IBM Power CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
- Prefix caching support
- Multi-lora support

For more information, check out the following:

- [vLLM announcing blog post](https://vllm.ai) (intro to PagedAttention)
- [vLLM paper](https://arxiv.org/abs/2309.06180) (SOSP 2023)
- [How continuous batching enables 23x throughput in LLM inference while reducing p50 latency](https://www.anyscale.com/blog/continuous-batching-llm-inference) by Cade Daniel et al.
- [vLLM Meetups][meetups]
107 changes: 107 additions & 0 deletions docs/api/README.md
@@ -0,0 +1,107 @@
# Summary

[](){ #configuration }

## Configuration

API documentation for vLLM's configuration classes.

- [vllm.config.ModelConfig][]
- [vllm.config.CacheConfig][]
- [vllm.config.TokenizerPoolConfig][]
- [vllm.config.LoadConfig][]
- [vllm.config.ParallelConfig][]
- [vllm.config.SchedulerConfig][]
- [vllm.config.DeviceConfig][]
- [vllm.config.SpeculativeConfig][]
- [vllm.config.LoRAConfig][]
- [vllm.config.PromptAdapterConfig][]
- [vllm.config.MultiModalConfig][]
- [vllm.config.PoolerConfig][]
- [vllm.config.DecodingConfig][]
- [vllm.config.ObservabilityConfig][]
- [vllm.config.KVTransferConfig][]
- [vllm.config.CompilationConfig][]
- [vllm.config.VllmConfig][]

[](){ #offline-inference-api }

## Offline Inference

LLM Class.

- [vllm.LLM][]

LLM Inputs.

- [vllm.inputs.PromptType][]
- [vllm.inputs.TextPrompt][]
- [vllm.inputs.TokensPrompt][]

## vLLM Engines

Engine classes for offline and online inference.

- [vllm.LLMEngine][]
- [vllm.AsyncLLMEngine][]

## Inference Parameters

Inference parameters for vLLM APIs.

[](){ #sampling-params }
[](){ #pooling-params }

- [vllm.SamplingParams][]
- [vllm.PoolingParams][]

[](){ #multi-modality }

## Multi-Modality

vLLM provides experimental support for multi-modal models through the [vllm.multimodal][] package.

Multi-modal inputs can be passed alongside text and token prompts to [supported models][supported-mm-models]
via the `multi_modal_data` field in [vllm.inputs.PromptType][].

Looking to add your own multi-modal model? Please follow the instructions listed [here][supports-multimodal].

- [vllm.multimodal.MULTIMODAL_REGISTRY][]

### Inputs

User-facing inputs.

- [vllm.multimodal.inputs.MultiModalDataDict][]

Internal data structures.

- [vllm.multimodal.inputs.PlaceholderRange][]
- [vllm.multimodal.inputs.NestedTensors][]
- [vllm.multimodal.inputs.MultiModalFieldElem][]
- [vllm.multimodal.inputs.MultiModalFieldConfig][]
- [vllm.multimodal.inputs.MultiModalKwargsItem][]
- [vllm.multimodal.inputs.MultiModalKwargs][]
- [vllm.multimodal.inputs.MultiModalInputs][]

### Data Parsing

- [vllm.multimodal.parse][]

### Data Processing

- [vllm.multimodal.processing][]

### Memory Profiling

- [vllm.multimodal.profiling][]

### Registry

- [vllm.multimodal.registry][]

## Model Development

- [vllm.model_executor.models.interfaces_base][]
- [vllm.model_executor.models.interfaces][]
- [vllm.model_executor.models.adapters][]
2 changes: 2 additions & 0 deletions docs/api/vllm/.meta.yml
@@ -0,0 +1,2 @@
```yaml
search:
  boost: 0.5
```
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
```diff
@@ -1,6 +1,7 @@
-(meetups)=
-
-# vLLM Meetups
+---
+title: vLLM Meetups
+---
+[](){ #meetups }
 
 We host regular meetups in San Francisco Bay Area every 2 months. We will share the project updates from the vLLM team and have guest speakers from the industry to share their experience and insights. Please find the materials of our previous meetups below:
```
File renamed without changes.
```diff
@@ -1,7 +1,7 @@
 # Dockerfile
 
 We provide a <gh-file:docker/Dockerfile> to construct the image for running an OpenAI compatible server with vLLM.
-More information about deploying with Docker can be found [here](#deployment-docker).
+More information about deploying with Docker can be found [here][deployment-docker].
 
 Below is a visual representation of the multi-stage Dockerfile. The build graph contains the following nodes:
 
@@ -17,11 +17,9 @@ The edges of the build graph represent:
 
 - `RUN --mount=(.\*)from=...` dependencies (with a dotted line and an empty diamond arrow head)
 
-> :::{figure} /assets/contributing/dockerfile-stages-dependency.png
-> :align: center
-> :alt: query
-> :width: 100%
-> :::
+> <figure markdown="span">
+>   ![](../../assets/contributing/dockerfile-stages-dependency.png){ align="center" alt="query" width="100%" }
+> </figure>
 >
 > Made using: <https://github.com/patrickhoefler/dockerfilegraph>
 >
```
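To regenerate that figure, dockerfilegraph's containerized invocation looks roughly like this (a sketch based on the tool's README; the exact flags are assumptions, so verify against the upstream docs before relying on them):

```bash
# Render the multi-stage build graph of vLLM's Dockerfile to PNG,
# running from the repository root (flags assumed from dockerfilegraph docs).
docker run --rm --workdir /workspace --volume "$(pwd)":/workspace \
  ghcr.io/patrickhoefler/dockerfilegraph --filename docker/Dockerfile --output png
```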