With tensor parallelism enabled, each process will read the whole model and split it into chunks, which makes the disk reading time even longer (proportional to the size of tensor parallelism).
-You can convert the model checkpoint to a sharded checkpoint using <gh-file:examples/offline_inference/save_sharded_state.py>. The conversion process might take some time, but later you can load the sharded checkpoint much faster. The model loading time should remain constant regardless of the size of tensor parallelism.
+You can convert the model checkpoint to a sharded checkpoint using [examples/offline_inference/save_sharded_state.py](../../examples/offline_inference/save_sharded_state.py). The conversion process might take some time, but later you can load the sharded checkpoint much faster. The model loading time should remain constant regardless of the size of tensor parallelism.
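The workflow this hunk documents can be illustrated with a short, hedged sketch. It assumes the example script exposes `--model`, `--tensor-parallel-size`, and `--output` flags and that the resulting directory is loaded with `load_format="sharded_state"`; the exact interface may differ between vLLM versions, so check the script before relying on it.

```python
# Hedged sketch of the sharded-checkpoint workflow described above.
# Step 1 (assumed CLI; run once per model / tensor-parallel configuration):
#   python examples/offline_inference/save_sharded_state.py \
#       --model /models/Llama-3-70B --tensor-parallel-size 8 --output /models/llama3-70b-sharded
#
# Step 2: load the sharded output. Each rank reads only its own shard, so the
# load time should stay roughly constant as the tensor-parallel size grows.
from vllm import LLM

llm = LLM(
    model="/models/llama3-70b-sharded",   # hypothetical path written in step 1
    load_format="sharded_state",
    tensor_parallel_size=8,
)
print(llm.generate("Hello, world!"))
```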
docs/configuration/tpu.md (1 addition, 1 deletion)
@@ -96,7 +96,7 @@ Although it’s common to do this with GPUs, don't try to fragment 2 or 8 differ
### Tune your workloads
-Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](gh-file:benchmarks/auto_tune/README.md) to optimize your workloads for your use case.
+Although we try to have great default configs, we strongly recommend you check out the [vLLM auto-tuner](../../benchmarks/auto_tune/README.md) to optimize your workloads for your use case.
docs/contributing/README.md (5 additions, 5 deletions)
@@ -22,7 +22,7 @@ Unsure on where to start? Check out the following links for tasks to work on:
## License
-See <gh-file:LICENSE>.
+See [LICENSE](../../LICENSE).
## Developing
@@ -54,7 +54,7 @@ For more details about installing from source and installing for other hardware,
For an optimized workflow when iterating on C++/CUDA kernels, see the [Incremental Compilation Workflow](./incremental_build.md) for recommendations.
!!! tip
-vLLM is compatible with Python versions 3.10 to 3.13. However, vLLM's default [Dockerfile](gh-file:docker/Dockerfile) ships with Python 3.12 and tests in CI (except `mypy`) are run with Python 3.12.
+vLLM is compatible with Python versions 3.10 to 3.13. However, vLLM's default [Dockerfile](../../docker/Dockerfile) ships with Python 3.12 and tests in CI (except `mypy`) are run with Python 3.12.
Therefore, we recommend developing with Python 3.12 to minimise the chance of your local environment clashing with our CI environment.
@@ -88,7 +88,7 @@ vLLM's `pre-commit` hooks will now run automatically every time you commit.
### Documentation
-MkDocs is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. Documentation source files are written in Markdown, and configured with a single YAML configuration file, <gh-file:mkdocs.yaml>.
+MkDocs is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. Documentation source files are written in Markdown, and configured with a single YAML configuration file, [mkdocs.yaml](../../mkdocs.yaml).
If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.
!!! important
-If you discover a security vulnerability, please follow the instructions [here](gh-file:SECURITY.md#reporting-a-vulnerability).
+If you discover a security vulnerability, please follow the instructions [here](../../SECURITY.md).
## Pull Requests & Code Reviews
@@ -162,7 +162,7 @@ code quality and improve the efficiency of the review process.
### DCO and Signed-off-by
-When contributing changes to this project, you must agree to the <gh-file:DCO>.
+When contributing changes to this project, you must agree to the [DCO](../../DCO).
Commits must include a `Signed-off-by:` header which certifies agreement with
docs/contributing/benchmarks.md (3 additions, 3 deletions)
@@ -822,7 +822,7 @@ you should set `--endpoint /v1/embeddings` to use the Embeddings API. The backen
- CLIP: `--backend openai-embeddings-clip`
- VLM2Vec: `--backend openai-embeddings-vlm2vec`
-For other models, please add your own implementation inside <gh-file:vllm/benchmarks/lib/endpoint_request_func.py> to match the expected instruction format.
+For other models, please add your own implementation inside [vllm/benchmarks/lib/endpoint_request_func.py](../../vllm/benchmarks/lib/endpoint_request_func.py) to match the expected instruction format.
You can use any text or multi-modal dataset to benchmark the model, as long as the model supports it.
For example, you can use ShareGPT and VisionArena to benchmark vision-language embeddings.
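The backends listed in this hunk ultimately send OpenAI-style Embeddings API requests to the server under test, so a custom implementation in `endpoint_request_func.py` needs to produce requests of the same shape. A minimal, hedged sketch of such a request using the `openai` client is below; the base URL, API key, and model name are placeholders for a locally running vLLM server.

```python
# Hedged sketch: the request shape an embeddings benchmark backend targets.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of a local vLLM server
    api_key="EMPTY",                      # dummy key; fine if the server runs without --api-key
)

response = client.embeddings.create(
    model="openai/clip-vit-base-patch32",  # placeholder; use the model the server is actually serving
    input=["a photo of a cat"],
)
print(len(response.data[0].embedding))
```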
@@ -962,7 +962,7 @@ For more results visualization, check the [visualizing the results](https://gith
The latest performance results are hosted on the public [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm).
-More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](gh-file:.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
+More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](../../.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
### Continuous Benchmarking
@@ -996,4 +996,4 @@ These compare vLLM's performance against alternatives (`tgi`, `trt-llm`, and `lm
The latest nightly benchmark results are shared in major release blog posts such as [vLLM v0.6.0](https://blog.vllm.ai/2024/09/05/perf-update.html).
-More information on the nightly benchmarks and their parameters can be found [here](gh-file:.buildkite/nightly-benchmarks/nightly-descriptions.md).
+More information on the nightly benchmarks and their parameters can be found [here](../../.buildkite/nightly-benchmarks/nightly-descriptions.md).