@ahao-anyscale
Contributor

Description

Documents best practices and strategies for reducing autoscaling time for Ray Serve LLM replicas.

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@ahao-anyscale ahao-anyscale marked this pull request as ready for review October 21, 2025 16:53
@ahao-anyscale ahao-anyscale requested review from a team as code owners October 21, 2025 16:53
@ray-gardener ray-gardener bot added the serve, docs, and llm labels Oct 21, 2025
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Contributor

@kouroshHakha kouroshHakha left a comment

Cool. We need to wait until this lands: https://github.com/anyscale/ray-serve-llm-perf-examples/tree/master/replica_initialization

In the meantime, please address the comments below.

```

In this example, the cache folder is located at `/home/ray/.cache/vllm/torch_compile_cache/131ee5c6d9`.
Upload this folder to your S3 bucket so that it can be retrieved at startup. We provide a custom utility for downloading the compile cache from cloud storage: specify the `CloudDownloader` callback in `LLMConfig` and supply the relevant arguments. Make sure to set `cache_dir` in `compilation_config` correctly.
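Because the compile cache is a folder rather than a single file, uploading it means copying every file under it while preserving relative paths. The sketch below only plans that mapping using the standard library; it does not call S3 or Ray. The helper name `plan_cache_upload`, the bucket `my-bucket`, and the prefix `compile-cache` are hypothetical, chosen for illustration.

```python
import tempfile
from pathlib import Path


def plan_cache_upload(cache_dir: str, bucket: str, prefix: str) -> list[tuple[str, str]]:
    """Map each file under cache_dir to the S3 URI it would be uploaded to.

    Hypothetical helper: it only computes (local_path, s3_uri) pairs and
    performs no network calls.
    """
    root = Path(cache_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # directories are implied by the object keys
        rel = path.relative_to(root).as_posix()
        # Join prefix and relative path, skipping an empty prefix.
        key = "/".join(part for part in (prefix.strip("/"), rel) if part)
        plan.append((str(path), f"s3://{bucket}/{key}"))
    return plan


if __name__ == "__main__":
    # Build a stand-in cache folder so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "131ee5c6d9"
        (cache / "rank_0_0").mkdir(parents=True)
        (cache / "rank_0_0" / "artifact.bin").write_bytes(b"\x00")
        (cache / "meta.json").write_text("{}")
        for local, uri in plan_cache_upload(str(cache), "my-bucket", "compile-cache/131ee5c6d9"):
            print(local, "->", uri)
```

Each pair can then be handed to whichever upload tool you already use. Keeping the relative layout intact matters because the folder is later restored as a whole at the location that `cache_dir` refers to.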
Contributor

bump

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@kouroshHakha kouroshHakha changed the title from "[serve][llm] Model loading Docs" to "[doc][serve][llm] Model loading Docs" Oct 24, 2025
@kouroshHakha kouroshHakha enabled auto-merge (squash) October 24, 2025 22:48
@github-actions github-actions bot added the go label Oct 24, 2025
@kouroshHakha kouroshHakha merged commit a1cf87c into ray-project:master Oct 24, 2025
7 of 8 checks passed
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 27, 2025
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>