Integrating the Yi series models #3958
Merged
16 commits
dc646f7
Add files via upload
Haijian06 1a7fc13
Update and rename qwen2-7b.yaml to yi15-6b.yaml
Haijian06 14feaf7
Add files via upload
Haijian06 4dc2a91
Update yi15-9b.yaml
Haijian06 a1b68bc
Update yi15-34b.yaml
Haijian06 60cd160
Update yi15-6b.yaml
Haijian06 9be9bd9
Add files via upload
Haijian06 a9ffe54
Update yicoder-1_5b.yaml
Haijian06 6de0cf7
Update yicoder-9b.yaml
Haijian06 16723c0
Merge branch 'skypilot-org:master' into master
Haijian06 a53e27a
Add files via upload
Haijian06 022fa97
Update yi15-34b.yaml
Haijian06 b746b2a
Update yi15-6b.yaml
Haijian06 f58ec47
Update yi15-9b.yaml
Haijian06 7cd5681
Update yicoder-1_5b.yaml
Haijian06 55cf8db
Update yicoder-9b.yaml
Haijian06
@@ -0,0 +1,60 @@
# Serving Yi on Your Own Kubernetes or Cloud

🤖 The Yi series models are the next generation of open-source large language models trained from scratch by [01.AI](https://www.lingyiwanwu.com/en).

**Update (Sep 19, 2024) -** SkyPilot now supports the [**Yi**](https://01-ai.github.io/) models (Yi-Coder, Yi-1.5)!

<p align="center">
<img src="https://raw.githubusercontent.com/01-ai/Yi/main/assets/img/coder/bench1.webp" alt="yi" width="600"/>
</p>

## Why use SkyPilot to deploy over commercial hosted solutions?

* Get the best GPU availability by utilizing multiple resource pools across Kubernetes clusters and multiple regions/clouds.
* Pay the absolute minimum — SkyPilot picks the cheapest resources across Kubernetes clusters and regions/clouds. No managed-solution markups.
* Scale up to multiple replicas across different locations and accelerators, all served with a single endpoint.
* Everything stays in your Kubernetes or cloud account (your VMs & buckets).
* Completely private - no one else sees your chat history.

## Running the Yi model with SkyPilot

After [installing SkyPilot](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html), run your own Yi model on vLLM with SkyPilot in one click:

1. Start serving Yi-1.5 34B on a single instance with any available GPU listed in [yi15-34b.yaml](https://github.com/skypilot-org/skypilot/blob/master/llm/yi/yi15-34b.yaml), behind a vLLM-powered OpenAI-compatible endpoint (you can also switch to [yicoder-9b.yaml](https://github.com/skypilot-org/skypilot/blob/master/llm/yi/yicoder-9b.yaml) or [another model](https://github.com/skypilot-org/skypilot/tree/master/llm/yi) for a smaller model):

```console
sky launch -c yi yi15-34b.yaml
```

2. Send a request to the endpoint for completion:

```bash
ENDPOINT=$(sky status --endpoint 8000 yi)

curl http://$ENDPOINT/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "01-ai/Yi-1.5-34B-Chat",
    "prompt": "Who are you?",
    "max_tokens": 512
  }' | jq -r '.choices[0].text'
```

3. Send a request for chat completion:

```bash
curl http://$ENDPOINT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "01-ai/Yi-1.5-34B-Chat",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "max_tokens": 512
  }' | jq -r '.choices[0].message.content'
```
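The same chat request can be made from Python against the vLLM endpoint, since it speaks the OpenAI chat-completions wire format. Below is a minimal stdlib-only sketch; the endpoint address is whatever `sky status --endpoint 8000 yi` returns (the `1.2.3.4:8000` placeholder and the helper names here are illustrative, not part of SkyPilot):

```python
import json
import urllib.request


def build_chat_request(model, user_prompt, system_prompt=None, max_tokens=512):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages, "max_tokens": max_tokens}


def chat(endpoint, model, user_prompt, system_prompt=None):
    """POST the request to the serving endpoint and return the reply text."""
    body = json.dumps(build_chat_request(model, user_prompt, system_prompt)).encode()
    req = urllib.request.Request(
        f"http://{endpoint}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Endpoint comes from: sky status --endpoint 8000 yi
    # Example (placeholder address):
    # print(chat("1.2.3.4:8000", "01-ai/Yi-1.5-34B-Chat", "Who are you?",
    #            system_prompt="You are a helpful assistant."))
    pass
```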
@@ -0,0 +1,35 @@
envs:
  MODEL_NAME: 01-ai/Yi-1.5-34B-Chat

service:
  # Specifying the path to the endpoint to check the readiness of the replicas.
  readiness_probe:
    path: /v1/chat/completions
    post_data:
      model: $MODEL_NAME
      messages:
        - role: user
          content: Hello! What is your name?
      max_tokens: 1
    initial_delay_seconds: 1200
  # How many replicas to manage.
  replicas: 2

resources:
  accelerators: {A100:4, A100:8, A100-80GB:2, A100-80GB:4, A100-80GB:8}
  disk_size: 1024
  disk_tier: best
  memory: 32+
  ports: 8000

setup: |
  pip install vllm==0.6.1.post2
  pip install vllm-flash-attn

run: |
  export PATH=$PATH:/sbin
  vllm serve $MODEL_NAME \
    --host 0.0.0.0 \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --max-model-len 1024 | tee ~/openai_api_server.log
@@ -0,0 +1,33 @@
envs:
  MODEL_NAME: 01-ai/Yi-1.5-6B-Chat

service:
  # Specifying the path to the endpoint to check the readiness of the replicas.
  readiness_probe:
    path: /v1/chat/completions
    post_data:
      model: $MODEL_NAME
      messages:
        - role: user
          content: Hello! What is your name?
      max_tokens: 1
    initial_delay_seconds: 1200
  # How many replicas to manage.
  replicas: 2

resources:
  accelerators: {L4, A10g, A10, L40, A40, A100, A100-80GB}
  disk_tier: best
  ports: 8000

setup: |
  pip install vllm==0.6.1.post2
  pip install vllm-flash-attn

run: |
  export PATH=$PATH:/sbin
  vllm serve $MODEL_NAME \
    --host 0.0.0.0 \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --max-model-len 1024 | tee ~/openai_api_server.log
@@ -0,0 +1,33 @@
envs:
  MODEL_NAME: 01-ai/Yi-1.5-9B-Chat

service:
  # Specifying the path to the endpoint to check the readiness of the replicas.
  readiness_probe:
    path: /v1/chat/completions
    post_data:
      model: $MODEL_NAME
      messages:
        - role: user
          content: Hello! What is your name?
      max_tokens: 1
    initial_delay_seconds: 1200
  # How many replicas to manage.
  replicas: 2

resources:
  accelerators: {L4:8, A10g:8, A10:8, A100:4, A100:8, A100-80GB:2, A100-80GB:4, A100-80GB:8}
  disk_tier: best
  ports: 8000

setup: |
  pip install vllm==0.6.1.post2
  pip install vllm-flash-attn

run: |
  export PATH=$PATH:/sbin
  vllm serve $MODEL_NAME \
    --host 0.0.0.0 \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --max-model-len 1024 | tee ~/openai_api_server.log
@@ -0,0 +1,33 @@
envs:
  MODEL_NAME: 01-ai/Yi-Coder-1.5B-Chat

service:
  # Specifying the path to the endpoint to check the readiness of the replicas.
  readiness_probe:
    path: /v1/chat/completions
    post_data:
      model: $MODEL_NAME
      messages:
        - role: user
          content: Hello! What is your name?
      max_tokens: 1
    initial_delay_seconds: 1200
  # How many replicas to manage.
  replicas: 2

resources:
  accelerators: {L4, A10g, A10, L40, A40, A100, A100-80GB}
  disk_tier: best
  ports: 8000

setup: |
  pip install vllm==0.6.1.post2
  pip install vllm-flash-attn

run: |
  export PATH=$PATH:/sbin
  vllm serve $MODEL_NAME \
    --host 0.0.0.0 \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --max-model-len 1024 | tee ~/openai_api_server.log
@@ -0,0 +1,33 @@
envs:
  MODEL_NAME: 01-ai/Yi-Coder-9B-Chat

service:
  # Specifying the path to the endpoint to check the readiness of the replicas.
  readiness_probe:
    path: /v1/chat/completions
    post_data:
      model: $MODEL_NAME
      messages:
        - role: user
          content: Hello! What is your name?
      max_tokens: 1
    initial_delay_seconds: 1200
  # How many replicas to manage.
  replicas: 2

resources:
  accelerators: {L4:8, A10g:8, A10:8, A100:4, A100:8, A100-80GB:2, A100-80GB:4, A100-80GB:8}
  disk_tier: best
  ports: 8000

setup: |
  pip install vllm==0.6.1.post2
  pip install vllm-flash-attn

run: |
  export PATH=$PATH:/sbin
  vllm serve $MODEL_NAME \
    --host 0.0.0.0 \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --max-model-len 1024 | tee ~/openai_api_server.log
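Each YAML above declares the same `readiness_probe`: SkyServe substitutes env vars such as `$MODEL_NAME` into `post_data` and POSTs the result to the probe `path` to decide whether a replica is ready. The snippet below is a simplified illustration of what that rendered probe body looks like; the `render_probe_body` helper is hypothetical and only models the substitution, it is not SkyPilot's actual implementation:

```python
import json
from string import Template


def render_probe_body(envs, post_data):
    """Substitute $VAR references (e.g. $MODEL_NAME) from `envs` into the
    probe's post_data and return the JSON body the readiness probe would POST.
    Illustrative only; SkyServe performs its own substitution internally."""
    rendered = json.loads(Template(json.dumps(post_data)).safe_substitute(envs))
    return json.dumps(rendered)


# Mirrors the yicoder-9b.yaml config above.
envs = {"MODEL_NAME": "01-ai/Yi-Coder-9B-Chat"}
post_data = {
    "model": "$MODEL_NAME",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}],
    "max_tokens": 1,
}
body = render_probe_body(envs, post_data)
```

With `max_tokens: 1`, the probe costs a single generated token per check, which keeps the health checks cheap even on the 34B model.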
If we are not adding instructions for `sky serve up`, we can leave these sections out.

Okay. Thanks.

Optimizing the docs later sounds good to me. Merging for now.

@Michaelvll Thanks a lot, I've changed the file!