[Doc] update(example model): for OpenAI compatible serving #4503
Conversation
The previous model was in a gated repo and not available
@simon-mo What do you think of this change? It totally makes sense to me, but I'm not sure if you have an alternative model in mind to use in the tutorial.
FWIW the
When we added Mistral, it didn't have the gate :(((( I think NousResearch is a good alternative; another one is Zephyr, which we use for testing.
Should be good to go after this model alias switch
@@ -4,7 +4,7 @@ vLLM provides an HTTP server that implements OpenAI's [Completions](https://plat

You can start the server using Python, or using [Docker](deploying_with_docker.rst):
```bash
- python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2 --dtype auto --api-key token-abc123
+ python -m vllm.entrypoints.openai.api_server --model unsloth/llama-3-8b-Instruct --dtype auto --api-key token-abc123
```
Suggested change:
- python -m vllm.entrypoints.openai.api_server --model unsloth/llama-3-8b-Instruct --dtype auto --api-key token-abc123
+ python -m vllm.entrypoints.openai.api_server --model NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
@@ -16,7 +16,7 @@ client = OpenAI(
)

completion = client.chat.completions.create(
-    model="mistralai/Mistral-7B-Instruct-v0.2",
+    model="unsloth/llama-3-8b-Instruct",
Suggested change:
-    model="unsloth/llama-3-8b-Instruct",
+    model="NousResearch/Meta-Llama-3-8B-Instruct",
@@ -37,7 +37,7 @@ Or directly merge them into the JSON payload if you are using HTTP call directly

```python
completion = client.chat.completions.create(
-    model="mistralai/Mistral-7B-Instruct-v0.2",
+    model="unsloth/llama-3-8b-Instruct",
```
Suggested change:
-    model="unsloth/llama-3-8b-Instruct",
+    model="NousResearch/Meta-Llama-3-8B-Instruct",
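The hunk above sits in a section that also mentions merging these parameters directly into the JSON payload when calling the HTTP endpoint without the client library. As a hedged illustration (the `build_chat_payload` helper is hypothetical, not part of vLLM; the endpoint path follows the OpenAI-compatible convention), the body of such a request could be assembled like this:

```python
import json

# Hypothetical helper (not part of vLLM): builds the JSON body that an
# OpenAI-compatible /v1/chat/completions endpoint expects.
def build_chat_payload(model, messages, **extra):
    payload = {"model": model, "messages": messages}
    payload.update(extra)  # e.g. temperature or max_tokens
    return payload

payload = build_chat_payload(
    "NousResearch/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
)
# Serialized body, ready to POST to the server's /v1/chat/completions route
body = json.dumps(payload)
```

This mirrors what `client.chat.completions.create(...)` sends on the wire, which is why the model name change has to be applied in both the Python and raw-HTTP examples.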
@@ -87,7 +87,7 @@ In order for the language model to support chat protocol, vLLM requires the mode
a chat template in its tokenizer configuration. The chat template is a Jinja2 template that
specifies how are roles, messages, and other chat-specific tokens are encoded in the input.

- An example chat template for `mistralai/Mistral-7B-Instruct-v0.2` can be found [here](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2#instruction-format)
+ An example chat template for `unsloth/llama-3-8b-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
Suggested change:
- An example chat template for `unsloth/llama-3-8b-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
+ An example chat template for `NousResearch/Meta-Llama-3-8B-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
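The doc text in this hunk says the chat template is a Jinja2 template that encodes roles, messages, and special tokens into the prompt. For illustration only, here is a plain-Python sketch of the kind of string such a template renders; the special tokens follow the published Llama 3 instruct format but should be treated as assumptions here, since the authoritative template ships in the model's tokenizer configuration:

```python
# Illustrative sketch (assumption: Llama-3-style special tokens).
# The real template is Jinja2 and lives in the model's tokenizer config.
def render_llama3_chat(messages, add_generation_prompt=True):
    out = "<|begin_of_text|>"
    for m in messages:
        # Each turn is wrapped in role headers and terminated with <|eot_id|>
        out += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = render_llama3_chat([{"role": "user", "content": "Hi"}])
```

This is why the gating matters in practice: without access to the model repo, vLLM cannot fetch the tokenizer configuration that holds the real template.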
instead of the unsloth model
Good suggestion @mgoin, I updated with the
The previous reference model,
`mistralai/Mistral-7B-Instruct-v0.2`,
in the getting started doc required special access to the model card on Hugging Face, leading to an exception when starting the reference example. The error was:
The new model proposed for the getting started, `unsloth/llama-3-8b-Instruct`, is publicly available, offers reasonable performance, and runs on commodity hardware. The getting started example in the vLLM Quickstart now works.