
[Doc] update(example model): for OpenAI compatible serving #4503

Merged: 2 commits into vllm-project:main on May 1, 2024

Conversation

fpaupier
Contributor

The previous reference model in the getting started doc, mistralai/Mistral-7B-Instruct-v0.2, requires special access to its model card on Hugging Face, leading to an exception when starting the reference example.

The error was:

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
401 Client Error. 

The new model proposed for the getting started, unsloth/llama-3-8b-Instruct, is publicly available, has reasonable performance, and runs on commodity hardware. The getting started example in the vLLM Quickstart now works.

The previous model was in a gated repo and not available
@fpaupier fpaupier changed the title update(example model): for OpenAI compatible serving [Doc] update(example model): for OpenAI compatible serving Apr 30, 2024
@ywang96 (Member) left a comment:

@simon-mo What do you think of this change? It totally makes sense to me but I'm not sure if you have an alternative model in mind to use in the tutorial.

@mgoin (Member) commented Apr 30, 2024:

FWIW the NousResearch/Meta-Llama-3-8B-Instruct model seems to be the most downloaded alias to meta-llama/Meta-Llama-3-8B-Instruct and has been updated to include the tokenizer fixes

@simon-mo (Collaborator) commented:

When we added Mistral it didn't have the gate :((((

I think NousResearch is a good alternative; another one is Zephyr, which we use for testing.

@mgoin (Member) left a comment:

Should be good to go after this model alias switch

```diff
@@ -4,7 +4,7 @@ vLLM provides an HTTP server that implements OpenAI's [Completions](https://plat

 You can start the server using Python, or using [Docker](deploying_with_docker.rst):
-python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2 --dtype auto --api-key token-abc123
+python -m vllm.entrypoints.openai.api_server --model unsloth/llama-3-8b-Instruct --dtype auto --api-key token-abc123
```
Suggested change:
```diff
-python -m vllm.entrypoints.openai.api_server --model unsloth/llama-3-8b-Instruct --dtype auto --api-key token-abc123
+python -m vllm.entrypoints.openai.api_server --model NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
```

```diff
@@ -16,7 +16,7 @@ client = OpenAI(
 )

 completion = client.chat.completions.create(
-    model="mistralai/Mistral-7B-Instruct-v0.2",
+    model="unsloth/llama-3-8b-Instruct",
```

Suggested change:
```diff
-    model="unsloth/llama-3-8b-Instruct",
+    model="NousResearch/Meta-Llama-3-8B-Instruct",
```

```diff
@@ -37,7 +37,7 @@ Or directly merge them into the JSON payload if you are using HTTP call directly

 completion = client.chat.completions.create(
-    model="mistralai/Mistral-7B-Instruct-v0.2",
+    model="unsloth/llama-3-8b-Instruct",
```

Suggested change:
```diff
-    model="unsloth/llama-3-8b-Instruct",
+    model="NousResearch/Meta-Llama-3-8B-Instruct",
```
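
For readers following along, here is a rough sketch of what "merging the parameters into the JSON payload" looks like when calling the server over HTTP directly instead of through the `openai` client. The endpoint path and the extra sampling fields shown are assumptions based on the OpenAI chat-completions API shape, not taken from this PR:

```python
import json

# Sketch of the JSON body for a direct HTTP call to the OpenAI-compatible
# endpoint (assumed to be POST /v1/chat/completions on the vLLM server).
# Extra sampling parameters are merged straight into the top-level payload.
payload = {
    "model": "NousResearch/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    # Illustrative extra parameters merged into the same payload:
    "temperature": 0.7,
    "max_tokens": 64,
}

body = json.dumps(payload)
print(body)
```

This body would then be POSTed to the running server (e.g. `http://localhost:8000/v1/chat/completions`) with an `Authorization: Bearer token-abc123` header matching the `--api-key` flag from the quickstart command.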

```diff
@@ -87,7 +87,7 @@ In order for the language model to support chat protocol, vLLM requires the model to include
 a chat template in its tokenizer configuration. The chat template is a Jinja2 template that
 specifies how roles, messages, and other chat-specific tokens are encoded in the input.

-An example chat template for `mistralai/Mistral-7B-Instruct-v0.2` can be found [here](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2#instruction-format)
+An example chat template for `unsloth/llama-3-8b-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
```

Suggested change:
```diff
-An example chat template for `unsloth/llama-3-8b-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
+An example chat template for `NousResearch/Meta-Llama-3-8B-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
```
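
To illustrate what such a chat template actually produces, here is a pure-Python sketch mimicking the Llama 3 instruct format that the model's Jinja2 chat template encodes. The special token names below are assumptions based on the publicly documented Llama 3 format, not quoted from this PR or the linked notebook:

```python
# Rough pure-Python rendering of the Llama 3 chat format; in practice the
# tokenizer's Jinja2 chat template performs this expansion automatically.
def render_llama3_chat(messages, add_generation_prompt=True):
    out = "<|begin_of_text|>"
    for m in messages:
        # Each turn: a role header, a blank line, the content, and an end token.
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = render_llama3_chat([{"role": "user", "content": "Hello!"}])
print(prompt)
```

This is only a sketch to show how roles and messages get interleaved with chat-specific tokens; the authoritative template ships in the model's `tokenizer_config.json`.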

@fpaupier (Contributor, Author) commented May 1, 2024:

Good suggestion @mgoin, I updated with the NousResearch model.

@ywang96 (Member) left a comment:

LGTM - Thanks @fpaupier for the update and @mgoin for the model suggestion & review!

@ywang96 ywang96 merged commit e491c7e into vllm-project:main May 1, 2024
48 checks passed
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
4 participants