
[Doc] update(example model): for OpenAI compatible serving #4503

Merged: 2 commits into vllm-project:main on May 1, 2024

Conversation

fpaupier
Contributor

The previous reference model in the getting started doc, mistralai/Mistral-7B-Instruct-v0.2, requires special access to its model card on Hugging Face, leading to an exception when starting the reference example.

The error was:

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
401 Client Error. 

The new model proposed for the getting started, unsloth/llama-3-8b-Instruct, is publicly available, has reasonable performance, and runs on commodity hardware. The getting started example in the vLLM Quickstart now works.

The previous model was in a gated repo and not available
@fpaupier fpaupier changed the title update(example model): for OpenAI compatible serving [Doc] update(example model): for OpenAI compatible serving Apr 30, 2024
@ywang96 (Member) left a comment:

@simon-mo What do you think of this change? It totally makes sense to me but I'm not sure if you have an alternative model in mind to use in the tutorial.

@mgoin (Member) commented Apr 30, 2024:

FWIW the NousResearch/Meta-Llama-3-8B-Instruct model seems to be the most downloaded alias to meta-llama/Meta-Llama-3-8B-Instruct and has been updated to include the tokenizer fixes

@simon-mo (Collaborator) commented:

When we added Mistral it didn't have the gate :((((

I think NousResearch is a good alternative; another one is Zephyr, which we use for testing.

@mgoin (Member) left a comment:

Should be good to go after this model alias switch

```diff
@@ -4,7 +4,7 @@ vLLM provides an HTTP server that implements OpenAI's [Completions](https://plat

 You can start the server using Python, or using [Docker](deploying_with_docker.rst):
-python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2 --dtype auto --api-key token-abc123
+python -m vllm.entrypoints.openai.api_server --model unsloth/llama-3-8b-Instruct --dtype auto --api-key token-abc123
```
Suggested change:
```diff
-python -m vllm.entrypoints.openai.api_server --model unsloth/llama-3-8b-Instruct --dtype auto --api-key token-abc123
+python -m vllm.entrypoints.openai.api_server --model NousResearch/Meta-Llama-3-8B-Instruct --dtype auto --api-key token-abc123
```

```diff
@@ -16,7 +16,7 @@ client = OpenAI(
 )

 completion = client.chat.completions.create(
-    model="mistralai/Mistral-7B-Instruct-v0.2",
+    model="unsloth/llama-3-8b-Instruct",
```

Suggested change:
```diff
-    model="unsloth/llama-3-8b-Instruct",
+    model="NousResearch/Meta-Llama-3-8B-Instruct",
```

```diff
@@ -37,7 +37,7 @@ Or directly merge them into the JSON payload if you are using HTTP call directly

 completion = client.chat.completions.create(
-    model="mistralai/Mistral-7B-Instruct-v0.2",
+    model="unsloth/llama-3-8b-Instruct",
```

Suggested change:
```diff
-    model="unsloth/llama-3-8b-Instruct",
+    model="NousResearch/Meta-Llama-3-8B-Instruct",
```
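
For readers following along, here is a rough sketch of what "merging the parameters into the JSON payload" looks like when calling the server over HTTP directly instead of through the `openai` client. The endpoint path and the extra sampling fields shown are assumptions based on the OpenAI chat-completions API shape, not taken from this PR:

```python
import json

# Sketch of the JSON body for a direct HTTP call to the OpenAI-compatible
# endpoint (assumed to be POST /v1/chat/completions on the vLLM server).
# Extra sampling parameters are merged straight into the top-level payload.
payload = {
    "model": "NousResearch/Meta-Llama-3-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    # Illustrative extra parameters merged into the same payload:
    "temperature": 0.7,
    "max_tokens": 64,
}

body = json.dumps(payload)
print(body)
```

This body would then be POSTed to the running server (e.g. `http://localhost:8000/v1/chat/completions`) with an `Authorization: Bearer token-abc123` header matching the `--api-key` flag from the quickstart command.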

```diff
@@ -87,7 +87,7 @@ In order for the language model to support chat protocol, vLLM requires the model to include
 a chat template in its tokenizer configuration. The chat template is a Jinja2 template that
 specifies how roles, messages, and other chat-specific tokens are encoded in the input.

-An example chat template for `mistralai/Mistral-7B-Instruct-v0.2` can be found [here](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2#instruction-format)
+An example chat template for `unsloth/llama-3-8b-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
```

Suggested change:
```diff
-An example chat template for `unsloth/llama-3-8b-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
+An example chat template for `NousResearch/Meta-Llama-3-8B-Instruct` can be found [here](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing#scrollTo=vITh0KVJ10qX)
```
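
To illustrate what such a chat template actually produces, here is a pure-Python sketch mimicking the Llama 3 instruct format that the model's Jinja2 chat template encodes. The special token names below are assumptions based on the publicly documented Llama 3 format, not quoted from this PR or the linked notebook:

```python
# Rough pure-Python rendering of the Llama 3 chat format; in practice the
# tokenizer's Jinja2 chat template performs this expansion automatically.
def render_llama3_chat(messages, add_generation_prompt=True):
    out = "<|begin_of_text|>"
    for m in messages:
        # Each turn: a role header, a blank line, the content, and an end token.
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = render_llama3_chat([{"role": "user", "content": "Hello!"}])
print(prompt)
```

This is only a sketch to show how roles and messages get interleaved with chat-specific tokens; the authoritative template ships in the model's `tokenizer_config.json`.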

@fpaupier (Contributor, Author) commented May 1, 2024:

Good suggestion @mgoin, I updated with the NousResearch model.

@ywang96 (Member) left a comment:

LGTM - Thanks @fpaupier for the update and @mgoin for the model suggestion & review!

@ywang96 ywang96 merged commit e491c7e into vllm-project:main May 1, 2024
48 checks passed
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024
dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request May 7, 2024
Temirulan pushed a commit to Temirulan/vllm-whisper that referenced this pull request Sep 6, 2024
4 participants