Add outlines vLLM OpenAI server #598
Conversation
Good job integrating vLLM's OpenAI module with Outlines!
Could you document usage and availability in vllm.md (or a new markdown file openai_endpoint.md)?
I'll do thorough smoke testing later this week.
outlines/serve/openai_server.py
Outdated
from http import HTTPStatus
from typing import AsyncGenerator, Dict, List, Optional, Tuple, Union

from aioprometheus import MetricsMiddleware
Please ensure any new dependencies are added to pyproject.toml.
Wouldn't they be installed when installing vLLM?
Yes, I should have pulled before comparing to my local openai_server.py...
best_of=request.best_of,
top_k=request.top_k,
ignore_eos=request.ignore_eos,
use_beam_search=request.use_beam_search,
Beam search functionality depends on #539
I'd expect this to extend OpenAI's existing response_format field to include the schema, rather than use an extra_body field. Thoughts?
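For clarity, here is a rough sketch of the two payload shapes being discussed; the field names and values below are illustrative assumptions, not the API this PR actually implements:

```python
# Hypothetical request payloads, for illustration only.

# (a) Extending OpenAI's existing response_format field to carry the schema:
request_with_response_format = {
    "model": "my-model",
    "prompt": "Generate a user profile.",
    "response_format": {
        "type": "json_schema",        # assumed extension, not a standard type here
        "schema": {"type": "object"},
    },
}

# (b) Passing the schema as an extra, non-standard top-level field,
#     which OpenAI clients would have to send via extra_body:
request_with_extra_field = {
    "model": "my-model",
    "prompt": "Generate a user profile.",
    "schema": {"type": "object"},
}
```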
Thank you for opening a PR, this is impressive and I will need a few days to review it. In the meantime please make sure that the checks in the CI are passing.
Wow, this looks amazing. So, if I understand correctly: after I deploy Outlines to my server, I can just add another line to my LiteLLM config pointing to Outlines, and it will work with my existing UI?
I tried smoke testing, but got …
@Soufiane-Ra could you solve the formatting issues?
return StreamingResponse(
    completion_stream_generator(), media_type="text/event-stream"
)
else:
@Soufiane-Ra Great work! We'd love to use this. It looks like there is another indentation error here. Could you fix it? Thx
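For reference, a minimal self-contained sketch of the branch structure this comment refers to, assuming the FastAPI setup used by vLLM's OpenAI server; the function and variable names are placeholders:

```python
from fastapi.responses import JSONResponse, StreamingResponse

async def create_completion(request, completion_stream_generator, result):
    if request.stream:
        # The streaming return belongs inside the `if` branch ...
        return StreamingResponse(
            completion_stream_generator(), media_type="text/event-stream"
        )
    else:
        # ... and the non-streaming response inside the matching `else`.
        return JSONResponse(content=result)
```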
Following this and would love to see any update.
I cannot agree. To be OpenAI-compatible we have to use extra_body. E.g. if you use the openai library for sending requests to … Just my two cents.

Just saw this PR in vLLM: vllm-project/vllm#3211
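To illustrate that point, a sketch of how the official openai Python client (v1.x) handles non-standard fields; the base URL and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Passing a non-standard keyword argument directly fails, because create()
# only accepts the documented OpenAI parameters:
# client.completions.create(model="my-model", prompt="Hi", schema={...})
# -> TypeError: create() got an unexpected keyword argument 'schema'

# extra_body, on the other hand, is merged into the JSON payload sent to
# the server, so an OpenAI-compatible server can pick the schema out of it:
completion = client.completions.create(
    model="my-model",
    prompt="Hi",
    extra_body={"schema": {"type": "object"}},
)
```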
Thank you for contributing! In the meantime vLLM integrated Outlines, and you can now use structured generation directly from there.
To use with LangChain / OpenAI, add extra_body={"schema": yourSchema} or extra_body={"regex": yourRegex}. Example:
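A minimal sketch of such a request with the openai client, assuming the server is reachable at http://localhost:8000/v1; the model name and schema are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

your_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model name
    messages=[{"role": "user", "content": "Describe a fictional person."}],
    extra_body={"schema": your_schema},  # or extra_body={"regex": your_regex}
)
print(completion.choices[0].message.content)
```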