[Bug]: FastAPI 0.113.0 breaks vLLM OpenAPI #8212
Comments
I believe I was able to find a solution to this. It is related to OpenAI-Python #1454. Not sure why it works with fastapi 0.112.2 but fails in 0.113.0.

Problem line:

async def create_chat_completion(request: ChatCompletionRequest,
                                 raw_request: Request):

Confirmed fix:

I'll make a PR on this and reference the issue. I can also add some try/except with TypeAdapter validation, unless that's seen as unnecessary or impacts performance.
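As a rough sketch of that try/except idea (hypothetical; not the actual PR, and the helper name and placement are assumptions):

from fastapi import HTTPException
from pydantic import TypeAdapter, ValidationError
from vllm.entrypoints.openai.protocol import ChatCompletionRequest

# Build the adapter once at import time.
chat_request_adapter = TypeAdapter(ChatCompletionRequest)

def validate_chat_request(payload: dict) -> ChatCompletionRequest:
    # Validate the raw body explicitly so a schema problem surfaces as a
    # 400 response instead of an unhandled server-side error.
    try:
        return chat_request_adapter.validate_python(payload)
    except ValidationError as exc:
        raise HTTPException(status_code=400, detail=exc.errors())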
Minimal example of triggering the issue: a quick guide to running the latest vllm-openai container, upgrading fastapi, and triggering the issue. Also includes instructions to quickly switch to editable mode.

Pre-requisites:
- Download and start the latest vllm container.

Working example with 0.112.2:
- Show the current fastapi version.
- Start the server with a small model.
- POST to 'v1/chat/completions'.

Non-working example after upgrading fastapi:
- Upgrade fastapi to 0.113.0 or higher.
- Start the OpenAI-compatible api_server.
- From outside the container, attempt a POST to 'v1/chat/completions'.

Dev setup using pre-compiled C binaries (saves hours of compiling when running pip install -e .):
- Start the docker container as above.
- Install the Nvidia devel packages.
- Build vllm editable using the precompiled binaries.
- Run the API server.

Example of an inference request (see the sketch below):
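A minimal sketch of such a request, assuming the server is listening on localhost:8000; the model name is a placeholder for whatever model the server was started with:

import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "facebook/opt-125m",  # placeholder model name
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
)
print(response.status_code)
print(response.json())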
I resolved the issue by downgrading FastAPI to version 0.111.0:

pip install fastapi==0.111.0

For reference, I'm using:
A few things I noticed:
This may be a red herring, but I'm wondering if there's some weirdness with

Anyway, smallest reproducible example:

$ pip install vllm==0.6.0 fastapi==0.113.0 pydantic==2.8.2
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:16:23 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:16:23 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
Traceback (most recent call last):
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 119, in _getattr_no_parents
raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 283, in get_model_fields
return [
^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 284, in <listcomp>
ModelField(field_info=field_info, name=name)
File "<string>", line 6, in __init__
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 109, in __post_init__
self._type_adapter: TypeAdapter[Any] = TypeAdapter(
^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 264, in __init__
self._init_core_attrs(rebuild_mocks=False)
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 142, in wrapped
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 284, in _init_core_attrs
self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 102, in _get_schema
schema = gen.generate_schema(type_)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 768, in _generate_schema_inner
return self._annotated_schema(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1822, in _annotated_schema
schema = self._apply_annotations(source_type, annotations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1890, in _apply_annotations
schema = get_inner_schema(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1968, in <lambda>
lambda source, handler: handler(source)
^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_std_types_schema.py", line 316, in __get_pydantic_core_schema__
items_schema = handler.generate_schema(self.item_source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 97, in generate_schema
return self._generate_schema.generate_schema(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 871, in match_type
return self._match_generic_type(obj, origin)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 895, in _match_generic_type
return self._union_schema(obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1207, in _union_schema
choices.append(self.generate_schema(arg))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
return self._typed_dict_schema(obj, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1309, in _typed_dict_schema
for field_name, annotation in get_type_hints_infer_globalns(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_fields.py", line 57, in get_type_hints_infer_globalns
return get_type_hints(obj, globalns=globalns, localns=localns, include_extras=include_extras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 2336, in get_type_hints
value = _eval_type(value, base_globals, base_locals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 371, in _eval_type
return t._evaluate(globalns, localns, recursive_guard)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 877, in _evaluate
eval(self.__forward_code__, globalns, localns),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable
>>>

Note that it works after upgrading pydantic:

$ pip install pydantic==2.9.0
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:26:12 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:26:12 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
[ModelField(field_info=FieldInfo(annotation=List[Union[ChatCompletionSystemMessageParam, ChatCompletionUserMessageParam, ChatCompletionAssistantMessageParam, ChatCompletionToolMessageParam, ChatCompletionFunctionMessageParam, CustomChatCompletionMessageParam]], required=True), name='messages', mode='validation'), ModelField(field_info=FieldInfo(annotation=str, required=True), name='model', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='frequency_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, float], NoneType], required=False, default=None), name='logit_bias', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=0), name='top_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='max_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=1), name='n', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='presence_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[ResponseFormat, NoneType], required=False, default=None), name='response_format', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None, metadata=[Ge(ge=-9223372036854775808), Le(le=9223372036854775807)]), name='seed', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, List[str], NoneType], required=False, default_factory=list), name='stop', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='stream', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[StreamOptions, NoneType], required=False, default=None), name='stream_options', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.7), name='temperature', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=1.0), name='top_p', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[ChatCompletionToolsParam], NoneType], required=False, default=None), name='tools', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Literal['none'], Literal['auto'], ChatCompletionNamedToolChoiceParam, NoneType], required=False, default='none'), name='tool_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='parallel_tool_calls', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None), name='user', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='best_of', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='use_beam_search', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=-1), name='top_k', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=0.0), name='min_p', mode='validation'), 
ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='repetition_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='length_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='early_stopping', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[int], NoneType], required=False, default_factory=list), name='stop_token_ids', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='include_stop_str_in_output', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='ignore_eos', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=0), name='min_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='skip_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='spaces_between_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1)])], NoneType], required=False, default=None), name='truncate_prompt_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='prompt_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, the new message will be prepended with the last message if they belong to the same role.'), name='echo', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True, description='If true, the generation prompt will be added to the chat template. This is a parameter used by chat template in tokenizer config of the model.'), name='add_generation_prompt', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, special tokens (e.g. BOS) will be added to the prompt on top of what is added by the chat template. For most models, the chat template takes care of adding the special tokens so this should be set to false (as is the default).'), name='add_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[Dict[str, str]], NoneType], required=False, default=None, description='A list of dicts representing documents that will be accessible to the model if it is performing RAG (retrieval-augmented generation). If the template does not support RAG, this argument will have no effect. We recommend that each document should be a dict containing "title" and "text" keys.'), name='documents', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='A Jinja template to use for this conversion. As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.'), name='chat_template', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, Any], NoneType], required=False, default=None, description='Additional kwargs to pass to the template renderer. 
Will be accessible by the chat template.'), name='chat_template_kwargs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, dict, BaseModel, NoneType], required=False, default=None, description='If specified, the output will follow the JSON schema.'), name='guided_json', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the regex pattern.'), name='guided_regex', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None, description='If specified, the output will be exactly one of the choices.'), name='guided_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the context free grammar.'), name='guided_grammar', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description="If specified, will override the default guided decoding backend of the server for this specific request. If set, must be either 'outlines' / 'lm-format-enforcer'"), name='guided_decoding_backend', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, will override the default whitespace pattern for guided json decoding.'), name='guided_whitespace_pattern', mode='validation')]
>>>

This makes me feel like this is a pydantic issue? Or at least a confluence of factors across openai / pydantic / fastapi.
Checking @pachewise's code, I was able to reduce the error reproduction to:

from typing import List

from typing_extensions import Annotated

from pydantic import TypeAdapter
from vllm.entrypoints.chat_utils import ChatCompletionMessageParam
from vllm.entrypoints.openai.protocol import ChatCompletionRequest

for name, field in ChatCompletionRequest.model_fields.items():
    print(name, field)
    TypeAdapter(Annotated[List[ChatCompletionMessageParam], field])

That doesn't use FastAPI, it's just Pydantic. And indeed, it's fixed by upgrading Pydantic to 2.9.0. 🎉 It wasn't breaking in FastAPI before because the logic before 0.113.0 wasn't using TypeAdapter.
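For anyone checking their own environment, a minimal probe along the same lines (a sketch, assuming vllm and pydantic are importable; the TypeError branch matches the failure mode in the traceback above):

from typing import List

from typing_extensions import Annotated

from pydantic import VERSION, TypeAdapter
from vllm.entrypoints.chat_utils import ChatCompletionMessageParam
from vllm.entrypoints.openai.protocol import ChatCompletionRequest

field = ChatCompletionRequest.model_fields["messages"]
try:
    TypeAdapter(Annotated[List[ChatCompletionMessageParam], field])
    print(f"pydantic {VERSION}: OK")
except TypeError as exc:
    # pydantic 2.8.x fails here with "'PydanticUndefinedType' object is not subscriptable"
    print(f"pydantic {VERSION}: broken ({exc})")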
Glad that it's resolved! Does the issue still occur in FastAPI 0.113.1 with Pydantic 2.8? If so, we may have to update either the fastapi or the pydantic requirement.
@DarkLight1337 yes, I'd recommend updating the pydantic requirement, since the underlying bug is fixed in pydantic 2.9.0.
Unfortunately the fastapi bump has broken Ray 2.9 compatibility.

$ pip install vllm==0.6.1.post2 'ray[serve]==2.9.3'
... snip ...
The conflict is caused by:
    vllm 0.6.1.post2 depends on fastapi>=0.114.1; python_version >= "3.9"
    ray[serve] 2.9.3 depends on fastapi<=0.108.0; extra == "serve"

I've prepped a fix for the Ray 2.9 regression introduced in a different PR, but it won't really help unless we address the fastapi pin here as well. Can we lower the pinned fastapi version, since fastapi wasn't actually the cause of the issue, so that we maintain Ray 2.9 compatibility?
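A small check with the packaging library (assumed installed) confirms the two pins quoted above are mutually exclusive, i.e. no fastapi release can satisfy both:

from packaging.specifiers import SpecifierSet
from packaging.version import Version

vllm_pin = SpecifierSet(">=0.114.1")  # vllm 0.6.1.post2
ray_pin = SpecifierSet("<=0.108.0")   # ray[serve] 2.9.3

candidates = [Version(v) for v in ("0.108.0", "0.112.2", "0.113.0", "0.114.1")]
print([str(v) for v in candidates if v in vllm_pin and v in ray_pin])  # [] -- empty intersection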
On it!
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
FastAPI released 0.113.0 about 5 hours ago. This release has a major refactor of Pydantic support. It appears this causes a Pydantic failure in the OpenAI-compatible API server.

Confirmed that reverting to FastAPI 0.112.2 resolves the problem (pip install fastapi==0.112.2).

Here are logs on the failure: