Please refer to the above pages for more details about each API.

This section lists the most common options for running the vLLM engine.
For a full list, refer to the [Engine Arguments](#engine-args) page.

### Model resolution

vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in `config.json` of the model repository
and finding the corresponding implementation that is registered to vLLM.
However, model resolution may fail for any of the following reasons:

- The `config.json` of the model repository lacks the `architectures` field.
- Unofficial repositories refer to a model using alternative names that are not recorded in vLLM.
- The same architecture name is used for multiple models, creating ambiguity as to which model should be loaded.

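If you suspect the first case, you can inspect the repository's `config.json` yourself before starting the engine. The sketch below uses `huggingface_hub` to download only that file; the repo ID is just the one from the fix shown further down, so substitute your own.

```python
import json

from huggingface_hub import hf_hub_download

# Fetch only config.json from the model repository and look for "architectures".
config_path = hf_hub_download(
    repo_id="cerebras/Cerebras-GPT-1.3B",  # replace with your model repository
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

# If this prints None, vLLM cannot resolve the architecture automatically.
print(config.get("architectures"))
```
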
When resolution fails, vLLM may throw an error like:

```text
Traceback (most recent call last):
...
  File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable
```

or:

```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
```

:::{note}
The above errors are distinct from the following error, which looks similar but has a different cause:

```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] failed to be inspected. Please check the logs for more details.
```

This error means that vLLM failed to import the model file, usually because of missing dependencies or outdated
binaries in the vLLM build. Read the logs carefully to determine the root cause of the error.
:::

To fix this, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
For example:

```python
from vllm import LLM

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # GPT-2
)
```

Our [list of supported models](#supported-models) shows the model architectures that are recognized by vLLM.

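If you would rather check programmatically which architectures your installed vLLM build recognizes, recent versions expose the model registry. The exact method name below is an assumption and may differ across versions, so treat this as a sketch:

```python
from vllm import ModelRegistry

# List every architecture name registered with this vLLM installation
# (method name assumed from recent vLLM versions; check your version's API).
print(ModelRegistry.get_supported_archs())
```
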
### Reducing memory usage

Large models might cause your machine to run out of memory (OOM). Here are some options that help alleviate this problem.