FEAT: Support download models from modelscope #475

Merged · 15 commits · Sep 22, 2023
1 change: 1 addition & 0 deletions .github/workflows/python.yaml
@@ -95,6 +95,7 @@ jobs:
pip install ctransformers
pip install sentence-transformers
pip install s3fs
+pip install modelscope
pip install -e ".[dev]"
working-directory: .

75 changes: 40 additions & 35 deletions README.md
@@ -194,47 +194,52 @@ To view the builtin models, run the following command:
```
$ xinference registrations
```

-| Type | Name | Language | Ability |
-|------|---------------------|--------------|-----------------------|
-| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | code-llama | ['en'] | ['generate'] |
-| LLM | code-llama-instruct | ['en'] | ['chat'] |
-| LLM | code-llama-python | ['en'] | ['generate'] |
-| LLM | falcon | ['en'] | ['embed', 'generate'] |
-| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
+| Type | Name | Language | Ability |
+|------|---------------------|--------------|------------------------|
+| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
+| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
+| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | code-llama | ['en'] | ['generate'] |
+| LLM | code-llama-instruct | ['en'] | ['chat'] |
+| LLM | code-llama-python | ['en'] | ['generate'] |
+| LLM | falcon | ['en'] | ['embed', 'generate'] |
+| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
+| LLM | glaive-coder | ['en'] | ['chat'] |
-| LLM | gpt-2 | ['en'] | ['generate'] |
-| LLM | internlm | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | internlm-16k | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | internlm-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | internlm-chat-8k | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | internlm-chat-16k | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
-| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
-| LLM | opt | ['en'] | ['embed', 'generate'] |
-| LLM | orca | ['en'] | ['embed', 'chat'] |
-| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
-| LLM | starcoder | ['en'] | ['generate'] |
-| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
-| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
-| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
-| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
-| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
-| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
-| LLM | OpenBuddy-v11.1 | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | gpt-2 | ['en'] | ['generate'] |
+| LLM | internlm-7b | ['en', 'zh'] | ['embed', 'generate'] |
+| LLM | internlm-chat-7b | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | internlm-chat-20b | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
+| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
+| LLM | opt | ['en'] | ['embed', 'generate'] |
+| LLM | orca | ['en'] | ['embed', 'chat'] |
+| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
+| LLM | starcoder | ['en'] | ['generate'] |
+| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
+| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
+| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
+| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
+| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
+| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
+| LLM | OpenBuddy | ['en', 'zh'] | ['embed', 'chat'] |

For in-depth details on the built-in models, please refer to [built-in models](https://inference.readthedocs.io/en/latest/models/builtin/index.html).

**NOTE**:
- Xinference will download models automatically for you, and by default the models will be saved under `${USER}/.xinference/cache`.
-- If you have trouble downloading models from the Hugging Face, run `export XINFERENCE_MODEL_SRC=xorbits` to download models from our mirror site.
+- If you have trouble downloading models from Hugging Face, run `export XINFERENCE_MODEL_SRC=modelscope` to download them from [ModelScope](https://modelscope.cn/) instead (see the usage sketch after this diff). The models currently supported by ModelScope are:
+  - llama-2
+  - llama-2-chat
+  - baichuan-2
+  - baichuan-2-chat
+  - chatglm2
+  - chatglm2-32k
+  - internlm-chat-20b

## Custom models
Please refer to [custom models](https://inference.readthedocs.io/en/latest/models/custom.html).
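
The `XINFERENCE_MODEL_SRC=modelscope` switch described in the NOTE above can then be exercised through the Python client shown earlier in the README. The sketch below is illustrative rather than part of this diff: the endpoint and model parameters are placeholders, and it assumes the variable is exported in the environment of the Xinference server, since that is the process that actually downloads the model.

```python
# Sketch: pull llama-2-chat from ModelScope instead of Hugging Face.
# Assumes the server was started with the variable already exported, e.g.:
#   XINFERENCE_MODEL_SRC=modelscope xinference
from xinference.client import Client

client = Client("http://localhost:9997")  # illustrative endpoint

# llama-2-chat is one of the models listed above as available on ModelScope.
model_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="pytorch",         # illustrative; pick a format the model ships in
    model_size_in_billions=7,
    quantization="none",
)
model = client.get_model(model_uid)
print(model.chat("What is the largest animal?"))
```

Because the source switch is an environment variable rather than a per-request option, exporting it only in the client process would have no effect; it must be visible to the server or worker doing the download.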
2 changes: 1 addition & 1 deletion README_ja_JP.md
@@ -205,7 +205,7 @@ $ xinference registrations
| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
-| LLM | OpenBuddy-v11.1 | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | OpenBuddy | ['en', 'zh'] | ['embed', 'chat'] |

**NOTE**:
- Xinference will download models automatically for you, and by default the models are saved under `${USER}/.xinference/cache`.
77 changes: 41 additions & 36 deletions README_zh_CN.md
@@ -176,47 +176,52 @@ model.chat(
```
$ xinference registrations
```

-| Type | Name | Language | Ability |
-|------|---------------------|--------------|-----------------------|
-| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | code-llama | ['en'] | ['generate'] |
-| LLM | code-llama-instruct | ['en'] | ['chat'] |
-| LLM | code-llama-python | ['en'] | ['generate'] |
-| LLM | falcon | ['en'] | ['embed', 'generate'] |
-| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
+| Type | Name | Language | Ability |
+|------|---------------------|--------------|------------------------|
+| LLM | baichuan | ['en', 'zh'] | ['embed', 'generate'] |
+| LLM | baichuan-2 | ['en', 'zh'] | ['embed', 'generate'] |
+| LLM | baichuan-chat | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | baichuan-2-chat | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | chatglm | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | chatglm2 | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | chatglm2-32k | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | code-llama | ['en'] | ['generate'] |
+| LLM | code-llama-instruct | ['en'] | ['chat'] |
+| LLM | code-llama-python | ['en'] | ['generate'] |
+| LLM | falcon | ['en'] | ['embed', 'generate'] |
+| LLM | falcon-instruct | ['en'] | ['embed', 'chat'] |
+| LLM | glaive-coder | ['en'] | ['chat'] |
-| LLM | gpt-2 | ['en'] | ['generate'] |
-| LLM | internlm | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | internlm-16k | ['en', 'zh'] | ['embed', 'generate'] |
-| LLM | internlm-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | internlm-chat-8k | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | internlm-chat-16k | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
-| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
-| LLM | opt | ['en'] | ['embed', 'generate'] |
-| LLM | orca | ['en'] | ['embed', 'chat'] |
-| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
-| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
-| LLM | starcoder | ['en'] | ['generate'] |
-| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
-| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
-| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
-| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
-| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
-| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
-| LLM | OpenBuddy-v11.1 | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | gpt-2 | ['en'] | ['generate'] |
+| LLM | internlm-7b | ['en', 'zh'] | ['embed', 'generate'] |
+| LLM | internlm-chat-7b | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | internlm-chat-20b | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | llama-2 | ['en'] | ['embed', 'generate'] |
+| LLM | llama-2-chat | ['en'] | ['embed', 'chat'] |
+| LLM | opt | ['en'] | ['embed', 'generate'] |
+| LLM | orca | ['en'] | ['embed', 'chat'] |
+| LLM | qwen-chat | ['en', 'zh'] | ['embed', 'chat'] |
+| LLM | starchat-beta | ['en'] | ['embed', 'chat'] |
+| LLM | starcoder | ['en'] | ['generate'] |
+| LLM | starcoderplus | ['en'] | ['embed', 'generate'] |
+| LLM | vicuna-v1.3 | ['en'] | ['embed', 'chat'] |
+| LLM | vicuna-v1.5 | ['en'] | ['embed', 'chat'] |
+| LLM | vicuna-v1.5-16k | ['en'] | ['embed', 'chat'] |
+| LLM | wizardlm-v1.0 | ['en'] | ['embed', 'chat'] |
+| LLM | wizardmath-v1.0 | ['en'] | ['embed', 'chat'] |
+| LLM | OpenBuddy | ['en', 'zh'] | ['embed', 'chat'] |

For more details, please refer to [built-in models](https://inference.readthedocs.io/en/latest/models/builtin/index.html).

**NOTE**:
- Xinference will download models automatically for you, and by default the models are saved under `${USER}/.xinference/cache`.
-- If you have trouble downloading models from Hugging Face, run `export XINFERENCE_MODEL_SRC=xorbits` to download models from our mirror site.
+- If you have trouble downloading models from Hugging Face, run `export XINFERENCE_MODEL_SRC=modelscope`; models will then be downloaded from ModelScope by default. The models currently supported by ModelScope are:
+  - llama-2
+  - llama-2-chat
+  - baichuan-2
+  - baichuan-2-chat
+  - chatglm2
+  - chatglm2-32k
+  - internlm-chat-20b

## Custom models
Please refer to [custom models](https://inference.readthedocs.io/en/latest/models/custom.html).
1 change: 1 addition & 0 deletions setup.cfg
@@ -39,6 +39,7 @@ install_requires =
typing_extensions
fsspec
s3fs
+modelscope

[options.packages.find]
exclude =
7 changes: 7 additions & 0 deletions xinference/model/llm/__init__.py
@@ -19,6 +19,7 @@
from .core import LLM
from .llm_family import (
    BUILTIN_LLM_FAMILIES,
+    BUILTIN_MODELSCOPE_LLM_FAMILIES,
    LLM_CLASSES,
    GgmlLLMSpecV1,
    LLMFamilyV1,
@@ -83,6 +84,12 @@ def _install():
    for json_obj in json.load(codecs.open(json_path, "r", encoding="utf-8")):
        BUILTIN_LLM_FAMILIES.append(LLMFamilyV1.parse_obj(json_obj))

+    modelscope_json_path = os.path.join(
+        os.path.dirname(os.path.abspath(__file__)), "llm_family_modelscope.json"
+    )
+    for json_obj in json.load(codecs.open(modelscope_json_path, "r", encoding="utf-8")):
+        BUILTIN_MODELSCOPE_LLM_FAMILIES.append(LLMFamilyV1.parse_obj(json_obj))

    from ...constants import XINFERENCE_MODEL_DIR

    user_defined_llm_dir = os.path.join(XINFERENCE_MODEL_DIR, "llm")
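
The hunk above only loads and registers the ModelScope family definitions at import time; the dispatch on `XINFERENCE_MODEL_SRC` happens elsewhere in the PR. As a rough sketch of what that selection amounts to (the helper below is hypothetical and not code from this diff; only the two registry names come from the hunk above):

```python
import os
from typing import List

# Registry names taken from the diff above; everything else is illustrative.
from xinference.model.llm.llm_family import (
    BUILTIN_LLM_FAMILIES,
    BUILTIN_MODELSCOPE_LLM_FAMILIES,
    LLMFamilyV1,
)


def candidate_families(model_name: str) -> List[LLMFamilyV1]:
    """Hypothetical lookup: prefer ModelScope entries when the user opted in."""
    families: List[LLMFamilyV1] = []
    if os.environ.get("XINFERENCE_MODEL_SRC") == "modelscope":
        # ModelScope hosts only a subset of the builtin models, so the
        # Hugging Face definitions remain available as a fallback.
        families.extend(BUILTIN_MODELSCOPE_LLM_FAMILIES)
    families.extend(BUILTIN_LLM_FAMILIES)
    return [f for f in families if f.model_name == model_name]
```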
52 changes: 4 additions & 48 deletions xinference/model/llm/llm_family.json
@@ -1015,7 +1015,7 @@
{
"version": 1,
"context_length": 8192,
"model_name": "internlm",
"model_name": "internlm-7b",
"model_lang": [
"en",
"zh"
@@ -1042,7 +1042,7 @@
{
"version": 1,
"context_length": 4096,
"model_name": "internlm-chat",
"model_name": "internlm-chat-7b",
"model_lang": [
"en",
"zh"
@@ -1083,54 +1083,10 @@
]
}
},
-{
-"version": 1,
-"context_length": 8192,
-"model_name": "internlm-chat-8k",
-"model_lang": [
-"en",
-"zh"
-],
-"model_ability": [
-"embed",
-"chat"
-],
-"model_description": "Internlm-chat-8k is a special version of Internlm-chat, with a context window of 8k tokens instead of 4k.",
-"model_specs": [
-{
-"model_format": "pytorch",
-"model_size_in_billions": 7,
-"quantizations": [
-"4-bit",
-"8-bit",
-"none"
-],
-"model_id": "internlm/internlm-chat-7b-8k",
-"model_revision": "8bd146e7dc41ba5f3eba95679554a03acc9f0043"
-}
-],
-"prompt_style": {
-"style_name": "INTERNLM",
-"system_prompt": "",
-"roles": [
-"<|User|>",
-"<|Bot|>"
-],
-"intra_message_sep": "<eoh>\n",
-"inter_message_sep": "<eoa>\n",
-"stop_token_ids": [
-1,
-103028
-],
-"stop": [
-"<eoa>"
-]
-}
-},
{
"version": 1,
"context_length": 16384,
"model_name": "internlm-16k",
"model_name": "internlm-20b",
"model_lang": [
"en",
"zh"
@@ -1157,7 +1113,7 @@
{
"version": 1,
"context_length": 16384,
"model_name": "internlm-chat-16k",
"model_name": "internlm-chat-20b",
"model_lang": [
"en",
"zh"
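
The `llm_family_modelscope.json` file added by this PR is not expanded in this view, but the loader in `_install()` implies its entries follow the same `LLMFamilyV1` schema as the hunks above. Below is an illustrative record only: the field names mirror the visible `llm_family.json` entries, the `model_id` and `model_revision` values are placeholders, and chat-capable entries would additionally carry a `prompt_style` block.

```python
from xinference.model.llm.llm_family import LLMFamilyV1

# Placeholder record in the shape the loader expects; not the real file contents.
entry = {
    "version": 1,
    "context_length": 4096,
    "model_name": "llama-2",
    "model_lang": ["en"],
    "model_ability": ["embed", "generate"],
    "model_description": "Illustrative ModelScope-backed spec for llama-2.",
    "model_specs": [
        {
            "model_format": "pytorch",
            "model_size_in_billions": 7,
            "quantizations": ["4-bit", "8-bit", "none"],
            "model_id": "<modelscope-namespace>/<model-repo>",  # placeholder
            "model_revision": "<revision>",                     # placeholder
        }
    ],
}

# Same parsing path used by _install() in xinference/model/llm/__init__.py.
family = LLMFamilyV1.parse_obj(entry)
print(family.model_name, family.model_specs[0].model_size_in_billions)
```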