DOC: update new models #2632

Merged: 5 commits, Dec 6, 2024
.github/workflows/python.yaml: 7 changes (4 additions & 3 deletions)

@@ -73,12 +73,12 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ "ubuntu-latest", "macos-12", "windows-latest" ]
os: [ "ubuntu-latest", "macos-13", "windows-latest" ]
python-version: [ "3.9", "3.10", "3.11", "3.12" ]
module: [ "xinference" ]
exclude:
-- { os: macos-12, python-version: 3.10 }
-- { os: macos-12, python-version: 3.11 }
+- { os: macos-13, python-version: 3.10 }
+- { os: macos-13, python-version: 3.11 }
- { os: windows-latest, python-version: 3.10 }
- { os: windows-latest, python-version: 3.11 }
include:
@@ -185,6 +185,7 @@ jobs:
${{ env.SELF_HOST_PYTHON }} -m pip install -U cachetools
${{ env.SELF_HOST_PYTHON }} -m pip install -U silero-vad
${{ env.SELF_HOST_PYTHON }} -m pip install -U pydantic
+${{ env.SELF_HOST_PYTHON }} -m pip install -U diffusers
${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=1500 \
--disable-warnings \
--cov-config=setup.cfg --cov-report=xml --cov=xinference xinference/core/tests/test_continuous_batching.py && \
README.md: 4 changes (2 additions & 2 deletions)

@@ -46,14 +46,14 @@ potential of cutting-edge AI models.
- Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
- Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
### New Models
+- Built-in support for [GLM Edge](https://github.com/THUDM/GLM-Edge): [#2582](https://github.com/xorbitsai/inference/pull/2582)
+- Built-in support for [QwQ-32B-Preview](https://qwenlm.github.io/blog/qwq-32b-preview/): [#2602](https://github.com/xorbitsai/inference/pull/2602)
- Built-in support for [Qwen 2.5 Series](https://qwenlm.github.io/blog/qwen2.5/): [#2325](https://github.com/xorbitsai/inference/pull/2325)
- Built-in support for [Fish Speech V1.4](https://huggingface.co/fishaudio/fish-speech-1.4): [#2295](https://github.com/xorbitsai/inference/pull/2295)
- Built-in support for [DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5): [#2292](https://github.com/xorbitsai/inference/pull/2292)
- Built-in support for [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio): [#2271](https://github.com/xorbitsai/inference/pull/2271)
- Built-in support for [Qwen2-vl-instruct](https://github.com/QwenLM/Qwen2-VL): [#2205](https://github.com/xorbitsai/inference/pull/2205)
- Built-in support for [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B): [#2263](https://github.com/xorbitsai/inference/pull/2263)
-- Built-in support for [CogVideoX](https://github.com/THUDM/CogVideo): [#2049](https://github.com/xorbitsai/inference/pull/2049)
-- Built-in support for [flux.1-schnell & flux.1-dev](https://www.basedlabs.ai/tools/flux1): [#2007](https://github.com/xorbitsai/inference/pull/2007)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.
README_zh_CN.md: 4 changes (2 additions & 2 deletions)
@@ -42,14 +42,14 @@ Xorbits Inference (Xinference) is a powerful and versatile distributed
- Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
- Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
### New Models
+- Built-in support for [GLM Edge](https://github.com/THUDM/GLM-Edge): [#2582](https://github.com/xorbitsai/inference/pull/2582)
+- Built-in support for [QwQ-32B-Preview](https://qwenlm.github.io/blog/qwq-32b-preview/): [#2602](https://github.com/xorbitsai/inference/pull/2602)
- Built-in support for [Qwen 2.5 Series](https://qwenlm.github.io/blog/qwen2.5/): [#2325](https://github.com/xorbitsai/inference/pull/2325)
- Built-in support for [Fish Speech V1.4](https://huggingface.co/fishaudio/fish-speech-1.4): [#2295](https://github.com/xorbitsai/inference/pull/2295)
- Built-in support for [DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5): [#2292](https://github.com/xorbitsai/inference/pull/2292)
- Built-in support for [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio): [#2271](https://github.com/xorbitsai/inference/pull/2271)
- Built-in support for [Qwen2-vl-instruct](https://github.com/QwenLM/Qwen2-VL): [#2205](https://github.com/xorbitsai/inference/pull/2205)
- Built-in support for [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B): [#2263](https://github.com/xorbitsai/inference/pull/2263)
-- Built-in support for [CogVideoX](https://github.com/THUDM/CogVideo): [#2049](https://github.com/xorbitsai/inference/pull/2049)
-- Built-in support for [flux.1-schnell & flux.1-dev](https://www.basedlabs.ai/tools/flux1): [#2007](https://github.com/xorbitsai/inference/pull/2007)
### Integrations
- [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/): an open-source AI knowledge-base platform built on LLMs, offering out-of-the-box data processing, model invocation, RAG retrieval, and visual AI workflow orchestration to help you easily build complex question-answering scenarios.
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform covering the development, deployment, maintenance, and optimization of large language models.
doc/source/models/builtin/llm/glm-edge-chat.rst: 111 changes (111 additions & 0 deletions, new file)
@@ -0,0 +1,111 @@
.. _models_llm_glm-edge-chat:

========================================
glm-edge-chat
========================================

- **Context Length:** 8192
- **Model Name:** glm-edge-chat
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** The GLM-Edge series targets real-world edge deployment scenarios. It consists of two sizes each of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models are aimed mainly at platforms such as mobile phones and in-car systems, while the 4B / 5B models target platforms such as PCs.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 1_5
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** THUDM/glm-edge-1.5b-chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 1_5 --model-format pytorch --quantization ${quantization}
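
The model can also be launched and queried programmatically. The following is a
minimal sketch using the Python client, assuming a local Xinference supervisor at
the default ``http://127.0.0.1:9997``; exact parameter names may differ slightly
between Xinference versions::

    from xinference.client import RESTfulClient

    # Connect to a locally running Xinference supervisor (default endpoint assumed).
    client = RESTfulClient("http://127.0.0.1:9997")

    # Launch glm-edge-chat; "1_5" mirrors --size-in-billions 1_5 from the CLI.
    model_uid = client.launch_model(
        model_name="glm-edge-chat",
        model_engine="transformers",
        model_format="pytorch",
        model_size_in_billions="1_5",
        quantization="none",
    )

    # Send a simple chat request through the returned model handle.
    model = client.get_model(model_uid)
    print(model.chat(messages=[{"role": "user", "content": "Hello!"}]))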


Model Spec 2 (pytorch, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 4
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** THUDM/glm-edge-4b-chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 4 --model-format pytorch --quantization ${quantization}


Model Spec 3 (ggufv2, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 1_5
- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-1.5b-chat-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization}


Model Spec 4 (ggufv2, 1_5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 1_5
- **Quantizations:** F16
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-1.5b-chat-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-1.5b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 1_5 --model-format ggufv2 --quantization ${quantization}


Model Spec 5 (ggufv2, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 4
- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-4b-chat-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 4 --model-format ggufv2 --quantization ${quantization}


Model Spec 6 (ggufv2, 4 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 4
- **Quantizations:** F16
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-4b-chat-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-4b-chat-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-chat --size-in-billions 4 --model-format ggufv2 --quantization ${quantization}
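
Regardless of which spec is launched, the model is also reachable through
Xinference's OpenAI-compatible API. Below is a hypothetical sketch using the
``openai`` package, assuming the default endpoint and that the model UID equals
the model name::

    from openai import OpenAI

    # Point the OpenAI client at the local Xinference endpoint (assumed default).
    client = OpenAI(api_key="not-needed", base_url="http://127.0.0.1:9997/v1")

    response = client.chat.completions.create(
        model="glm-edge-chat",  # assumed model UID; check `xinference list` for yours
        messages=[{"role": "user", "content": "Summarize GLM-Edge in one sentence."}],
    )
    print(response.choices[0].message.content)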

doc/source/models/builtin/llm/glm-edge-v.rst: 143 changes (143 additions & 0 deletions, new file)
@@ -0,0 +1,143 @@
.. _models_llm_glm-edge-v:

========================================
glm-edge-v
========================================

- **Context Length:** 8192
- **Model Name:** glm-edge-v
- **Languages:** en, zh
- **Abilities:** chat, vision
- **Description:** The GLM-Edge series targets real-world edge deployment scenarios. It consists of two sizes each of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models are aimed mainly at platforms such as mobile phones and in-car systems, while the 4B / 5B models target platforms such as PCs.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 2 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 2
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** THUDM/glm-edge-v-2b
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-2b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-2b>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format pytorch --quantization ${quantization}
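
After launching, vision requests can be sent through the OpenAI-compatible
endpoint using the standard image-content message format. A sketch, assuming the
default endpoint, a model UID equal to the model name, and an illustrative image
URL::

    from openai import OpenAI

    client = OpenAI(api_key="not-needed", base_url="http://127.0.0.1:9997/v1")

    # Vision chat: the image is passed as an OpenAI-style image_url content part.
    response = client.chat.completions.create(
        model="glm-edge-v",  # assumed model UID
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }],
    )
    print(response.choices[0].message.content)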


Model Spec 2 (pytorch, 5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 5
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** THUDM/glm-edge-v-5b
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-5b>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-5b>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format pytorch --quantization ${quantization}


Model Spec 3 (ggufv2, 2 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 2
- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-v-2b-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-2b-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-2b-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization}


Model Spec 4 (ggufv2, 2 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 2
- **Quantizations:** F16
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-v-2b-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-2b-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-2b-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization}


Model Spec 5 (ggufv2, 2 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 2
- **Quantizations:** f16
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-v-2b-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-2b-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-2b-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 2 --model-format ggufv2 --quantization ${quantization}


Model Spec 6 (ggufv2, 5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 5
- **Quantizations:** Q4_0, Q4_1, Q4_K, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K, Q5_K_M, Q5_K_S, Q6_K, Q8_0
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-v-5b-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-5b-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-5b-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization}


Model Spec 7 (ggufv2, 5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 5
- **Quantizations:** F16
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-v-5b-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-5b-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-5b-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization}


Model Spec 8 (ggufv2, 5 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 5
- **Quantizations:** f16
- **Engines**: llama.cpp
- **Model ID:** THUDM/glm-edge-v-5b-gguf
- **Model Hubs**: `Hugging Face <https://huggingface.co/THUDM/glm-edge-v-5b-gguf>`__, `ModelScope <https://modelscope.cn/models/ZhipuAI/glm-edge-v-5b-gguf>`__

Execute the following command to launch the model. Remember to replace ``${engine}`` with your chosen engine and ``${quantization}`` with a quantization method from the options listed above::

xinference launch --model-engine ${engine} --model-name glm-edge-v --size-in-billions 5 --model-format ggufv2 --quantization ${quantization}

doc/source/models/builtin/llm/index.rst: 14 changes (14 additions & 0 deletions)

@@ -166,6 +166,16 @@ The following is a list of built-in LLM in Xinference:
- 8192
- GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.

+* - :ref:`glm-edge-chat <models_llm_glm-edge-chat>`
+  - chat
+  - 8192
+  - The GLM-Edge series targets real-world edge deployment scenarios. It consists of two sizes each of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models are aimed mainly at platforms such as mobile phones and in-car systems, while the 4B / 5B models target platforms such as PCs.

+* - :ref:`glm-edge-v <models_llm_glm-edge-v>`
+  - chat, vision
+  - 8192
+  - The GLM-Edge series targets real-world edge deployment scenarios. It consists of two sizes each of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models are aimed mainly at platforms such as mobile phones and in-car systems, while the 4B / 5B models target platforms such as PCs.

* - :ref:`glm4-chat <models_llm_glm4-chat>`
- chat, tools
- 131072
@@ -616,6 +626,10 @@ The following is a list of built-in LLM in Xinference:

glm-4v

+glm-edge-chat

+glm-edge-v

glm4-chat

glm4-chat-1m
doc/source/models/model_abilities/tools.rst: 8 changes (5 additions & 3 deletions)

@@ -33,9 +33,11 @@ Supported models
The ``tools`` ability is supported with the following models in Xinference:

* :ref:`models_llm_qwen-chat`
-* :ref:`models_llm_chatglm3`
-* :ref:`models_llm_gorilla-openfunctions-v1`
+* :ref:`models_llm_glm4-chat`
+* :ref:`models_llm_glm4-chat-1m`
+* :ref:`models_llm_llama-3.1-instruct`
+* :ref:`models_llm_qwen2.5-instruct`
+* :ref:`models_llm_qwen2.5-coder-instruct`
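
For a feel of the request shape before the Quickstart below, a hypothetical
tool-calling request against the OpenAI-compatible endpoint might look like this
(the ``get_weather`` tool and the model UID are illustrative assumptions)::

    from openai import OpenAI

    client = OpenAI(api_key="not-needed", base_url="http://127.0.0.1:9997/v1")

    # A single illustrative tool following the OpenAI function-calling schema.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="qwen2.5-instruct",  # assumed UID of a tools-capable model from the list above
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
    )
    print(response.choices[0].message.tool_calls)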

Quickstart
==============
doc/source/models/model_abilities/vision.rst: 1 change (1 addition & 0 deletions)

@@ -33,6 +33,7 @@ The ``vision`` ability is supported with the following models in Xinference:
* :ref:`qwen2-vl-instruct <models_llm_qwen2-vl-instruct>`
* :ref:`llama-3.2-vision <models_llm_llama-3.2-vision>`
* :ref:`llama-3.2-vision-instruct <models_llm_llama-3.2-vision-instruct>`
+* :ref:`glm-edge-v <models_llm_glm-edge-v>`


Quickstart