From 4d87ba1568ce7a126ea9dba94b6000ee212d8d2a Mon Sep 17 00:00:00 2001
From: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Date: Fri, 15 Sep 2023 16:23:38 +0800
Subject: [PATCH 1/3] DOC: update README

---
 README.md       |  8 ++++----
 README_zh_CN.md | 10 +++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 749db8a690..13ac38e691 100644
--- a/README.md
+++ b/README.md
@@ -33,8 +33,9 @@ potential of cutting-edge AI models.
 - Xinference dashboard: [#93](https://github.com/xorbitsai/inference/issues/93)
 ### New Models
 - Built-in support for [CodeLLama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
-### Tools
-- LlamaIndex plugin: [#7151](https://github.com/jerryjliu/llama_index/pull/7151)
+### Integrations
+- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
+- [Chatbox](https://chatboxai.app/): a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
 
 ## Key Features
 
@@ -57,8 +58,7 @@ for seamless management and monitoring.
 allowing the seamless distribution of model inference across multiple devices or machines.
 
 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
-with popular third-party libraries like [LangChain](https://python.langchain.com/docs/integrations/providers/xinference)
-and [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window).
+with popular third-party libraries including [LangChain](https://python.langchain.com/docs/integrations/providers/xinference), [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window), [Dify](https://docs.dify.ai/advanced/model-configuration/xinference), and [Chatbox](https://chatboxai.app/).
 
 ## Getting Started
 
 Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual
diff --git a/README_zh_CN.md b/README_zh_CN.md
index 274e88028a..34992a361a 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -29,8 +29,10 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 - Xinference 仪表盘: [#93](https://github.com/xorbitsai/inference/issues/93)
 ### 新模型
 - 内置 [CodeLLama](https://github.com/facebookresearch/codellama): [#414](https://github.com/xorbitsai/inference/pull/414) [#402](https://github.com/xorbitsai/inference/pull/402)
-### 工具
-- LlamaIndex 插件: [#7151](https://github.com/jerryjliu/llama_index/pull/7151)
+### 集成
+- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。
+- [Chatbox](https://chatboxai.app/): 一个支持前沿大语言模型的桌面客户端,支持 Windows,Mac,以及 Linux。
+
 
 
@@ -46,9 +48,7 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 🌐 **集群计算,分布协同**: 支持分布式部署,通过内置的资源调度器,让不同大小的模型按需调度到不同机器,充分使用集群资源。
 
-🔌 **开放生态,无缝对接**: 与流行的三方库无缝对接,包括 [LangChain](https://python.langchain.com/docs/integrations/providers/xinference)
-and [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window)。
-让开发者能够快速构建基于 AI 的应用。
+🔌 **开放生态,无缝对接**: 与流行的三方库无缝对接,包括 [LangChain](https://python.langchain.com/docs/integrations/providers/xinference),[LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window),[Dify](https://docs.dify.ai/advanced/model-configuration/xinference),以及 [Chatbox](https://chatboxai.app/)。
 
 ## 快速入门
 
 Xinference 可以通过 pip 从 PyPI 安装。我们非常推荐在安装前创建一个新的虚拟环境以避免依赖冲突。

From 5f9d40e80aed00786a81114b7ddd4ea0d9b633fa Mon Sep 17 00:00:00 2001
From: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Date: Fri, 15 Sep 2023 16:33:15 +0800
Subject: [PATCH 2/3] DOC: update hot topics

---
 README.md                                  |  1 -
 README_zh_CN.md                            |  2 +-
 doc/source/index.rst                       | 18 ++++++++++--------
 doc/source/models/builtin/llama-2-chat.rst |  6 +++---
 doc/source/models/builtin/llama-2.rst      |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 13ac38e691..4f9e1e696c 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,6 @@ potential of cutting-edge AI models.
 ## 🔥 Hot Topics
 ### Framework Enhancements
 - Embedding model support: [#418](https://github.com/xorbitsai/inference/pull/418)
-- Custom model support: [#325](https://github.com/xorbitsai/inference/pull/325)
 - LoRA support: [#271](https://github.com/xorbitsai/inference/issues/271)
 - Multi-GPU support for PyTorch models: [#226](https://github.com/xorbitsai/inference/issues/226)
 - Xinference dashboard: [#93](https://github.com/xorbitsai/inference/issues/93)
diff --git a/README_zh_CN.md b/README_zh_CN.md
index 34992a361a..6fb1364018 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -23,7 +23,7 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 
 ## 🔥 近期热点
 ### 框架增强
-- 自定义模型: [#325](https://github.com/xorbitsai/inference/pull/325)
+- Embedding 模型支持: [#418](https://github.com/xorbitsai/inference/pull/418)
 - LoRA 支持: [#271](https://github.com/xorbitsai/inference/issues/271)
 - PyTorch 模型多 GPU 支持: [#226](https://github.com/xorbitsai/inference/issues/226)
 - Xinference 仪表盘: [#93](https://github.com/xorbitsai/inference/issues/93)
diff --git a/doc/source/index.rst b/doc/source/index.rst
index f18cdf5eec..d1abd475ec 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -33,7 +33,9 @@ allowing the seamless distribution of model inference across multiple devices or
 
 🔌 **Built-in Integration with Third-Party Libraries**: Xorbits Inference seamlessly integrates
 with popular third-party libraries like `LangChain <https://python.langchain.com/docs/integrations/providers/xinference>`_
-and `LlamaIndex <https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window>`_.
+, `LlamaIndex <https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window>`_
+, `Dify <https://docs.dify.ai/advanced/model-configuration/xinference>`_
+, and `Chatbox <https://chatboxai.app/>`_.
 
 🔥 Hot Topics
@@ -41,20 +43,20 @@ and `LlamaIndex <https://gpt-index.readthedocs.io/en/stable/examples/llm/XinferenceLocalDeployment.html#i-run-pip-install-xinference-all-in-a-terminal-window>`_.
 
 Framework Enhancements
 ~~~~~~~~~~~~~~~~~~~~~~
-- Custom model support: `#325 <https://github.com/xorbitsai/inference/pull/325>`_
+- Embedding model support: `#418 <https://github.com/xorbitsai/inference/pull/418>`_
 - LoRA support: `#271 <https://github.com/xorbitsai/inference/issues/271>`_
 - Multi-GPU support for PyTorch models: `#226 <https://github.com/xorbitsai/inference/issues/226>`_
 - Xinference dashboard: `#93 <https://github.com/xorbitsai/inference/issues/93>`_
 
 New Models
 ~~~~~~~~~~
-- Built-in support for `Starcoder` in GGML: `#289 <https://github.com/xorbitsai/inference/pull/289>`_
-- Built-in support for `MusicGen `_: `#313 <https://github.com/xorbitsai/inference/pull/313>`_
-- Built-in support for `SD-XL `_: `318 <https://github.com/xorbitsai/inference/pull/318>`_
+- Built-in support for `CodeLLama <https://github.com/facebookresearch/codellama>`_: `#414 <https://github.com/xorbitsai/inference/pull/414>`_ `#402 <https://github.com/xorbitsai/inference/pull/402>`_
 
-Tools
-~~~~~
-- LlamaIndex plugin: `7151 <https://github.com/jerryjliu/llama_index/pull/7151>`_
+
+Integrations
+~~~~~~~~~~~~
+- `Dify <https://docs.dify.ai/advanced/model-configuration/xinference>`_: an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
+- `Chatbox <https://chatboxai.app/>`_: a desktop client for multiple cutting-edge LLM models, available on Windows, Mac and Linux.
 
 
 License
diff --git a/doc/source/models/builtin/llama-2-chat.rst b/doc/source/models/builtin/llama-2-chat.rst
index 2eca8d85ef..ded31d4a41 100644
--- a/doc/source/models/builtin/llama-2-chat.rst
+++ b/doc/source/models/builtin/llama-2-chat.rst
@@ -66,7 +66,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 
 Model Spec 5 (pytorch, 13 Billion)
@@ -84,7 +84,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 Model Spec 6 (pytorch, 70 Billion)
 ++++++++++++++++++++++++++++++++++
@@ -101,4 +101,4 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
\ No newline at end of file
+   4-bit quantization is not supported on macOS.
\ No newline at end of file
diff --git a/doc/source/models/builtin/llama-2.rst b/doc/source/models/builtin/llama-2.rst
index 19614bbba4..a42090890a 100644
--- a/doc/source/models/builtin/llama-2.rst
+++ b/doc/source/models/builtin/llama-2.rst
@@ -65,7 +65,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 Model Spec 5 (pytorch, 13 Billion)
 ++++++++++++++++++++++++++++++++++
@@ -82,7 +82,7 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.
 
 Model Spec 6 (pytorch, 70 Billion)
 ++++++++++++++++++++++++++++++++++
@@ -99,4 +99,4 @@ chosen quantization method from the options listed above::
 
 .. note::
 
-4-bit quantization is not supported on macOS.
+   4-bit quantization is not supported on macOS.

From e2d4529f22e902e208aa56dfc8e3cf5cbd686381 Mon Sep 17 00:00:00 2001
From: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Date: Fri, 15 Sep 2023 16:36:39 +0800
Subject: [PATCH 3/3] Add vLLM

---
 README.md            | 1 +
 README_zh_CN.md      | 2 +-
 doc/source/index.rst | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4f9e1e696c..47dbd5ac3f 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,7 @@ potential of cutting-edge AI models.
 
 ## 🔥 Hot Topics
 ### Framework Enhancements
+- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
 - Embedding model support: [#418](https://github.com/xorbitsai/inference/pull/418)
 - LoRA support: [#271](https://github.com/xorbitsai/inference/issues/271)
 - Multi-GPU support for PyTorch models: [#226](https://github.com/xorbitsai/inference/issues/226)
diff --git a/README_zh_CN.md b/README_zh_CN.md
index 6fb1364018..20f09c8007 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -23,6 +23,7 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 
 ## 🔥 近期热点
 ### 框架增强
+- 引入 vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
 - Embedding 模型支持: [#418](https://github.com/xorbitsai/inference/pull/418)
 - LoRA 支持: [#271](https://github.com/xorbitsai/inference/issues/271)
 - PyTorch 模型多 GPU 支持: [#226](https://github.com/xorbitsai/inference/issues/226)
@@ -41,7 +42,6 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
 
 ⚡️ **前沿模型,应有尽有**:框架内置众多中英文的前沿大语言模型,包括 baichuan,chatglm2 等,一键即可体验!内置模型列表还在快速更新中!
-
 🖥 **异构硬件,快如闪电**:通过 [ggml](https://github.com/ggerganov/ggml),同时使用你的 GPU 与 CPU 进行推理,降低延迟,提高吞吐!
 
 ⚙️ **接口调用,灵活多样**:提供多种使用模型的接口,包括 RPC,RESTful API,命令行,web UI 等等。方便模型的管理与监控。
diff --git a/doc/source/index.rst b/doc/source/index.rst
index d1abd475ec..56177bae04 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -43,6 +43,7 @@ with popular third-party libraries like `LangChain <https://python.langchain.com/docs/integrations/providers/xinference>`_,
 🔥 Hot Topics
 ~~~~~~~~~~~~~
 
 Framework Enhancements
 ~~~~~~~~~~~~~~~~~~~~~~
+- Incorporate vLLM: `#445 <https://github.com/xorbitsai/inference/pull/445>`_
 - Embedding model support: `#418 <https://github.com/xorbitsai/inference/pull/418>`_
 - LoRA support: `#271 <https://github.com/xorbitsai/inference/issues/271>`_
 - Multi-GPU support for PyTorch models: `#226 <https://github.com/xorbitsai/inference/issues/226>`_
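
The "Getting Started" sections touched by these patches tell users to install Xinference from PyPI inside a fresh virtual environment. As a rough sketch of that flow (the launch command shown is an assumption based on the project's documentation of this era and may differ in newer releases):

```shell
# Create and activate an isolated virtual environment, as the README
# recommends, to avoid dependency conflicts.
python -m venv xinference-env
source xinference-env/bin/activate

# Install Xinference from PyPI with all optional model backends.
pip install "xinference[all]"

# Start a local Xinference instance (serves the web UI and RESTful API).
# Assumed entry point; newer releases may use a different command.
xinference
```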