FEAT: Support Phi-2 (#828)
Co-authored-by: aresnow <aresnow1@gmail.com>
Bojun-Feng and aresnow1 authored Dec 29, 2023
1 parent f06eaa8 commit 3907fc6
Showing 8 changed files with 96 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ potential of cutting-edge AI models.
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
### New Models
- Built-in support for [phi-2](https://huggingface.co/microsoft/phi-2): [#828](https://github.com/xorbitsai/inference/pull/828)
- Built-in support for [mistral-instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2): [#796](https://github.com/xorbitsai/inference/pull/796)
- Built-in support for [deepseek-llm](https://huggingface.co/deepseek-ai) and [deepseek-coder](https://huggingface.co/deepseek-ai): [#786](https://github.com/xorbitsai/inference/pull/786)
- Built-in support for [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1): [#782](https://github.com/xorbitsai/inference/pull/782)
1 change: 1 addition & 0 deletions README_zh_CN.md
@@ -30,6 +30,7 @@ Xorbits Inference (Xinference) is a powerful and versatile distributed
- Speculative decoding: [#509](https://github.com/xorbitsai/inference/pull/509)
- Incorporate vLLM: [#445](https://github.com/xorbitsai/inference/pull/445)
### New Models
- Built-in support for [phi-2](https://huggingface.co/microsoft/phi-2): [#828](https://github.com/xorbitsai/inference/pull/828)
- Built-in support for [mistral-instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2): [#796](https://github.com/xorbitsai/inference/pull/796)
- Built-in support for [deepseek-llm](https://huggingface.co/deepseek-ai) and [deepseek-coder](https://huggingface.co/deepseek-ai): [#786](https://github.com/xorbitsai/inference/pull/786)
- Built-in support for [Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1): [#782](https://github.com/xorbitsai/inference/pull/782)
2 changes: 2 additions & 0 deletions doc/source/models/builtin/llm/index.rst
@@ -79,6 +79,8 @@ The following is a list of built-in LLMs in Xinference:

orca

phi-2

qwen-chat

skywork
43 changes: 43 additions & 0 deletions doc/source/models/builtin/llm/phi-2.rst
@@ -0,0 +1,43 @@
.. _models_llm_phi-2:

========================================
phi-2
========================================

- **Context Length:** 2048
- **Model Name:** phi-2
- **Languages:** en
- **Abilities:** generate
- **Description:** Phi-2 is a 2.7B Transformer-based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (ggufv2, 2 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** ggufv2
- **Model Size (in billions):** 2
- **Quantizations:** Q2_K, Q3_K_S, Q3_K_M, Q3_K_L, Q4_0, Q4_K_S, Q4_K_M, Q5_0, Q5_K_S, Q5_K_M, Q6_K, Q8_0
- **Model ID:** TheBloke/phi-2-GGUF

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name phi-2 --size-in-billions 2 --model-format ggufv2 --quantization ${quantization}
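
The same launch can also be done programmatically. Below is a minimal sketch using Xinference's Python client; the endpoint and the ``Q4_K_M`` choice are illustrative assumptions::

    # Minimal sketch: launch phi-2 (GGUF) through the Python client.
    # Assumes an Xinference server running at the default local endpoint.
    from xinference.client import Client

    client = Client("http://localhost:9997")
    model_uid = client.launch_model(
        model_name="phi-2",
        model_format="ggufv2",
        model_size_in_billions=2,
        quantization="Q4_K_M",  # any quantization listed above works
    )
    print(model_uid)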


Model Spec 2 (pytorch, 2 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 2
- **Quantizations:** 4-bit, 8-bit, none
- **Model ID:** microsoft/phi-2

Execute the following command to launch the model; remember to replace ``${quantization}`` with your
chosen quantization method from the options listed above::

xinference launch --model-name phi-2 --size-in-billions 2 --model-format pytorch --quantization ${quantization}
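
Once launched, a model with the ``generate`` ability can be driven from the same client. A sketch, assuming ``model_uid`` was returned by the launch step above::

    # Sketch: text completion against the launched phi-2 model.
    # Assumes `model_uid` came from a prior client.launch_model(...) call.
    from xinference.client import Client

    client = Client("http://localhost:9997")
    model = client.get_model(model_uid)
    completion = model.generate(
        "Q: What is the capital of France?\nA:",
        generate_config={"max_tokens": 64},
    )
    print(completion["choices"][0]["text"])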

1 change: 1 addition & 0 deletions doc/source/models/builtin/llm/skywork-math.rst
@@ -26,3 +26,4 @@ Execute the following command to launch the model, remember to replace ``${quant
chosen quantization method from the options listed above::

xinference launch --model-name Skywork-Math --size-in-billions 13 --model-format pytorch --quantization ${quantization}

1 change: 1 addition & 0 deletions doc/source/models/builtin/llm/skywork.rst
@@ -26,3 +26,4 @@ Execute the following command to launch the model, remember to replace ``${quant
chosen quantization method from the options listed above::

xinference launch --model-name Skywork --size-in-billions 13 --model-format pytorch --quantization ${quantization}

4 changes: 2 additions & 2 deletions setup.cfg
@@ -73,7 +73,7 @@ dev =
all =
chatglm-cpp>=0.3.0
ctransformers
- llama-cpp-python>=0.2.23
+ llama-cpp-python>=0.2.25
transformers>=4.34.1
torch
accelerate>=0.20.3
@@ -91,7 +91,7 @@ all =
auto-gptq ; sys_platform!='darwin'
optimum
ggml =
- llama-cpp-python>=0.2.23
+ llama-cpp-python>=0.2.25
ctransformers
chatglm-cpp>=0.3.0
transformers =
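
The ``llama-cpp-python`` floor moves from 0.2.23 to 0.2.25, presumably to pick up the bundled llama.cpp changes needed to load phi-2 GGUF files. A quick sanity check of the installed version, assuming the package exposes ``__version__``::

    # Verify llama-cpp-python is new enough for phi-2 GGUF support.
    from packaging.version import Version

    import llama_cpp

    assert Version(llama_cpp.__version__) >= Version("0.2.25")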
45 changes: 45 additions & 0 deletions xinference/model/llm/llm_family.json
@@ -350,6 +350,51 @@
"intra_message_sep": "\n\n### "
}
},
{
"version": 1,
"context_length": 2048,
"model_name": "phi-2",
"model_lang": [
"en"
],
"model_ability": [
"generate"
],
"model_description": "Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites.",
"model_specs": [
{
"model_format": "ggufv2",
"model_size_in_billions": 2,
"quantizations": [
"Q2_K",
"Q3_K_S",
"Q3_K_M",
"Q3_K_L",
"Q4_0",
"Q4_K_S",
"Q4_K_M",
"Q5_0",
"Q5_K_S",
"Q5_K_M",
"Q6_K",
"Q8_0"
],
"model_id": "TheBloke/phi-2-GGUF",
"model_file_name_template": "phi-2.{quantization}.gguf"
},
{
"model_format": "pytorch",
"model_size_in_billions": 2,
"quantizations": [
"4-bit",
"8-bit",
"none"
],
"model_id": "microsoft/phi-2",
"model_revision": "d3186761bf5c4409f7679359284066c25ab668ee"
}
]
},
{
"version": 1,
"context_length": 2048,
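
For the GGUF spec, the chosen quantization is substituted into ``model_file_name_template`` to resolve which file to download. A sketch of that substitution, assuming ordinary ``str.format`` semantics::

    # Sketch: mapping a quantization choice to its GGUF file name,
    # assuming the template uses plain str.format substitution.
    template = "phi-2.{quantization}.gguf"
    print(template.format(quantization="Q4_K_M"))  # phi-2.Q4_K_M.gguf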
