add qwen int4 model, refine example (#217)
* add qwen int4 model, refine example

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* update args

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix typos

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
WeiweiZhang1 authored Aug 9, 2024
1 parent 04678e0 commit fed34b7
Showing 2 changed files with 15 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -22,6 +22,7 @@ image presents an overview of AutoRound. Check out our updated paper on [arxiv]
<div align="left">

## What's New
* [2024/08] Enabled export and inference of quantized models in the AutoRound format on HPU devices; please refer to [Intel/Qwen2-7B-int4-inc](https://huggingface.co/Intel/Qwen2-7B-int4-inc) and [Intel/Qwen2-57B-A14B-Instruct-int4-inc](https://huggingface.co/Intel/Qwen2-57B-A14B-Instruct-int4-inc).
* [2024/07] Important change: the default value of nsamples has been changed from 512 to 128 to reduce memory usage, which may cause a slight accuracy drop in some scenarios.
* [2024/06] The AutoRound format supports mixed bit-widths and group sizes for inference, resolving the significant performance drop issue seen with the asymmetric kernel.
* [2024/05] AutoRound supports lm-head quantization, saving 0.7G for LLaMA3-8B at W4G128.
@@ -168,6 +169,8 @@ print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))

| Model | Supported |
|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Qwen/Qwen2-7B                        | [HF-int4-model](https://huggingface.co/Intel/Qwen2-7B-int4-inc)                                                                                                                                                                                                                                                       |
| Qwen/Qwen2-57B-A14B-Instruct         | [HF-int4-model](https://huggingface.co/Intel/Qwen2-57B-A14B-Instruct-int4-inc)                                                                                                                                                                                                                                        |
| Intel/neural-chat-7b-v3-3 | [HF-int4-model](https://huggingface.co/Intel/neural-chat-7b-v3-3-int4-inc), [accuracy](./docs/neural-chat-7b-v3-3-acc.md), [recipe](./examples/language-modeling/scripts/neural-chat-7b-v3-3.sh), [example](./examples/language-modeling/) |
| Intel/neural-chat-7b-v3-1 | [HF-int4-model](https://huggingface.co/Intel/neural-chat-7b-v3-1-int4-inc), [accuracy](./docs/neural-chat-7b-v3-1-acc.md), [recipe](./examples/language-modeling/scripts/neural-chat-7b-v3-1.sh), [example](./examples/language-modeling/) |
| mistralai/Mistral-7B-v0.1 | [HF-int4-model-lmhead](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc-lmhead),[HF-int4-model](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc), [accuracy](./docs/Mistral-7B-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mistral-7B-v0.1.sh), [example](./examples/language-modeling/) |
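
For context, the two Qwen checkpoints added above are published in the AutoRound format and load through transformers, mirroring the generation snippet already in this README. A minimal sketch; the `AutoRoundConfig` import is an assumption here, since the side-loading import path has varied across auto-round releases (check the model card for the exact one):

```python
# Sketch: run one of the newly listed int4 checkpoints (assumed import path).
from auto_round import AutoRoundConfig  # noqa: F401 -- registers the AutoRound format
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/Qwen2-7B-int4-inc"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```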
12 changes: 12 additions & 0 deletions examples/language-modeling/main.py
@@ -131,6 +131,9 @@

parser.add_argument("--act_bits", default=32, type=int,
help="activation bits")

parser.add_argument("--fp_layers_list", default="", type=str,
help="List of Layers to maintain original data type")

args = parser.parse_args()

@@ -269,6 +272,15 @@
layer_config[n] = {"bits": 32}
print(
f"{n} will not be quantized due to its shape not being divisible by 32, resulting in an exporting issue to autogptq")
fp_layers_list = args.fp_layers_list.split(",")
if bool(fp_layers_list):
for n, m in model.named_modules():
if isinstance(m, torch.nn.Linear) or isinstance(m, transformers.modeling_utils.Conv1D):
name = n.split('.')[-1]
if n in fp_layers_list or name in fp_layers_list:
layer_config[n] = {"bits": 32}
print(
f"{n} will not be quantized.")
lm_head_layer_name = "lm_head"
for n, _ in model.named_modules():
lm_head_layer_name = n
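
To illustrate the new flag: passing, say, `--fp_layers_list "lm_head,down_proj"` keeps every matching layer in its original data type. The check accepts either a fully qualified module name or its bare suffix. A toy sketch of the matching rule, with made-up layer names:

```python
# Toy illustration of the --fp_layers_list matching rule (layer names are made up).
fp_layers_list = "lm_head,down_proj".split(",")

for n in ["model.layers.0.mlp.down_proj",
          "model.layers.0.self_attn.q_proj",
          "lm_head"]:
    name = n.split('.')[-1]  # bare suffix, e.g. "down_proj"
    if n in fp_layers_list or name in fp_layers_list:
        print(f"{n} will not be quantized.")  # matches down_proj and lm_head
```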
