examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/README.md (+61 −12)
@@ -35,9 +35,8 @@ python run_clm_no_trainer.py \
     --woq_group_size 128 \
     --gptq_max_seq_length 2048 \
     --gptq_use_max_length \
-    --accuracy \
-    --tasks "lambada_openai" \
-    --double_quant_type "BNB_NF4"
+    --double_quant_type "BNB_NF4" \
+    --output_dir saved_results

 # "--woq_algo RTN" is used to enable RTN algorithms
 python run_clm_no_trainer.py \
@@ -48,9 +47,38 @@ python run_clm_no_trainer.py \
     --woq_bits 4 \
     --woq_scheme asym \
     --woq_group_size 128 \
+    --double_quant_type "BNB_NF4" \
+    --output_dir saved_results
+
+# "--woq_algo AWQ" is used to enable AWQ algorithms
+python run_clm_no_trainer.py \
+    --model EleutherAI/gpt-j-6B \
+    --dataset NeelNanda/pile-10k \
+    --quantize \
+    --woq_algo AWQ \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128 \
+    --calib_iters 128
+
+# "--woq_algo AutoRound" is used to enable AutoRound algorithms
+python run_clm_no_trainer.py \
+    --model EleutherAI/gpt-j-6B \
+    --dataset NeelNanda/pile-10k \
+    --quantize \
+    --woq_algo AutoRound \
+    --woq_bits 4 \
+    --woq_scheme asym \
+    --woq_group_size 128
+
+# "--accuracy" for eval
+python run_clm_no_trainer.py \
+    --model EleutherAI/gpt-j-6B \
+    --dataset NeelNanda/pile-10k \
+    --int8 \
     --accuracy \
     --tasks "lambada_openai" \
-    --double_quant_type "BNB_NF4"
+    --output_dir saved_results
 ```
 **Notes**: Weight-only quantization based on fake quantization is supported as a preview and covers the RTN, GPTQ [1], AWQ [2], and TEQ algorithms. For more details, please refer to [link](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md). Our GPTQ API supports various CLMs, including GPTJ, OPTs, Blooms, Llamas, Falcons, MPTs, ChatGLMs, etc. Simply replace the "--model" argument with another model to quantize a different CLM with GPTQ.

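As the note above suggests, quantizing a different CLM with GPTQ only requires swapping the `--model` argument. A minimal sketch, assuming the Llama checkpoint below is reachable from the Hugging Face hub and that the remaining flags mirror the GPT-J GPTQ command shown in this diff (the model choice is illustrative, not part of the diff):

```bash
# Illustrative only: GPTQ weight-only quantization of a Llama CLM.
# The model name is an assumption; every other flag is copied from
# the GPT-J GPTQ example above.
python run_clm_no_trainer.py \
    --model meta-llama/Llama-2-7b-hf \
    --dataset NeelNanda/pile-10k \
    --quantize \
    --woq_algo GPTQ \
    --woq_bits 4 \
    --woq_scheme asym \
    --woq_group_size 128 \
    --gptq_max_seq_length 2048 \
    --gptq_use_max_length \
    --double_quant_type "BNB_NF4" \
    --output_dir saved_results
```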
@@ -72,8 +100,6 @@ python run_clm_no_trainer.py \
     --woq_group_size 128 \
     --gptq_max_seq_length 2048 \
     --gptq_use_max_length \
-    --accuracy \
-    --tasks "lambada_openai" \
     --double_quant_type "BNB_NF4"

 # "--woq_algo RTN" is used to enable RTN algorithms
examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/run_benchmark.sh (+30 −29)
@@ -70,58 +70,59 @@ function run_benchmark {
     fi
     echo $extra_cmd

-    if [ "${topology}" = "opt_125m_woq_gptq_int4" ]; then
+    if [ "${topology}" = "opt_125m_woq_gptq_int4" ]; then