
Test script for evaluation of matmul error in different pre/post-processing and quantization conditions #5

Open
vvchernov opened this issue Nov 9, 2023 · 5 comments

vvchernov commented Nov 9, 2023

Develop a Python script in the repo that does and tests the following (see also below in the thread):

  1. Base scenario:
  • Create two matrices (X, W) filled with floating-point values drawn from a predefined distribution.
    • square matrices of size 1024×1024
    • value type is float16
    • develop a flexible solution with the possibility to change the size and data type
  • Multiply the matrices (X * W = Y) and save the result (Y, the original one)
  • Preprocess the matrices (X -> X', W -> W')
    • develop a flexible solution with the possibility to change the preprocessing type
  • Multiply the preprocessed matrices (X' * W' = Y') and save the result (Y', the preprocessed one)
  • Fake-quantize the preprocessed matrices (Q(W') and/or Q(X'))
    • at least one matrix should be quantized, but it can be only one
    • develop a flexible solution with the possibility to change the quantization type
  • Multiply the quantized matrices and save the result (Yq, the quantized one)
  • Postprocess the quantized result (Yq -> Yp)
  • Find the differences between the pre/postprocessed, quantized, and original results. Use metrics over the obtained matrices as the final evaluation value
  2. Data distribution
  • in general, a combination of two normal distributions (context and outliers) is assumed for both the X and W matrices
  • The following parameters are used to control the distribution:
    • context average value
    • context dispersion
    • number of context values
    • distance between context and outliers, or the outliers' average value
    • outliers dispersion
    • number of outlier values
  • Starting simplification: W has context only; the outliers' dispersion is much less than the context one (e.g. Do = 0.1 * Dc); the number of outliers is much less than the matrix size (e.g. No ~ 0.1 * sqrt(Nc)); the context average value = 0.
  3. Preprocessing
  • Smoothing from SmoothQuant
  • AWQ algorithm
  4. Quantization
  • symmetric per-tensor int8
  • symmetric per-channel int8
  • symmetric per-group int8
    • 32
    • 64
    • 128
  • asymmetric per-tensor int8
  • asymmetric per-channel int8
  • asymmetric per-group int8
    • 32
    • 64
    • 128
  • GPTQ-like
    • int8
    • int4
    • int3
  5. Postprocessing
  • First step: no postprocessing
  • compensate the error with a bias term
  6. Metrics
  • First step: use the Frobenius norm (LF) for error evaluation (see here or the Russian version)
  • Study different matrix norms and analyze whether they give us correct metrics.
  7. Statistics scenario
  • Use the base scenario on a set of matrices (e.g. 100) with the same distributions and collect error statistics (mean and std)
  8. Calibration scenario
  • Use a set of matrices (e.g. 100) with the same distributions for parameter calibration, and evaluate error statistics on another set of matrices (e.g. 100) with the same distributions
  9. Additional features:
  • use matrices from a dump
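The base scenario above could be sketched roughly as follows. This is a minimal sketch, not the final script: the size is reduced, float32 is used for CPU portability (the issue asks for 1024×1024 float16), only W is quantized (symmetric per-tensor int8, item 4), the error metric is the relative Frobenius norm (item 6), and all helper names and distribution parameters are hypothetical.

```python
import torch

torch.manual_seed(0)

def make_matrix(n, context_std=10.0, n_outliers=3,
                outlier_mean=100.0, outlier_std=1.0):
    """Context values ~ N(0, context_std); a few outlier columns
    ~ N(outlier_mean, outlier_std), per item 2 of the plan."""
    m = torch.randn(n, n) * context_std
    cols = torch.randperm(n)[:n_outliers]
    m[:, cols] = outlier_mean + torch.randn(n, n_outliers) * outlier_std
    return m

def fake_quant_per_tensor(w, bits=8):
    """Symmetric per-tensor fake quantization: quantize to int, dequantize back."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def rel_frobenius_error(y_ref, y):
    """torch.linalg.norm defaults to the Frobenius norm for matrices."""
    return torch.linalg.norm(y_ref - y) / torch.linalg.norm(y_ref)

n = 256
X, W = make_matrix(n), make_matrix(n)
Y = X @ W                           # original result
Yq = X @ fake_quant_per_tensor(W)   # weight-only quantized result
err = rel_frobenius_error(Y, Yq)
```

The preprocessing and postprocessing hooks from items 3 and 5 would slot in between the matrix creation and the quantized multiplication.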
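For the preprocessing step (item 3), smoothing in the SmoothQuant style migrates scale from activation outlier channels into the weights while keeping the product mathematically unchanged. A sketch under the same assumptions as above (helper names and shapes are hypothetical, alpha balances the migration):

```python
import torch

torch.manual_seed(0)

def smooth(X, W, alpha=0.5, eps=1e-6):
    """SmoothQuant-style per-channel smoothing:
    s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1-alpha),
    X' = X / s, W' = diag(s) @ W, so X' @ W' == X @ W exactly."""
    sx = X.abs().amax(dim=0).clamp(min=eps)
    sw = W.abs().amax(dim=1).clamp(min=eps)
    s = sx.pow(alpha) / sw.pow(1.0 - alpha)
    return X / s, W * s[:, None]

X = torch.randn(64, 64) * 10.0
X[:, :3] *= 20.0          # a few outlier input channels
W = torch.randn(64, 64)

Xs, Ws = smooth(X, W, alpha=0.5)
# Smoothing is exact up to float rounding; only quantization afterwards
# benefits from the flattened activation range.
diff = torch.linalg.norm(X @ W - Xs @ Ws) / torch.linalg.norm(X @ W)
```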
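For item 4, the motivation for per-group scales (group sizes 32/64/128) can be demonstrated directly: with per-tensor quantization a single outlier column inflates the scale for the whole matrix, while per-group scales confine the damage. A small sketch with hypothetical helpers:

```python
import torch

torch.manual_seed(0)

def fake_quant_per_tensor(w, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def fake_quant_per_group(w, group_size=32, bits=8):
    """Each row is split into groups of `group_size` columns; every group
    gets its own symmetric scale, so an outlier column only inflates the
    scale of the group that contains it."""
    n, k = w.shape
    assert k % group_size == 0
    qmax = 2 ** (bits - 1) - 1
    g = w.reshape(n, k // group_size, group_size)
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    return (torch.round(g / scale).clamp(-qmax, qmax) * scale).reshape(n, k)

W = torch.randn(128, 128) * 10.0
W[:, 0] = 300.0  # one outlier column blows up the per-tensor scale

err = lambda wq: float(torch.linalg.norm(W - wq) / torch.linalg.norm(W))
err_tensor = err(fake_quant_per_tensor(W))
err_group = err(fake_quant_per_group(W, group_size=32))
```

Flipping `group_size` between 32, 64, and 128 covers the sub-bullets of the per-group variants.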

vvchernov commented Nov 9, 2023

Notes:

  • I recommend using the torch framework for matrix manipulation because it fully supports float16, while numpy's float16 support is limited
  • It is better to unify the calculations with respect to the distribution parameters. We can move to dimensionless calculations, using the context dispersion as the unit. For the sake of convenience, however, I suggest fixing the context dispersion at 10; it can be checked once that the results are the same for other values.
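The dimensionless-calculation note can be sanity-checked directly: for symmetric fake quantization the scale grows linearly with the data, so the relative error does not depend on the overall dispersion, and fixing it at 10 loses no generality. A small sketch (helper names are hypothetical):

```python
import torch

torch.manual_seed(0)

def fake_quant(w, bits=8):
    # Symmetric per-tensor fake quantization.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def rel_err(w):
    return float(torch.linalg.norm(w - fake_quant(w)) / torch.linalg.norm(w))

W = torch.randn(64, 64)
# Rescaling the matrix rescales the quantization grid with it, so the
# relative error is (up to float rounding) unchanged.
e1, e10 = rel_err(W), rel_err(W * 10.0)
```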


vvchernov commented Nov 9, 2023

Tests:

  • Dependence of the error on the distribution parameters
  • Dependence of the error on the per-tensor SmoothQuant alpha
  • Dependence of the error on the per-channel SmoothQuant alpha
  • Dependence of the error on the AWQ alpha
  • Dependence of the error on the error bias
  • slightly change the scales so that the outliers' average value maps exactly to an integer
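The alpha-dependence tests above could look roughly like this sweep: for each alpha, smooth, fake-quantize both operands per-tensor, and record the relative Frobenius error against the unquantized product. This is a sketch with hypothetical helpers and parameters, not the final test harness.

```python
import torch

torch.manual_seed(0)

def smooth(X, W, alpha, eps=1e-6):
    """SmoothQuant-style smoothing: X' = X / s, W' = diag(s) @ W."""
    sx = X.abs().amax(dim=0).clamp(min=eps)
    sw = W.abs().amax(dim=1).clamp(min=eps)
    s = sx.pow(alpha) / sw.pow(1.0 - alpha)
    return X / s, W * s[:, None]

def fake_quant(w, bits=8):
    """Symmetric per-tensor fake quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

X = torch.randn(64, 64) * 10.0
X[:, :2] *= 30.0          # outlier input channels
W = torch.randn(64, 64)
Y = X @ W                 # unquantized reference

errors = {}
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    Xs, Ws = smooth(X, W, alpha)
    Yq = fake_quant(Xs) @ fake_quant(Ws)
    errors[alpha] = float(torch.linalg.norm(Y - Yq) / torch.linalg.norm(Y))
```

The same loop structure extends to the AWQ alpha and to the distribution-parameter sweeps by swapping what varies per iteration.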


Krosha31 commented Nov 27, 2023


Krosha31 commented Jan 9, 2024

Uploaded the current results. Tomorrow I will try to do something more.

@Krosha31

Uploaded the plots to the drive:
https://drive.google.com/drive/folders/1KMbUUymlp6QZb1f_ThX0V7K49K4AF00Y?usp=drive_link
