
Test script for evaluation of matmul error in different pre/post-processing and quantization conditions #5

Open
vvchernov opened this issue Nov 9, 2023 · 5 comments

vvchernov commented Nov 9, 2023

Develop a Python script in the repo that does and tests the following (see also below in the thread):

  1. Base scenario:
  • Create two matrices (X, W) filled with floating-point values drawn from a predefined distribution.
    • square matrices of size 1024×1024
    • value type is float16
    • develop a flexible solution with the possibility to change the size and data type
  • Multiply the matrices (X * W = Y) and save the result (Y, the original one)
  • Preprocess the matrices (X -> X', W -> W')
    • develop a flexible solution with the possibility to change the preprocessing type
  • Multiply the preprocessed matrices (X' * W' = Y') and save the result (Y', the preprocessed one)
  • Fake-quantize the preprocessed matrices (Q(W') and/or Q(X'))
    • at least one matrix should be quantized, but it can be only one
    • develop a flexible solution with the possibility to change the quantization type
  • Multiply the quantized matrices and save the result (Yq, the quantized one)
  • Postprocess the quantized result (Yq -> Yp)
  • Find the differences between the pre/postprocessed, quantized, and original results. Use metrics over the obtained matrices as the final evaluation value
  2. Data distribution
  • in general, a combination of two normal distributions (context and outliers) is assumed for both the X and W matrices
  • The following parameters are used to control the distribution:
    • context average value
    • context dispersion
    • number of context values
    • distance between context and outliers, or the outliers' average value
    • outliers dispersion
    • number of outlier values
  • Starting simplification: W has context only; the outliers' dispersion is much less than the context one (e.g. Do = 0.1 * Dc); the number of outliers is much less than the matrix size (e.g. No ~ 0.1 * sqrt(Nc)); the context average value = 0.
  3. Preprocessing
  • Smoothing from SmoothQuant
  • AWQ algorithm
  4. Quantization
  • symmetric per-tensor int8
  • symmetric per-channel int8
  • symmetric per-group int8
    • 32
    • 64
    • 128
  • asymmetric per-tensor int8
  • asymmetric per-channel int8
  • asymmetric per-group int8
    • 32
    • 64
    • 128
  • GPTQ-like
    • int8
    • int4
    • int3
  5. Postprocessing
  • First step: no postprocessing
  • compensate the error with a bias term
  6. Metrics
  • First step: use the Frobenius norm (LF) for error evaluation (see here or the Russian version)
  • Study different matrix norms and analyze whether they give us correct metrics.
  7. Statistics scenario
  • Use the base scenario on a set of matrices (e.g. 100) with the same distributions and collect error statistics (mean and std)
  8. Calibration scenario
  • Use a set of matrices (e.g. 100) with the same distributions for parameter calibration, and evaluate error statistics on another set of matrices (e.g. 100) with the same distributions
  9. Additional features:
  • use matrices from a dump
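The base scenario above could be sketched roughly as follows. This is a minimal sketch, not the final script: the size is reduced, float32 is used for CPU portability (the issue asks for 1024×1024 float16), only W is quantized (symmetric per-tensor int8, item 4), the error metric is the relative Frobenius norm (item 6), and all helper names and distribution parameters are hypothetical.

```python
import torch

torch.manual_seed(0)

def make_matrix(n, context_std=10.0, n_outliers=3,
                outlier_mean=100.0, outlier_std=1.0):
    """Context values ~ N(0, context_std); a few outlier columns
    ~ N(outlier_mean, outlier_std), per item 2 of the plan."""
    m = torch.randn(n, n) * context_std
    cols = torch.randperm(n)[:n_outliers]
    m[:, cols] = outlier_mean + torch.randn(n, n_outliers) * outlier_std
    return m

def fake_quant_per_tensor(w, bits=8):
    """Symmetric per-tensor fake quantization: quantize to int, dequantize back."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def rel_frobenius_error(y_ref, y):
    """torch.linalg.norm defaults to the Frobenius norm for matrices."""
    return torch.linalg.norm(y_ref - y) / torch.linalg.norm(y_ref)

n = 256
X, W = make_matrix(n), make_matrix(n)
Y = X @ W                           # original result
Yq = X @ fake_quant_per_tensor(W)   # weight-only quantized result
err = rel_frobenius_error(Y, Yq)
```

The preprocessing and postprocessing hooks from items 3 and 5 would slot in between the matrix creation and the quantized multiplication.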
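For the preprocessing step (item 3), smoothing in the SmoothQuant style migrates scale from activation outlier channels into the weights while keeping the product mathematically unchanged. A sketch under the same assumptions as above (helper names and shapes are hypothetical, alpha balances the migration):

```python
import torch

torch.manual_seed(0)

def smooth(X, W, alpha=0.5, eps=1e-6):
    """SmoothQuant-style per-channel smoothing:
    s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1-alpha),
    X' = X / s, W' = diag(s) @ W, so X' @ W' == X @ W exactly."""
    sx = X.abs().amax(dim=0).clamp(min=eps)
    sw = W.abs().amax(dim=1).clamp(min=eps)
    s = sx.pow(alpha) / sw.pow(1.0 - alpha)
    return X / s, W * s[:, None]

X = torch.randn(64, 64) * 10.0
X[:, :3] *= 20.0          # a few outlier input channels
W = torch.randn(64, 64)

Xs, Ws = smooth(X, W, alpha=0.5)
# Smoothing is exact up to float rounding; only quantization afterwards
# benefits from the flattened activation range.
diff = torch.linalg.norm(X @ W - Xs @ Ws) / torch.linalg.norm(X @ W)
```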
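For item 4, the motivation for per-group scales (group sizes 32/64/128) can be demonstrated directly: with per-tensor quantization a single outlier column inflates the scale for the whole matrix, while per-group scales confine the damage. A small sketch with hypothetical helpers:

```python
import torch

torch.manual_seed(0)

def fake_quant_per_tensor(w, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def fake_quant_per_group(w, group_size=32, bits=8):
    """Each row is split into groups of `group_size` columns; every group
    gets its own symmetric scale, so an outlier column only inflates the
    scale of the group that contains it."""
    n, k = w.shape
    assert k % group_size == 0
    qmax = 2 ** (bits - 1) - 1
    g = w.reshape(n, k // group_size, group_size)
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    return (torch.round(g / scale).clamp(-qmax, qmax) * scale).reshape(n, k)

W = torch.randn(128, 128) * 10.0
W[:, 0] = 300.0  # one outlier column blows up the per-tensor scale

err = lambda wq: float(torch.linalg.norm(W - wq) / torch.linalg.norm(W))
err_tensor = err(fake_quant_per_tensor(W))
err_group = err(fake_quant_per_group(W, group_size=32))
```

Flipping `group_size` between 32, 64, and 128 covers the sub-bullets of the per-group variants.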

vvchernov commented Nov 9, 2023

Notes:

  • I recommend using the torch framework for matrix manipulation because it fully supports float16, while numpy's float16 support is limited
  • It is better to unify the calculations with respect to the distribution parameters. We can move to dimensionless calculations, using the context dispersion as the unit. For the sake of convenience, however, I suggest fixing the context dispersion at 10; it can be checked once that the results are the same for other values.
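The dimensionless-calculation note can be sanity-checked directly: for symmetric fake quantization the scale grows linearly with the data, so the relative error does not depend on the overall dispersion, and fixing it at 10 loses no generality. A small sketch (helper names are hypothetical):

```python
import torch

torch.manual_seed(0)

def fake_quant(w, bits=8):
    # Symmetric per-tensor fake quantization.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def rel_err(w):
    return float(torch.linalg.norm(w - fake_quant(w)) / torch.linalg.norm(w))

W = torch.randn(64, 64)
# Rescaling the matrix rescales the quantization grid with it, so the
# relative error is (up to float rounding) unchanged.
e1, e10 = rel_err(W), rel_err(W * 10.0)
```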


vvchernov commented Nov 9, 2023

Tests:

  • Dependence of the error on the distribution parameters
  • Dependence of the error on the per-tensor SmoothQuant alpha
  • Dependence of the error on the per-channel SmoothQuant alpha
  • Dependence of the error on the AWQ alpha
  • Dependence of the error on the error bias
  • slightly change the scales so that the outliers' average value maps exactly to an integer
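The alpha-dependence tests above could look roughly like this sweep: for each alpha, smooth, fake-quantize both operands per-tensor, and record the relative Frobenius error against the unquantized product. This is a sketch with hypothetical helpers and parameters, not the final test harness.

```python
import torch

torch.manual_seed(0)

def smooth(X, W, alpha, eps=1e-6):
    """SmoothQuant-style smoothing: X' = X / s, W' = diag(s) @ W."""
    sx = X.abs().amax(dim=0).clamp(min=eps)
    sw = W.abs().amax(dim=1).clamp(min=eps)
    s = sx.pow(alpha) / sw.pow(1.0 - alpha)
    return X / s, W * s[:, None]

def fake_quant(w, bits=8):
    """Symmetric per-tensor fake quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

X = torch.randn(64, 64) * 10.0
X[:, :2] *= 30.0          # outlier input channels
W = torch.randn(64, 64)
Y = X @ W                 # unquantized reference

errors = {}
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    Xs, Ws = smooth(X, W, alpha)
    Yq = fake_quant(Xs) @ fake_quant(Ws)
    errors[alpha] = float(torch.linalg.norm(Y - Yq) / torch.linalg.norm(Y))
```

The same loop structure extends to the AWQ alpha and to the distribution-parameter sweeps by swapping what varies per iteration.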


Krosha31 commented Nov 27, 2023


Krosha31 commented Jan 9, 2024

Uploaded the current results. Tomorrow I will try to do something more.

@Krosha31

Uploaded the plots to the drive:
https://drive.google.com/drive/folders/1KMbUUymlp6QZb1f_ThX0V7K49K4AF00Y?usp=drive_link
