Unofficial implementation of Generative Low-bitwidth Data Free Quantization (GDFQ). It is for personal study; any advice is welcome.

ZeroQ: https://arxiv.org/pdf/2001.00281.pdf
The original paper: https://arxiv.org/pdf/2003.03603.pdf

It seems that the generator can produce data that carries classification-boundary information, while ZeroQ tends to generate data without paying attention to the data distribution. However, I could not reproduce the beautiful generated data shown in the paper. 😅😅😅
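As a rough sketch of that difference: the GDFQ generator is trained against the pretrained full-precision model with a cross-entropy term on its own random pseudo-labels (pushing samples toward the decision boundaries) plus a batch-norm-statistics term. The names `G`, `fp_model`, and `bns_loss_fn` below are placeholders of mine, not the paper's or this repo's code:

```python
import torch
import torch.nn.functional as F

def generator_step(G, fp_model, bns_loss_fn, batch_size=32,
                   n_classes=1000, z_dim=100, beta=0.1):
    # Sample noise and random pseudo-labels; G is conditioned on the labels.
    z = torch.randn(batch_size, z_dim)
    y = torch.randint(0, n_classes, (batch_size,))
    fake = G(z, y)
    logits = fp_model(fake)
    # CE on the pseudo-labels makes the samples classifiable (boundary-aware);
    # the BNS term keeps them close to the statistics the BN layers saw in training.
    return F.cross_entropy(logits, y) + beta * bns_loss_fn()
```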
model | QuanType | W/A bit | top1 | top5 |
---|---|---|---|---|
resnet18 | fp | - | 69.758 | 89.078 |
 | zeroQ | 8/8 | 69.230 | 88.840 |
 | | 4/8 | 57.582 | 81.182 |
 | | 8/4 | 1.130 | 3.056 |
 | | 4/4 | 0.708 | 2.396 |
 | GDFQ | 8/8 | | |
 | | 4/8 | | |
 | | 8/4 | | |
 | | 4/4 | | |
I also tried cloning the [original ZeroQ repository](https://github.com/amirgholami/ZeroQ/blob/ba37f793dbcb9f966b58f6b8d1e9de3c34a11b8c/classification/utils/quantize_model.py#L36) and simply setting all `weight_bit` to 4; the accuracy is about 10%. I get about 24.16% when using pytorchcv's model, but only 2.16% when using torchvision's.
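For reference, the two model sources can be loaded side by side like this (both are standard public APIs; the discrepancy itself is still unexplained):

```python
import torchvision.models as models
from pytorchcv.model_provider import get_model as ptcv_get_model

tv_model = models.resnet18(pretrained=True)             # torchvision weights (~2.16% at W4)
cv_model = ptcv_get_model("resnet18", pretrained=True)  # pytorchcv weights (~24.16% at W4)
```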
- The floating-point model comes from torchvision, so the architecture name must match a torchvision model name (see the sketch after this list). You may reference https://pytorch.org/docs/stable/torchvision/models.html
- The default batch size is 32.
- The default bit-width is 4.
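So the `-a` value is presumably resolved by name against torchvision, along the lines of (my guess at the lookup, not necessarily the repo's exact code):

```python
import torchvision.models as models

# "-a vgg16_bn" -> torchvision.models.vgg16_bn
fp_model = getattr(models, "vgg16_bn")(pretrained=True)
```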
```
python train.py [imagenet path]
```

optional arguments:

```
-a, --arch      model architecture
-m, --method    zeroQ, GDFQ
--n_epochs      GDFQ's training epochs
--n_iter        training iterations per training epoch
--batch_size    batch size
--q_lr          learning rate of GDFQ's quantized model
--g_lr          learning rate of GDFQ's generator model
-qa             activation quantization bit
-qw             weight quantization bit
-qb             bias quantization bit
```
Ex: Training with resnet18

```
python train.py -a resnet18
```

Ex: Training with vgg16_bn with 8-bit activations, 8-bit weights, and 8-bit biases

```
python train.py -a vgg16_bn -qa 8 -qw 8 -qb 8
```
- Question about the fixed batch norm of the quantized model: will freezing it affect training, or does the batch norm need to be quantized first?
- The toy experiment cannot generate the beautiful output shown in the paper; something may be wrong. (Any advice or PR is welcome.)
- The accuracy is weird when using 4-bit ZeroQ with different model sources.
- Add ZeroQ training.
- Check the effect of the BNS and KL losses (see the sketch after this list).
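For the last item, a minimal sketch of how BNS matching and the KL (distillation) term are commonly computed; the hook-based implementation below is my assumption, not this repo's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BNSHook:
    """Records the distance between a BN layer's running statistics
    and the batch statistics of its current input."""
    def __init__(self, bn: nn.BatchNorm2d):
        self.loss = torch.tensor(0.0)
        bn.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        x = inputs[0]
        mean = x.mean(dim=[0, 2, 3])
        var = x.var(dim=[0, 2, 3], unbiased=False)
        self.loss = F.mse_loss(mean, module.running_mean) + \
                    F.mse_loss(var, module.running_var)

def make_bns_hooks(model):
    return [BNSHook(m) for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

def bns_loss(hooks):
    # Sum over all BN layers, computed after a forward pass on generated data.
    return sum(h.loss for h in hooks)

def kd_loss(student_logits, teacher_logits, T=1.0):
    # KL divergence between the quantized (student) and FP (teacher) outputs.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
```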
The performance did not reach the numbers in the paper, so there may be some bugs for now.
All the results are based on fake quantization, not true low-bit inference.
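To be explicit about what "fake quantization" means here: tensors are rounded to 2^k uniform levels but kept in floating point, roughly like this (a generic asymmetric uniform quantizer, not necessarily the exact scheme used in this repo):

```python
import torch

def fake_quantize(x: torch.Tensor, k: int) -> torch.Tensor:
    """Quantize-dequantize: x is mapped to 2**k uniform levels and back,
    so all arithmetic still runs in float32."""
    qmax = 2 ** k - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = torch.round((x - lo) / scale).clamp(0, qmax)
    return q * scale + lo
```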