Add AutoGPTQ's cpu kernel. #245
Conversation
Hi, I'm just leaving some review comments to make sure this PR is in good shape, so that it can be easily extended and maintained in the future.
auto_gptq/modeling/_base.py
Outdated
```python
use_cuda_fp16=use_cuda_fp16,
desc_act=quantize_config.desc_act,
trainable=trainable,
checkpoint=checkpoint,
```
I would suggest calling make_quant_cpu first and then loading the checkpoint. Of course, this requires a further modification in make_quant_cpu(); I will describe it there.
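Roughly, the call order being suggested inside from_quantized() would look like this (a sketch only; the helper signature and variable names are assumptions based on the diff above, not this PR's code):

```python
# Build the (empty) quantized module structure first...
make_quant_cpu(
    model,
    quant_layer_names,                 # hypothetical list of layers to replace
    bits=quantize_config.bits,
    group_size=quantize_config.group_size,
)
# ...and only then load the checkpoint weights into it.
model.load_state_dict(checkpoint, strict=False)
```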
auto_gptq/modeling/_utils.py
Outdated
```python
qweights=checkpoint[name1 + '.qweight'].contiguous(),
zeros=checkpoint[name1 + '.qzeros'],
scales=checkpoint[name1 + '.scales'].float(),
bias=checkpoint[name1 + '.bias'].float() if name1 + '.bias' in checkpoint else None)
```
TBH, I'm not fond of the way quantization is set up here: the checkpoint is loaded directly when initializing QuantLinear, which makes its API differ from the other QuantLinear classes and may cause maintenance difficulties in the future.
My suggestion is to only replace modules in the make_quant functions and load the state dict after this function is called in from_quantized(), as sketched below.
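A rough sketch of that module-swap-only version, assuming QuantLinear takes the standard (bits, group_size, infeatures, outfeatures, bias) arguments suggested further down; this is not code from the PR:

```python
import torch.nn as nn
# QuantLinear: the qigen QuantLinear class added in this PR (import omitted).

def make_quant_cpu(module, names, bits, group_size):
    # Collect the targeted nn.Linear layers first, then swap them for empty
    # QuantLinear modules. No checkpoint access happens here.
    targets = [
        (name, child) for name, child in module.named_modules()
        if name in names and isinstance(child, nn.Linear)
    ]
    for name, child in targets:
        parent_name, _, attr = name.rpartition(".")
        parent = module.get_submodule(parent_name) if parent_name else module
        setattr(parent, attr, QuantLinear(
            bits=bits,
            group_size=group_size,
            infeatures=child.in_features,
            outfeatures=child.out_features,
            bias=child.bias is not None,
        ))
    # qweight/qzeros/scales are filled in later via model.load_state_dict()
    # inside from_quantized(), so QuantLinear.__init__ stays checkpoint-free.
```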
```python
class QuantLinear(nn.Module):
    QUANT_TYPE = "qigen"

    def __init__(self, bits=4, group_size=-1, N=0, M=0, qweights=None, zeros=None, scales=None, bias=None, hint=1, p=8, l1=2**18):
```
At least these five arguments' names and order should be the same as in the other QuantLinear classes: bits, group_size, infeatures, outfeatures, bias.
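For example, something along these lines would keep it interchangeable with the other QuantLinear implementations (a sketch, not the PR's code; the QIGen-specific arguments are kept but moved to the end):

```python
def __init__(self, bits, group_size, infeatures, outfeatures, bias,
             hint=1, p=8, l1=2**18):
    ...
```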
My code is complete. Please review again.
@qwopqwop200 I did not look at the code, but given that CPUs usually do not support fp16 compute, is the compute done in fp32 instead, with dequantization e.g. int4 -> fp32?
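For illustration, GPTQ-style dequantization to fp32 typically looks like this (a generic sketch; the QIGen kernel's actual bit-packing and memory layout may differ):

```python
import torch

q = torch.tensor([3, 15, 0, 7], dtype=torch.uint8)   # unpacked 4-bit codes, 0..15
scale = torch.tensor(0.05, dtype=torch.float32)       # per-group scale
zero = torch.tensor(8, dtype=torch.float32)           # per-group zero point

w_fp32 = scale * (q.float() - zero)                   # dequantized weights in fp32
# tensor([-0.2500,  0.3500, -0.4000, -0.0500])
```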
Probably so. But I didn't design this kernel, so I'm not sure how it works.
Where does it come from?
https://github.com/IST-DASLab/QIGen/tree/master
Oh sorry, I merged them by mistake. I've checked that it's working now, so there shouldn't be any problems.
Currently only 2- and 4-bit quantization is supported; desc_act is not supported.
Enabling use_qigen overrides all CUDA-related settings.
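For reference, usage would then look roughly like this (a sketch; the parameter names are taken from this conversation and not verified against the final API):

```python
from auto_gptq import AutoGPTQForCausalLM

# Hypothetical usage: run a GPTQ-quantized model on CPU via the QIGen kernel.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",   # placeholder path
    device="cpu",
    use_qigen=True,              # overrides the CUDA-related settings
)
```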