[BUG/Help] INT-4 quantized model fails to load on Windows #162

Closed

OedoSoldier opened this issue Mar 19, 2023 · 9 comments

Comments

@OedoSoldier
Contributor

OedoSoldier commented Mar 19, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Compiling kernels : C:\Users\{}\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.c
Compiling gcc -O3 -pthread -fopenmp -std=c99 C:\Users\{}\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.c -shared -o C:\Users\{}\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.so
Kernels compiled : C:\Users\{}\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.so
Traceback (most recent call last):
  File "{}\main.py", line 33, in <module>
    model = AutoModel.from_pretrained(args.path, trust_remote_code=True).float()
  File "C:\Users\{}\anaconda3\envs\lora\lib\site-packages\transformers\models\auto\auto_factory.py", line 466, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\{}\anaconda3\envs\lora\lib\site-packages\transformers\modeling_utils.py", line 2498, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\{}/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 940, in __init__
    self.quantize(self.config.quantization_bit, self.config.quantization_embeddings, use_quantization_cache=True, empty_init=True)
  File "C:\Users\{}/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 1277, in quantize
    self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)
  File "C:\Users\{}/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization.py", line 399, in quantize
    load_cpu_kernel(**kwargs)
  File "C:\Users\{}/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization.py", line 386, in load_cpu_kernel
    cpu_kernels = CPUKernel(**kwargs)
  File "C:\Users\{}/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization.py", line 137, in __init__
    kernels = ctypes.CDLL(kernel_file, winmode=0)
  File "C:\Users\{}\anaconda3\envs\lora\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\{}\.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.

Expected Behavior

No response

Steps To Reproduce

When loading the INT-4 quantized model on Windows, the log reports that the CPU kernel compiled successfully, but the compiled kernel cannot be loaded. I checked that quantization_kernels_parallel.so was indeed compiled, and os.path.exists() on the file returns True.

Everything works fine under WSL.
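
(A minimal diagnostic sketch, not from the original report, that reproduces the check described above; the cache path and user name are illustrative. On Windows, "Could not find module ... (or one of its dependencies)" often means a dependent DLL, such as the MinGW OpenMP runtime, cannot be found even though the .so itself exists.)

import ctypes
import os

# Illustrative path; substitute your own user name / cache location.
kernel_file = os.path.expanduser(
    r"~\.cache\huggingface\modules\transformers_modules"
    r"\chatglm-6b-int4\quantization_kernels_parallel.so"
)

print("exists:", os.path.exists(kernel_file))  # reported as True in this issue

try:
    # Same call as quantization.py makes; if this raises FileNotFoundError
    # even though the file exists, a dependent DLL (e.g. libgomp) is likely
    # missing from the DLL search path.
    ctypes.CDLL(kernel_file, winmode=0)
    print("kernel loaded")
except OSError as exc:
    print("load failed:", exc)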

Environment

- OS: Windows 11
- Python: 3.10.6
- Transformers: 4.27.1
- PyTorch: 1.13.1+cu117
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

@kenneth104

Not sure whether this helps the developers.
I'm trying to load the INT-4 model on CPU, but it reports that there is no CPU kernel.

Setup:
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4",trust_remote_code=True).float()

Launch command:
venv\Scripts\activate && streamlit run web_demo2.py --server.port 6006

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\username\.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b-int4\a93efa90f5b012b13a1197b9f47835b8ef1cc307\quantization_kernels_parallel.c
Compiling gcc -O3 -pthread -fopenmp -std=c99 C:\Users\username\.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b-int4\a93efa90f5b012b13a1197b9f47835b8ef1cc307\quantization_kernels_parallel.c -shared -o C:\Users\username\.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b-int4\a93efa90f5b012b13a1197b9f47835b8ef1cc307\quantization_kernels_parallel.so
Kernels compiled : C:\Users\username\.cache\huggingface\modules\transformers_modules\THUDM\chatglm-6b-int4\a93efa90f5b012b13a1197b9f47835b8ef1cc307\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers

@songxxzp
Collaborator

songxxzp commented Mar 20, 2023

Please make sure gcc and OpenMP are installed.
You can compile the kernels yourself (try quantization_kernels.c first; quantization_kernels_parallel.c requires OpenMP):

gcc -fPIC -std=c99 quantization_kernels.c -shared -o quantization_kernels.so
gcc -pthread -fopenmp -std=c99 quantization_kernels_parallel.c -shared -o quantization_kernels_parallel.so

Adding -O3 when compiling speeds things up considerably, but it can cause errors on some platforms, so add optimization flags as appropriate for your setup.
Then, after the model has been loaded as usual, manually load the kernel you compiled:

model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4",trust_remote_code=True).float()
model = model.quantize(bits=4, kernel_file="Your Kernel Path")

My guess is that either the parallel kernel cannot be loaded because OpenMP is missing, or the path is complicated enough that ctypes does not handle it correctly.
Also, please check your antivirus software.
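
A minimal end-to-end sketch of the manual workaround above, assuming the INT-4 model repository has been cloned locally to ./chatglm-6b-int4 and that gcc is on the PATH (the directory name and paths are illustrative):

import os
import subprocess

from transformers import AutoModel

model_dir = "./chatglm-6b-int4"  # assumed local clone of the INT-4 model repo

# Build the single-threaded kernel first; it does not require OpenMP.
src = os.path.join(model_dir, "quantization_kernels.c")
kernel = os.path.abspath(os.path.join(model_dir, "quantization_kernels.so"))
subprocess.run(["gcc", "-fPIC", "-std=c99", src, "-shared", "-o", kernel], check=True)

# Load the model as before, then point quantize() at the hand-built kernel.
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).float()
model = model.quantize(bits=4, kernel_file=kernel)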

@kenneth104

@songxxzp

Thank you very much for your help. I switched to a Linux platform and it works now.

@fxb392

fxb392 commented Apr 3, 2023

No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.so
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 /root/.cache/huggingface/modules/transformers_modules/local/quantization_kernels.c -shared -o /root/.cache/huggingface/modules/transformers_modules/local/quantization_kernels.so
Kernels compiled : /root/.cache/huggingface/modules/transformers_modules/local/quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
Welcome to the ChatGLM-6B model. Type your message to chat, enter "clear" to clear the conversation history, or "stop" to exit the program.

Fixed it following @songxxzp's help in #183 (comment), thanks!
To summarize, the error was that kernel compilation failed. The fix:
(1) Compile manually, in the model path:

gcc -fPIC -pthread -fopenmp -std=c99 quantization_kernels_parallel.c -shared -o quantization_kernels_parallel.so
gcc -fPIC -pthread -fopenmp -std=c99 quantization_kernels.c -shared -o quantization_kernels.so

(2) Then, after loading the model as before, manually load the kernel you compiled:

model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4",trust_remote_code=True).float()
model = model.quantize(bits=4, kernel_file="Your Kernel Path")

It still reports a compile error, but the model is now usable.

@sgb25sgb

It reports a compile error but can still be used; how does that happen?

@fxb392

fxb392 commented Apr 13, 2023

@sgb25sgb Loading the CPU kernel the default way failed, but loading it with model = model.quantize(bits=4, kernel_file="Your Kernel Path") succeeded.

@sgb25sgb

fxb392 Thank you!

@deapge

deapge commented Jul 17, 2023

Try this:
In the chatglm-6b-int4/quantization.py file you downloaded, search for the three lines that look like this:
kernels = ctypes.cdll.LoadLibrary(kernel_file)
and change each of them to kernels = ctypes.CDLL(kernel_file, winmode=0)

[Screenshot of the modification]
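
For reference, a sketch of how the edit above looks in quantization.py (surrounding code omitted); winmode=0 falls back to the classic Windows DLL search order, so dependent DLLs found via PATH, such as the MinGW runtime, can be resolved:

# Before (as shipped in some revisions of chatglm-6b-int4/quantization.py):
kernels = ctypes.cdll.LoadLibrary(kernel_file)

# After: load through ctypes.CDLL with winmode=0 on Windows.
kernels = ctypes.CDLL(kernel_file, winmode=0)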
