
RuntimeError: Unknown platform: darwin #6

Closed
LeeeSe opened this issue Mar 14, 2023 · 45 comments

@LeeeSe

LeeeSe commented Mar 14, 2023

Mac support?

@qizhanchan

qizhanchan commented Mar 14, 2023

Seems it's built with CUDA kernels, so no Mac support.

@TengLi931128

MacBook here, got the same error.

@rjk-git

rjk-git commented Mar 14, 2023

+1

@Sengxian
Contributor

Currently, quantization requires the use of custom CUDA kernels. Without quantization, inference should be able to support Macs with FP16 precision. I will fix this issue later.

@wangsouc

+1

@SirLYC

SirLYC commented Mar 15, 2023

Please notify us when it's fixed. Thanks!

@JokerQyou

JokerQyou commented Mar 15, 2023

You can actually hack your way around it and run the demo on a macOS M1 machine. You only need to change two lines:

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

Change .half().cuda() to .float()

https://huggingface.co/THUDM/chatglm-6b/blob/9d1509a1ade49240535535aa020232c8a4b1c114/modeling_chatglm.py#L1155

Remove this line. (The file resides in the Hugging Face cache directory, by default ~/.cache/huggingface/.)

Notice:

  • It runs purely on the CPU, so inference is relatively slow (~30 seconds for a question like "你是谁?")
  • Memory usage is doubled (relative to the total model size), so expect to use around 24–25 GB.

In the long term I'd expect the authors to optimize it for Apple Silicon chips (using mps for better performance).
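For reference, a minimal sketch of what the demo's loading code looks like after the first change (assuming the stock web_demo.py quoted above; the cached modeling_chatglm.py edit still has to be made by hand):

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# FP32 on CPU instead of FP16 on CUDA
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()
model = model.eval()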

@Raphtalia-chann

But the problem still exists, and I get this message:
Backend MacOSX is interactive backend. Turning interactive mode on.
How do I turn it on?

@duzx16
Member

duzx16 commented Mar 15, 2023

The model implementation without quantization does support macOS. The error occurs because when you load a remote repo, transformers checks all the libraries imported in modeling_chatglm.py, including cpm_kernel, which doesn't support macOS. But in fact, you don't need to import cpm_kernel if you don't use quantization.
Since I don't have access to change either transformers or cpm_kernel, I can't fix this on my side. But the solution on the user side is also simple: just clone the repo locally (git clone https://huggingface.co/THUDM/chatglm-6b) and replace "THUDM/chatglm-6b" with your local repo path.
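For example, a sketch of the suggested workaround (the local path ./chatglm-6b below is simply whatever directory you cloned into):

# after: git clone https://huggingface.co/THUDM/chatglm-6b ./chatglm-6b
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm-6b", trust_remote_code=True).float()  # no quantization, CPU FP32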


@mice33

mice33 commented Mar 16, 2023

Has anyone been able to successfully run it on a Mac?

@jeffwcx

jeffwcx commented Mar 16, 2023

@mice33 I succeeded.

@jeffwcx

jeffwcx commented Mar 16, 2023

Hardware specifications are as follows:

MacBook Pro
Model Identifier: Mac14,10
Chip: Apple M2 Pro
Total Number of Cores: 12 (8 performance and 4 efficiency)
Memory: 32 GB

My steps are as follows:

1. Download the model and modify the code

brew install git-lfs
# Clone to the `chatglm` folder
git clone https://huggingface.co/THUDM/chatglm-6b chatglm
cd chatglm
git lfs install
# This will take a long time
git lfs pull

Modify the modeling_chatglm.py file and comment out the following two lines:

--- a/modeling_chatglm.py
+++ b/modeling_chatglm.py
@@ -1166,6 +1166,6 @@ class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
         return torch.tensor(return_seqs, dtype=torch.long, device=kwargs['input_ids'].device)
 
     def quantize(self, bits: int):
-        from .quantization import quantize
-        self.transformer = quantize(self.transformer, bits)
+        # from .quantization import quantize
+        # self.transformer = quantize(self.transformer, bits)

2. Modify the demo code

 git clone https://github.com/THUDM/ChatGLM-6B.git
 cd ChatGLM-6B
 pip install -r requirements.txt
 pip install gradio

Modify web_demo.py

--- a/web_demo.py
+++ b/web_demo.py
@@ -1,8 +1,8 @@
 from transformers import AutoModel, AutoTokenizer
 import gradio as gr
 
-tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
-model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
+tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
+model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).float()
 model = model.eval()

then

python web_demo.py

After successful execution, it will display:

 running on local URL:  http://127.0.0.1:7860
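As an optional check before launching the web demo, the locally cloned weights can be exercised directly from the ChatGLM-6B repo root (a sketch; the chat() helper comes from the model's custom modeling code, so its signature may differ between revisions):

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).float().eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)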

@noahzark

Succeeded using @jeffwcx's solution, thanks for sharing. I cloned the Hugging Face repo into the repositories folder. The modification in step 1 is not necessary, since we use float() directly and never call the quantize() function locally, so cpm_kernel won't be loaded.

But as mentioned above, pure CPU is really slow; looking forward to an mps solution, so the model would be pretty "portable" (lol).

The first round takes 25 seconds and the second round takes nearly 4 minutes.


MacBook Pro
Model Identifier: Mac14,10
Chip: Apple M2 Max (12c32g)


@noahzark

Tried to force the MPS backend and got NotImplementedError: Unknown device for graph fuser. It seems MPS is not a fusable device.


@chaucerling

chaucerling commented Mar 17, 2023

# install the nightly build of PyTorch
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

import torch
from transformers import AutoModel, AutoTokenizer

# use the mps backend
mps_device = torch.device("mps")
tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).half()
model.to(mps_device)

The model can be initialized, but an error occurs when generating output. It appears that some of the required PyTorch operators may only be available on macOS 13.3 Beta. I will try again later and hope that it runs successfully.
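For anyone experimenting along these lines, a hedged device-selection sketch that falls back to CPU when the MPS backend is unavailable (torch.backends.mps.is_available() needs a reasonably recent PyTorch build):

import torch
from transformers import AutoModel, AutoTokenizer

# prefer Apple's Metal (MPS) backend when available, otherwise run on CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
dtype = torch.half if device.type == "mps" else torch.float32

tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).to(dtype).to(device).eval()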

@chaucerling

chaucerling commented Mar 17, 2023

After upgrading to macOS 13.3 Beta 4, I ran the CLI demo but got the error NotImplementedError: Unknown device for graph fuser. This may be because the JIT fusion settings are not compatible with MPS, so I commented out the JIT settings in modeling_chatglm.py at lines 34–37.

# flags required to enable jit fusion kernels
# torch._C._jit_set_profiling_mode(False)
# torch._C._jit_set_profiling_executor(False)
# torch._C._jit_override_can_fuse_on_cpu(True)
# torch._C._jit_override_can_fuse_on_gpu(True)

After running the code again, I got another error: invalid type: 'torch.mps.HalfTensor' at line 274. It can be fixed by editing line 268:

dtype = attention_scores.dtype  # returns torch.float16

After fixing the above errors, the code runs without errors, but no output is generated even after waiting a long time. I still need some help to identify the issue.

python: 3.10.9
torch: 2.1.0.dev20230317
transformers: 4.26.1

@icowan

icowan commented Mar 21, 2023

(quoting @jeffwcx's step-by-step instructions above)

How many GB does git lfs pull download? My disk is almost full.

@Erickrus

Erickrus commented Mar 21, 2023

My Mac has 16 GB RAM and 12 cores. Starting from the ubuntu:18.04 image, I built a docker image chatglm-6b-int4:0.1, set the docker CPU and RAM limits, and then started the container:
docker run -d --memory-swap -1 -v `pwd`:/workspace chatglm-6b-int4:0.1
It starts; the CPU is maxed out and a lot of swap gets used. The speed is painfully slow, but it does produce results.
Good luck.

@jeffwcx

jeffwcx commented Mar 21, 2023

@icowan About 26 GB.

@taozhijiang

Has anyone else noticed that the CPU work isn't parallelized? Only one Python process maxes out a single core while the other cores sit idle...

@fxkgwlzm

Setup: 12900K with 32 GB RAM, Windows 7.

It runs, but slowly. Tolerable.

@taozhijiang

Using the method above, it does run on macOS, but it cannot be parallelized.

import cpm_kernels
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xxx/miniconda3/envs/pytorch-1.7.0/lib/python3.8/site-packages/cpm_kernels/__init__.py", line 1, in <module>
    from . import library
  File "/Users/xxx/miniconda3/envs/pytorch-1.7.0/lib/python3.8/site-packages/cpm_kernels/library/__init__.py", line 1, in <module>
    from . import nvrtc
  File "/Users/xxx/miniconda3/envs/pytorch-1.7.0/lib/python3.8/site-packages/cpm_kernels/library/nvrtc.py", line 5, in <module>
    nvrtc = Lib("nvrtc")
  File "/Users/xxx/miniconda3/envs/pytorch-1.7.0/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 59, in __init__
    raise RuntimeError("Unknown platform: %s" % sys.platform)
RuntimeError: Unknown platform: darwin

Without parallelization only one core is busy, so it can't be fast; it's really just for playing around...
PS: is cpm-kernels not open source?

@JasonChenJC

"Our model implementation does support macOS"

I cloned it locally and downloaded the model, but with the local model path it still errors out.

@duzx16
Member

duzx16 commented Apr 1, 2023

"Our model implementation does support macOS"

"I cloned it locally and downloaded the model, but with the local model path it still errors out."

What is the full error message?

@jussker

jussker commented Apr 4, 2023

On an M2 you can try changing the device the model uses in web_demo.py.
To install PyTorch for Apple Silicon, see https://developer.apple.com/metal/pytorch/

import torch
...
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
# use PyTorch's MPS acceleration
model = AutoModel.from_pretrained("./chatglm", trust_remote_code=True).quantize(8).half().to('mps')
...

@chang-zy

chang-zy commented Apr 7, 2023

git lfs pull
Could not pull anything.

@shenzhenjiaszj

Same here, git lfs pull had nothing to pull.

@dakele895

Apple M1
Traceback (most recent call last):
  File "/Users/dukangle/code/gpt/ChatGLM-6B/web_demo.py", line 7, in <module>
    model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4",trust_remote_code=True).float()
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 462, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 388, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 299, in get_cached_module_file
    get_cached_module_file(
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 269, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
  File "/opt/homebrew/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 134, in check_imports
    importlib.import_module(imp)
  File "/opt/homebrew/Cellar/python@3.10/3.10.7/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/homebrew/lib/python3.10/site-packages/cpm_kernels/__init__.py", line 1, in <module>
    from . import library
  File "/opt/homebrew/lib/python3.10/site-packages/cpm_kernels/library/__init__.py", line 1, in <module>
    from . import nvrtc
  File "/opt/homebrew/lib/python3.10/site-packages/cpm_kernels/library/nvrtc.py", line 5, in <module>
    nvrtc = Lib("nvrtc")
  File "/opt/homebrew/lib/python3.10/site-packages/cpm_kernels/library/base.py", line 59, in __init__
    raise RuntimeError("Unknown platform: %s" % sys.platform)
RuntimeError: Unknown platform: darwin

@duzx16
Member

duzx16 commented Apr 8, 2023

@dakele895 #6 (comment)

@chang-zy

chang-zy commented Apr 8, 2023

An earlier answer said you need to run git lfs pull; on my machine it worked without that step. I followed the top-voted answer for everything else.

@dakele895


Inference is really slow.

@chch1970

chch1970 commented Apr 9, 2023

For "cpm_kernels/library/base.py RuntimeError: Unknown platform: darwin":
replace the line #59

raise RuntimeError("Unknown platform: %s" % sys.platform)

with

self.__lib = None

@juleskuehn

I can run the quantized models on CPU on a MacBook Air M2 (the GPU does not work). It is very slow, though!

I followed the instructions to install the PyTorch nightly build, but I don't think this is really necessary since we aren't using the mps backend:

conda install pytorch torchvision torchaudio -c pytorch-nightly

Clone the Huggingface repo for the quantized model (e.g. in the parent directory of this repo). You need git-lfs.

brew install git-lfs
git lfs install
git clone https://huggingface.co/THUDM/chatglm-6b-int4

Back in this repo, I had to install some additional requirements:

pip install cchardet chardet gradio

Then, to run the quantized model on CPU, just change line 6 in web_demo.py to:

model = AutoModel.from_pretrained("../chatglm-6b-int4", trust_remote_code=True).float()

@TabithaES

TabithaES commented Apr 9, 2023

My Mac has 16 GB RAM and 12 cores. Starting from the ubuntu:18.04 image, I built a docker image chatglm-6b-int4:0.1, set the docker CPU and RAM limits, and then started the container: docker run -d --memory-swap -1 -v `pwd`:/workspace chatglm-6b-int4:0.1. It starts; the CPU is maxed out and a lot of swap gets used. The speed is painfully slow, but it does produce results. Good luck.

$ docker run -d --memory-swap -1 -v pwd:/workspace chatglm-6b-int4:0.1
Unable to find image 'chatglm-6b-int4:0.1' locally
docker: Error response from daemon: pull access denied for chatglm-6b-int4, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

Help?

Hooray! https://old.reddit.com/r/selfhosted/comments/11u0sot/chatglm_an_opensource_selfhosted_dialogue/ + https://github.com/hyperonym/basaran/blob/master/README.md

No luck... 😢

$ docker logs -f peakji92-chatglm-6b
Loading checkpoint shards: 100%|██████████| 8/8 [00:05<00:00,  1.35it/s]
start listening on 0.0.0.0:80
ERROR:waitress:Exception while serving /v1/completions
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/waitress/channel.py", line 428, in service
    task.service()
  File "/usr/local/lib/python3.8/dist-packages/waitress/task.py", line 168, in service
    self.execute()
  File "/usr/local/lib/python3.8/dist-packages/waitress/task.py", line 456, in execute
    for chunk in app_iter:
  File "/usr/local/lib/python3.8/dist-packages/werkzeug/wsgi.py", line 500, in __next__
    return self._next()
  File "/usr/local/lib/python3.8/dist-packages/werkzeug/wrappers/response.py", line 50, in _iter_encoded
    for item in iterable:
  File "/app/basaran/__main__.py", line 163, in stream
    for choice in stream_model(**options):
  File "/app/basaran/model.py", line 73, in __call__
    for (
  File "/app/basaran/model.py", line 237, in generate
    outputs = self.model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 1160, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 973, in forward
    layer_ret = layer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modeling_chatglm.py", line 611, in forward
    attention_input = self.input_layernorm(hidden_states)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

@Erickrus

docker: Error response from daemon: pull access denied for chatglm-6b-int4, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

As I mentioned, you need to build the docker image yourself from scratch (starting from ubuntu:18.04).

@pchancellor

The model implementation without quantization does support MacOS. The error occurs because when you load a remote repo, transformers will check all the libraries imported in modeling_chatglm.py, including cpm_kernel, which doesn't support MacOS. But in fact, you don't need to import cpm_kernel if you don't use quantization. Since I don't have access to change either transformers or cpm_kernel, I can't fix it by my side. But the solution on the user side is also simple: just clone the repo locally (git clone https://huggingface.co/THUDM/chatglm-6b) and replace "THUDM/chatglm-6b" with your local repo path.


Running the chatglm-6b-int4 model on a Mac M1: I already switched to the local directory, but at run time quantization.py is still loaded, and importing cpm_kernels fails. How can I solve this?
Traceback (most recent call last):
  File "/Users/pengxu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 18, in <module>
    from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction, round_up
  File "/opt/homebrew/lib/python3.11/site-packages/cpm_kernels/__init__.py", line 1, in <module>
    from . import library
  File "/opt/homebrew/lib/python3.11/site-packages/cpm_kernels/library/__init__.py", line 1, in <module>
    from . import nvrtc
  File "/opt/homebrew/lib/python3.11/site-packages/cpm_kernels/library/nvrtc.py", line 5, in <module>
    nvrtc = Lib("nvrtc")
           ^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/cpm_kernels/library/base.py", line 59, in __init__
    raise RuntimeError("Unknown platform: %s" % sys.platform)
RuntimeError: Unknown platform: darwin

@QqqingYuan

QqqingYuan commented Apr 25, 2023

Mac users can modify chatglm-6b-int4/quantization.py to fix these errors:

  1. cpm_kernel does not support macOS yet, so delete lines 17 to 43 and keep only
    kernels = None

  2. Apple's gcc does not support OpenMP; you can install llvm and libomp:
    brew install llvm libomp
    and modify the gcc compiler options at line 140. But there are still bugs and it doesn't work with OpenMP, so
    just comment out or delete lines 140 to 154.

@kannayshu

Mac
3.1 GHz dual-core Intel Core i7
16 GB 1867 MHz DDR3
Intel Iris Graphics 6100 1536 MB
Even this fairly ordinary setup can run it.

Following the top-voted method from jeffwcx above, it worked. Thanks!
I used chatglm-6b-int4; the modeling_chatglm.py change mentioned in step 1 is not needed, but I did comment out lines 18 to 46 of modeling_chatglm.py (the kernels-related part).
For step 2, replace the model path in web_demo.py with the local directory where your model is stored; don't copy it verbatim.

@fangkuann

fangkuann commented May 8, 2023

On an M2 you can try changing the device the model uses in web_demo.py. To install PyTorch for Apple Silicon, see https://developer.apple.com/metal/pytorch/

import torch
...
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
# use PyTorch's MPS acceleration
model = AutoModel.from_pretrained("./chatglm", trust_remote_code=True).quantize(8).half().to('mps')
...

Does this MPS approach only work on M2 machines? On an M1 with 16 GB RAM, I'm stuck forever at Loading checkpoint shards: 100%.

MODEL_PATH = "../chatglm-6b"  # no quantization
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).quantize(8).half().to('mps')

Dependencies

  • python 3.9.6
  • torch 2.1.0.dev20230508
  • transformers 4.27.1
  • MacOS 13.3 (22E252)

@icowan

icowan commented May 10, 2023

(quoting @jeffwcx's step-by-step instructions above)

Thanks! M2, 64 GB. It runs; the speed is okay, the output comes out a few words at a time.

@Doubledongli

Error: curl: (7) Failed to connect to raw.githubusercontent.com port 443: Connection refused

@BryanOne

(quoting @fangkuann's comment above)

Same here: M1, 16 GB, same error. I suspect it's running out of memory.

@chenbt-hz

(quoting @jeffwcx's step-by-step instructions above)

I pulled the model this week; following these steps, chatglm-6b-int4 would not run, while chatglm-6b works fine.
Environment:
macOS 12.4
Python 3.10.6

I finally got it running with the following changes (the warning is still there, but adding the ignore_mismatched_sizes=True parameter makes it run):
web_demo.py:
tokenizer = AutoTokenizer.from_pretrained("./chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm-6b-int4", trust_remote_code=True, ignore_mismatched_sizes=True).float()
./chatglm-6b-int4/modeling_chatglm.py:
def quantize(self, bits: int, quantize_embeddings=False, use_quantization_cache=False, empty_init=False, **kwargs):
    if bits == 0:
        return

    # from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel

    # if self.quantized:
    #     if self.device == torch.device("cpu"):
    #         logger.info("Already quantized, reloading cpu kernel.")
    #         load_cpu_kernel(**kwargs)
    #     else:
    #         logger.info("Already quantized.")
    #     return self

    self.quantized = True

    self.config.quantization_bit = bits
    self.config.quantization_embeddings = quantize_embeddings

    # self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)

    if self.device == torch.device("cpu"):
        dtype = torch.float32
    else:
        dtype = torch.half

    # if quantize_embeddings:
    #     logger.info("Applying quantization to embeddings")
    #     self.transformer.word_embeddings = QuantizedEmbedding(
    #         weight_bit_width=bits,
    #         weight_tensor=self.transformer.word_embeddings.weight.to(self.device),
    #         num_embeddings=self.transformer.word_embeddings.num_embeddings,
    #         embedding_dim=self.transformer.word_embeddings.embedding_dim,
    #         dtype=dtype,
    #         empty_init=empty_init,
    #         device=self.transformer.word_embeddings.weight.device,
    #     )
    #     self.lm_head = QuantizedLinear(
    #         weight_bit_width=bits,
    #         weight_tensor=self.lm_head.weight.to(self.device),
    #         bias_tensor=None,
    #         in_features=self.lm_head.in_features,
    #         out_features=self.lm_head.out_features,
    #         bias=False,
    #         quantized_weight=self.transformer.word_embeddings.weight,
    #         quantized_weight_scale=self.transformer.word_embeddings.weight_scale,
    #         dtype=dtype,
    #         empty_init=empty_init,
    #         device=self.lm_head.weight.device,
    #     )

    return self

@quhongwei

(quoting @kannayshu's comment above)

Could you paste exactly what lines 18 to 46 contain?

@championcp

Is there a definitive solution to this issue yet?
