RuntimeError: Unknown platform: darwin #6
Seems to be built with CUDA kernels, so no Mac support.

MacBook here, got the same error.

+1
Currently, quantization requires the use of custom CUDA kernels. Without quantization, inference should be able to support Macs with FP16 precision. I will fix this issue later.
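[Editor's note] For readers who want to try this right away, a minimal sketch of unquantized CPU inference, using FP32 as the workarounds later in this thread do; the local path "../chatglm" and the prompt are placeholders:

```python
# Minimal sketch: unquantized CPU inference, so no custom CUDA kernels are needed.
# "../chatglm" is a placeholder for a local clone of the THUDM/chatglm-6b repo.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).float()  # FP32 on CPU
model = model.eval()

# model.chat is the conversational helper exposed by the ChatGLM remote code.
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```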
+1

Please notify us when it's fixed. Thanks!
You can actually hack your way in and run the demo on a macOS M1 machine. You only need to change 2 lines: line 5 in 999dd3c — change …, and remove this line (the file resides in the Hugging Face cache directory, by default …). Notice: … In the long term I'd expect the authors to optimize it for Apple Silicon chips (use …).
But the problem still exists, and I got this text: …
The model implementation without quantization does support macOS. The error occurs because when you load a remote repo, …
Has anyone been able to successfully run it on a Mac?

@mice33 I succeeded.
Hardware specifications are as follows: MacBook Pro.

My steps are as follows:

1. Download the model and modify the code

```bash
brew install git-lfs
# Clone to the `chatglm` folder
git clone https://huggingface.co/THUDM/chatglm-6b chatglm
cd chatglm
git lfs install
# This will take a long time
git lfs pull
```

Modify `modeling_chatglm.py`:

```diff
--- a/modeling_chatglm.py
+++ b/modeling_chatglm.py
@@ -1166,6 +1166,6 @@ class ChatGLMForConditionalGeneration(ChatGLMPreTrainedModel):
         return torch.tensor(return_seqs, dtype=torch.long, device=kwargs['input_ids'].device)
     def quantize(self, bits: int):
-        from .quantization import quantize
-        self.transformer = quantize(self.transformer, bits)
+        # from .quantization import quantize
+        # self.transformer = quantize(self.transformer, bits)
```

2. Modify the demo code

```bash
git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B
pip install -r requirements.txt
pip install gradio
```

Modify `web_demo.py`:

```diff
--- a/web_demo.py
+++ b/web_demo.py
@@ -1,8 +1,8 @@
 from transformers import AutoModel, AutoTokenizer
 import gradio as gr
-tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
-model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
+tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
+model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).float()
 model = model.eval()
```

Then run:

```bash
python web_demo.py
```

After successful execution, it will display: …
Succeeded using @jeffwcx's solution, thanks for sharing. I cloned the Hugging Face repo into the repositories folder. The modification in step 1 is not necessary, since we load the model locally with .float() and never call the quantize() function, so cpm_kernels won't be loaded. But as mentioned above, pure CPU is really slow; looking forward to the MPS solution, so the model would be pretty "portable" (lol). The first round takes 25 seconds and the second round takes nearly 4 minutes. MacBook Pro.
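[Editor's note] For anyone who wants to reproduce per-round timings like these, a small sketch; the local path and prompts are illustrative, and the model is loaded as in the steps above:

```python
# Sketch: time each chat round on CPU, similar to the 25 s / ~4 min numbers reported above.
import time

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).float().eval()

history = []
for prompt in ["你好", "请介绍一下你自己"]:
    start = time.perf_counter()
    response, history = model.chat(tokenizer, prompt, history=history)
    print(f"{time.perf_counter() - start:.1f}s  {response[:60]}")
```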
```python
# use mps backend
import torch
from transformers import AutoModel, AutoTokenizer

mps_device = torch.device("mps")
tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).half()
model.to(mps_device)
```

The model can be initialized, but an error occurs when generating an output. It appears that some PyTorch functions may only be available on macOS 13.3 Beta. I will try again later and hope that it will run successfully.
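[Editor's note] One workaround worth trying when individual operators are missing from the MPS backend is PyTorch's CPU fallback. A sketch with placeholder paths, and no guarantee it resolves this particular error:

```python
# Sketch: allow operators not yet implemented on MPS to fall back to the CPU.
# The environment variable must be set before torch is imported.
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("../chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm", trust_remote_code=True).half().to(torch.device("mps"))
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```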
After upgrading to macOS 13.3 Beta 4, I ran the CLI demo but got the error: …

After running the code once again, I got another error: …

After fixing the above errors, the code runs without errors, but no output is generated even after waiting a long time. I still need some help to identify the issue. python: 3.10.9
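[Editor's note] Before digging further, it may help to confirm that the installed PyTorch build actually exposes a working MPS device; a quick check:

```python
# Sketch: verify that PyTorch was built with MPS support and that the device is usable.
import torch

print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    x = torch.ones(2, 2, device="mps")
    print((x @ x).cpu())  # run a trivial op end to end on the MPS device
```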
My Mac has 16 GB RAM and 12 cores. Starting from the ubuntu:18.04 image, I built a Docker image chatglm-6b-int4:0.1, set the Docker CPU and RAM parameters, and then started the container.

@icowan Roughly 26 GB.

Does anyone else get the feeling that the CPU is not computing in parallel? Only one Python process maxes out a core while the other cores sit idle…

Setup: 12900K with 32 GB RAM, running Windows 7. It works, but it is slow; still tolerable.
Using the method above, it can run on macOS, but it cannot be parallelized. Without parallelization only one core is used, so it can't get fast; it's only good enough to play around with…
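[Editor's note] When only one core is busy, it can be worth checking how many intra-op threads PyTorch is actually using; a sketch, with no guarantee it speeds things up on a given machine:

```python
# Sketch: inspect and raise PyTorch's intra-op thread count for CPU inference.
import os
import torch

print("default intra-op threads:", torch.get_num_threads())
torch.set_num_threads(os.cpu_count() or 1)  # ask PyTorch to use all logical cores
print("intra-op threads now:    ", torch.get_num_threads())
```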
I cloned it locally and then downloaded the model, but the model path still throws an error.

What is the full error message?
On an M2 you can give this a try: in the web_demo.py file, change the device the model uses:

```python
import torch
...
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
# Use PyTorch's MPS acceleration
model = AutoModel.from_pretrained("./chatglm", trust_remote_code=True).quantize(8).half().to('mps')
...
```
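[Editor's note] Assuming the quantized MPS load above succeeds on your machine, a quick smoke test outside of Gradio could look like the following; the prompt is illustrative and the chat API matches the project's README:

```python
# Sketch: smoke-test the MPS-loaded model without starting the Gradio demo.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./chatglm", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm", trust_remote_code=True).quantize(8).half().to('mps')
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```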
git lfs pull

Same as you; git lfs pull reports nothing to pull.
Apple M1. An earlier answer said to run git lfs pull; on my machine I could get it running without that step. The rest of the steps follow the most-upvoted answer.
For "cpm_kernels/library/base.py RuntimeError: Unknown platform: darwin":
with
|
I can run quantized models with CPU on a MacBook Air M2. (GPU does not work.) It is very slow though!

I followed the instructions to install pytorch nightly, but I don't think this is really necessary since we aren't using the mps backend:

```bash
conda install pytorch torchvision torchaudio -c pytorch-nightly
```

Clone the Huggingface repo for the quantized model (e.g. in the parent directory of this repo). You need git-lfs:

```bash
brew install git-lfs
git lfs install
git clone https://huggingface.co/THUDM/chatglm-6b-int4
```

Back in this repo, I had to install some additional requirements:

```bash
pip install cchardet chardet gradio
```

Then, to run the quantized model on CPU, just change line 6 in … to:

```python
model = AutoModel.from_pretrained("../chatglm-6b-int4", trust_remote_code=True).float()
```
```bash
$ docker run -d --memory-swap -1 -v `pwd`:/workspace chatglm-6b-int4:0.1
Unable to find image 'chatglm-6b-int4:0.1' locally
docker: Error response from daemon: pull access denied for chatglm-6b-int4, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
```

Hooray! https://old.reddit.com/r/selfhosted/comments/11u0sot/chatglm_an_opensource_selfhosted_dialogue/ + https://github.com/hyperonym/basaran/blob/master/README.md ... and yet I'm left empty-handed... 😢
As I mentioned, you need to build the Docker image from scratch (ubuntu:18.04).
Running the chatglm-6b-int4 model on a Mac M1: I have already switched to a local directory, but at runtime it still loads the quantization.py file and then errors out when loading cpm_kernels. How can this be fixed?

Mac users can modify chatglm-6b-int4/quantization.py to fix these errors: …
On Mac, I succeeded following the top-voted method from @jeffwcx above, thanks~

Does this MPS approach only work on M2 machines? On an M1 with 16 GB RAM, I'm stuck at "Loading checkpoint shards: 100%". Dependencies: …
Thanks! M2, 64 GB. It runs, and the speed is acceptable; it spits out a few words at a time.

Error: curl: (7) Failed to connect to raw.githubusercontent.com port 443: Connection refused

Same here, M1 16 GB, same error; it feels like there isn't enough memory.
I pulled the model this week; following these steps I could not get /chatglm-6b-int4 to run, while /chatglm-6b works fine. In the end I got it running by modifying the following code: …
Could you paste the exact code on lines 18 to 46?

Is there a confirmed solution to this problem yet?
Mac support?