Merge pull request #2 from yfyang86/mac-m1-dev
 [Document] Update Mac deployment
yfyang86 authored May 3, 2023
2 parents 16b3a43 + 2629a55 commit 8983d4b
Showing 2 changed files with 17 additions and 26 deletions.
17 changes: 6 additions & 11 deletions README.md
@@ -207,22 +207,17 @@ clang: error: unsupported option '-fopenmp'

```bash
# Step 1: see `https://mac.r-project.org/openmp/`
## Assuming gcc -v reports version 14.x; for other versions, see the table provided by the R Project
## Assumption: gcc (clang) is version 14.x; for other versions, see the table provided by the R Project
curl -O https://mac.r-project.org/openmp/openmp-14.0.6-darwin20-Release.tar.gz
sudo tar fvxz openmp-14.0.6-darwin20-Release.tar.gz -C /
## This installs the following files:
# usr/local/lib/libomp.dylib
# usr/local/include/ompt.h
# usr/local/include/omp.h
# usr/local/include/omp-tools.h
```

For `chatglm-6b-int4`, modify [quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/blob/main/quantization.py): the key change is replacing the hard-coded `gcc -O3 -fPIC -pthread -fopenmp -std=c99` command with `gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99` (the [corresponding code](https://huggingface.co/THUDM/chatglm-6b-int4/blob/63d66b0572d11cedd5574b38da720299599539b3/quantization.py#L168) is shown below):
This installs the following files: `/usr/local/lib/libomp.dylib`, `/usr/local/include/ompt.h`, `/usr/local/include/omp.h`, `/usr/local/include/omp-tools.h`.
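
To confirm the runtime is in place before touching any Python code, a quick check such as the following can help (this snippet is an illustration and not part of the upstream instructions):

```bash
# Verify that the OpenMP runtime and headers landed where clang expects them
ls -l /usr/local/lib/libomp.dylib /usr/local/include/omp.h \
      /usr/local/include/ompt.h /usr/local/include/omp-tools.h

# Check the architectures inside the dylib (should include arm64 and/or x86_64)
lipo -info /usr/local/lib/libomp.dylib
```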

Then, for `chatglm-6b-int4`, modify [quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/blob/main/quantization.py): the key change is replacing the hard-coded `gcc -O3 -fPIC -pthread -fopenmp -std=c99` command with `gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99` (the [corresponding code](https://huggingface.co/THUDM/chatglm-6b-int4/blob/63d66b0572d11cedd5574b38da720299599539b3/quantization.py#L168) is shown below):

```python
# Step 2
## Find the line containing `gcc -O3 -fPIC -pthread -fopenmp -std=c99`
## and change it to
# Step 2: find the line containing `gcc -O3 -fPIC -pthread -fopenmp -std=c99` and change it to
compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99 {} -shared -o {}".format(source_code, kernel_file)
```

@@ -233,7 +228,7 @@ import platform
## ...
## Change the corresponding part above to the following (adjust the indentation yourself):
if platform.uname()[0] == 'Darwin':
compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99-o {}".format(
compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99 -o {}".format(
source_code, kernel_file)
else:
compile_command = "gcc -O3 -fPIC -pthread -fopenmp -std=c99 {} -shared -o {}".format(
26 changes: 11 additions & 15 deletions README_en.md
@@ -200,26 +200,22 @@ clang: error: unsupported option '-fopenmp'

Take the quantized int4 version [chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4) as an example; the following extra steps are needed:

1. Install `libomp`;
2. Configure `gcc`.
#### Install `libomp`

```bash
# STEP 1: install libomp, check `https://mac.r-project.org/openmp/` for details
## Assumption: `gcc -v` reports 14.x; read the R Project page before running the code:
## Assumption: gcc (clang) is version 14.x; read the R Project page before running the code:
curl -O https://mac.r-project.org/openmp/openmp-14.0.6-darwin20-Release.tar.gz
sudo tar fvxz openmp-14.0.6-darwin20-Release.tar.gz -C /
## Four files are installed:
# usr/local/lib/libomp.dylib
# usr/local/include/ompt.h
# usr/local/include/omp.h
# usr/local/include/omp-tools.h
```
Four files (`/usr/local/lib/libomp.dylib`, `/usr/local/include/ompt.h`, `/usr/local/include/omp.h`, `/usr/local/include/omp-tools.h`) are installed accordingly.
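
Before editing any project files, it may be worth checking that Apple clang can actually compile and link an OpenMP program with the flags used in the next step. The following throwaway test is only a sanity check under the assumptions above (clang invoked as `gcc`, libomp installed under `/usr/local`):

```bash
# Throwaway OpenMP smoke test (illustrative only)
cat > /tmp/omp_smoke.c <<'EOF'
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}
EOF

# Same flag pattern as the quantization.py change described below
gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99 /tmp/omp_smoke.c -o /tmp/omp_smoke
/tmp/omp_smoke
```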

For `chatglm-6b-int4`, modify the [quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/blob/main/quantization.py) file. In the file, change the `gcc -O3 -fPIC -pthread -fopenmp -std=c99` configuration to `gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99` ([corresponding python code](https://huggingface.co/THUDM/chatglm-6b-int4/blob/63d66b0572d11cedd5574b38da720299599539b3/quantization.py#L168)), i.e.:
#### Configure `gcc` with `-fopenmp`

Next, modify the [quantization.py](https://huggingface.co/THUDM/chatglm-6b-int4/blob/main/quantization.py) file of the `chatglm-6b-int4` project. In the file, change the `gcc -O3 -fPIC -pthread -fopenmp -std=c99` configuration to `gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99` (check the corresponding python code [HERE](https://huggingface.co/THUDM/chatglm-6b-int4/blob/63d66b0572d11cedd5574b38da720299599539b3/quantization.py#L168)), i.e.:

```python
# STEP
## Change the line that contains `gcc -O3 -fPIC -pthread -fopenmp -std=c99` to:
# STEP 2: Change the line that contains `gcc -O3 -fPIC -pthread -fopenmp -std=c99` to:
compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99 {} -shared -o {}".format(source_code, kernel_file)
```

@@ -231,14 +227,14 @@ import platform
## ...
## change the corresponding lines to:
if platform.uname()[0] == 'Darwin':
compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99-o {}".format(
compile_command = "gcc -O3 -fPIC -Xclang -fopenmp -pthread -lomp -std=c99 -o {}".format(
source_code, kernel_file)
else:
compile_command = "gcc -O3 -fPIC -pthread -fopenmp -std=c99 {} -shared -o {}".format(
source_code, kernel_file)
```

> Notice: If you have run the `chatglm` project and it failed, you may want to clean the Hugging Face cache before your next try, i.e. `rm -rf ${HOME}/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4`. Since `rm` is used, please MAKE SURE that the command deletes the right files.
> Notice: If you have run the `ChatGLM` project and it failed, you may want to clean the Hugging Face cache before your next try, i.e. `rm -rf ${HOME}/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4`. Since `rm` is used, please MAKE SURE that the command deletes the right files.
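
A slightly more cautious version of that clean-up, shown here purely as an illustration, inspects the directory before deleting it:

```bash
# Look at what would be removed first, then delete the cached module code
CACHE_DIR="${HOME}/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4"
ls -la "${CACHE_DIR}" && rm -rf "${CACHE_DIR}"
```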
### GPU Inference on Mac
For Macs (and MacBooks) with Apple Silicon, it is possible to use the MPS backend to run ChatGLM-6B on the GPU. First, you need to refer to Apple's [official instructions](https://developer.apple.com/metal/pytorch) to install PyTorch-Nightly.
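
As a rough sketch (the exact command may change over time, so treat Apple's page as authoritative), installing a nightly build and confirming that the MPS backend is visible typically looks like this:

```bash
# Install a PyTorch nightly build with MPS support (command may differ; see Apple's instructions)
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

# Confirm the MPS backend is available before loading the model
python3 -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"
```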
@@ -248,15 +244,15 @@ Currently you must [load the model locally](README_en.md#load-the-model-locally)

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).half().to('mps')
```

For Mac users with Mac >= 13.3, one may encounter errors related to `half()` method. Use `float()` instead:
For Mac users on macOS >= 13.3, one may encounter errors related to the `half()` method. Use the `float()` method instead:

```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).float().to('mps')
```

Then you can use GPU-accelerated model inference on Mac.
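
Putting the pieces together, a minimal end-to-end sketch (assuming the non-quantized model has already been downloaded to a local path, and using the standard ChatGLM-6B `chat` interface) might look like this:

```python
from transformers import AutoTokenizer, AutoModel

MODEL_PATH = "your local path"  # placeholder: wherever chatglm-6b was downloaded

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
# On macOS >= 13.3 use .float() instead of .half(), as noted above
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().to('mps')
model = model.eval()

response, history = model.chat(tokenizer, "Hello, what can you do?", history=[])
print(response)
```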

> Notice: there is no problem running the non-quantized version of ChatGLM with MPS. One can check [this issue](https://github.com/THUDM/ChatGLM-6B/issues/462) for attempts to run the quantized version with MPS as the backend (which end in `ERRORS`). Unzipping/unpacking the [kernel](https://huggingface.co/THUDM/chatglm-6b/blob/658202d88ac4bb782b99e99ac3adff58b4d0b813/quantization.py#L27) as an `ELF` file shows that its backend is `cuda`.
> Notice: there is no problem running the non-quantized version of ChatGLM with MPS. One can check [this issue](https://github.com/THUDM/ChatGLM-6B/issues/462) for attempts to run the quantized version with MPS as the backend (which end in `ERRORS`). Unpacking the [kernel](https://huggingface.co/THUDM/chatglm-6b/blob/658202d88ac4bb782b99e99ac3adff58b4d0b813/quantization.py#L27) as an `ELF` file shows that its backend is `cuda`, which currently fails on MPS (`torch 2.1.0.dev20230502`).
### Multi-GPU Deployment
If you have multiple GPUs, but the memory size of each GPU is not sufficient to accommodate the entire model, you can split the model across multiple GPUs.
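
A generic way to sketch this relies on the `accelerate`-backed `device_map="auto"` option in `transformers` rather than any project-specific helper, so treat it as an illustration; whether it maps cleanly onto this model is not guaranteed:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "your local path"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
# device_map="auto" asks accelerate to shard the weights across all visible GPUs
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
).eval()
```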
