From 68fb5fdac5f04e5431c3da79153b920c3d3beba2 Mon Sep 17 00:00:00 2001
From: idontkonwher
Date: Mon, 23 Sep 2024 19:37:19 +0800
Subject: [PATCH 1/3] [METAX] Support llama for MX C550

---
 llm/metax/llama/README.md | 153 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)
 create mode 100644 llm/metax/llama/README.md

diff --git a/llm/metax/llama/README.md b/llm/metax/llama/README.md
new file mode 100644
index 000000000000..16c3f4f85ae2
--- /dev/null
+++ b/llm/metax/llama/README.md
@@ -0,0 +1,153 @@
## 🚣‍♂️ Running the llama2-13b Model with PaddleNLP on the MX C550 🚣

PaddleNLP has been deeply adapted and optimized for the llama2-13B model on the MetaX C550 ([about MetaX](https://www.metax-tech.com/)). The suite gives the C550 exactly the same training and inference entry points as a GPU, so you can switch between the two seamlessly.
The C500-series GPUs are MetaX's flagship products, built on GPU IP the company owns outright. They offer strong mixed-precision compute, 64 GB of high-bandwidth memory, and the advanced MetaLink multi-card interconnect. Running the MXMACA® software stack, they are fully compatible with the mainstream GPU ecosystem, so applications migrate at zero cost, and they readily support AI, general-purpose computing, and data-processing workloads.

## 🚀 Quick Start 🚀

### (0) Before you begin, you need a machine with MetaX C550 GPUs installed. Its system requirements are:

| Chip | vbios version | MXMACA version |
| -------- | --------- | --------------- |
| MetaX C550 | ≥ 1.13 | ≥ 2.23.0.1018 |

**Note: to verify that your machine has MetaX C550 GPUs installed, run the following command and check that it produces output:**

```
mx-smi

# expected output
mx-smi version: 2.1.6

=================== MetaX System Management Interface Log ===================
Timestamp : Mon Sep 23 06:24:52 2024

Attached GPUs : 8
+---------------------------------------------------------------------------------+
| MX-SMI 2.1.6 Kernel Mode Driver Version: 2.5.014 |
| MACA Version: 2.23.0.1018 BIOS Version: 1.13.4.0 |
|------------------------------------+---------------------+----------------------+
| GPU NAME | Bus-id | GPU-Util |
| Temp Power | Memory-Usage | |
|====================================+=====================+======================|
| 0 MXC550 | 0000:2a:00.0 | 0% |
| 31C 44W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 1 MXC550 | 0000:3a:00.0 | 0% |
| 31C 46W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 2 MXC550 | 0000:4c:00.0 | 0% |
| 31C 47W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 3 MXC550 | 0000:5c:00.0 | 0% |
| 31C 46W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 4 MXC550 | 0000:aa:00.0 | 0% |
| 30C 46W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 5 MXC550 | 0000:ba:00.0 | 0% |
| 31C 47W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 6 MXC550 | 0000:ca:00.0 | 0% |
| 30C 46W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 7 MXC550 | 0000:da:00.0 | 0% |
| 30C 47W | 810/65536 MiB | |
+------------------------------------+---------------------+----------------------+

+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| no process found |
+---------------------------------------------------------------------------------+
```

### (1) Environment setup (this will take you roughly 5 to 55 minutes)

1. Build the runtime environment with a container (optional)

```
# You can use --device=/dev/dri/card0 to make only GPU 0 visible inside the container (likewise for the other cards); --device=/dev/dri makes all GPUs visible
docker run -it --rm --device=/dev/dri \
  --device=/dev/mxcd --group-add video --network=host --uts=host --ipc=host --privileged=true --shm-size 128g {image id}
```

2. Install the MXMACA software stack

```
# Assuming you have already downloaded and unpacked the MXMACA driver package
sudo bash /path/to/maca_package/mxmaca-sdk-install.sh
You can contact MetaX or visit https://sw-download.metax-tech.com to obtain the installation package.
```

3. Install PaddlePaddle

① If you have already obtained a PaddlePaddle package from MetaX, you can install it directly:

`pip install paddlepaddle_gpu-2.6.0+mc*.whl`

② You can also build the PaddlePaddle wheel from source; make sure the MXMACA software stack is installed correctly first. The build uses the MXMACA-based cu-bridge toolchain; see the [documentation](https://gitee.com/p4ul/cu-bridge/tree/master/docs/02_User_Manual) for more information.

```
# 1. Clone the PaddlePaddle GitHub repository and switch to the mxmaca branch
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
git checkout release-mxmaca/2.6
# 2. Fetch third-party dependencies
git submodule update --init
# 3. Set environment variables
export MACA_PATH=/real?maca/install/path
export CUDA_PATH=/real/cuda/install/path
export CUCC_PATH=${MACA_PATH}/tools/cu-bridge
export PATH=${CUDA_PATH}/bin:${CUCC_PATH}/bin:${CUCC_PATH}/tools:${MACA_PATH}/bin:$PATH
export LD_LIBRARY_PATH=${MACA_PATH}/lib:${MACA_PATH}/mxgpu_llvm/lib:${LD_LIBRARY_PATH}
# 4. Check that the configuration is correct
cucc --version
# 5. Build
mkdir -p build && cd build
cmake_maca .. -DPY_VERSION=3.8 -DWITH_GPU=ON -DWITH_DISTRIBUTE=ON -DWITH_NCCL=ON
make_maca -j64
# 6. After the build completes, install the wheel
pip install python/dist/paddlepaddle_gpu*.whl
```

4. Clone the PaddleNLP repository and install its dependencies

```
# PaddleNLP is a natural language processing and large language model (LLM) development library built on the PaddlePaddle (『飞桨』) framework. It hosts PaddlePaddle implementations of many large models, llama2-13B among them; clone the whole repository to get the most out of PaddleNLP.
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP
python -m pip install -r requirements.txt
python -m pip install -e .
```
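Before moving on, it is worth a quick sanity check that the freshly installed wheel actually sees the cards. The snippet below is an editor's sketch, not part of the original patch; it assumes the wheel was built with GPU support against MXMACA, in which case the C550s are reported through Paddle's standard GPU device API:

```
# let Paddle compile and run a small test program on the device, then print the device it selected
python -c "import paddle; paddle.utils.run_check()"
python -c "import paddle; print(paddle.device.get_device())"   # expect something like gpu:0
```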
### (2) Inference (this will take you roughly 5 to 10 minutes)

1. Run the inference demo

```
cd llm
python predictor.py --model_name_or_path meta-llama/llama-13b-chat --dtype bfloat16 --output_file "infer.json" --batch_size 1
```

On a successful run, you can see the generated inference results; a sample follows:

```
***********Source**********
解释一下温故而知新
***********Target**********

***********Output**********
 "温故而知新" (wēn gù er zhī xīn) is a Chinese idiom that means "to know the old in order to discover the new." It is often used to describe the idea that one can gain a deeper understanding of something by studying its history and roots, rather than just focusing on the present moment.

The phrase is often used in the context of learning and education, where it suggests that students should study the classics and the history of their subject in order to gain a more profound understanding of it. By understanding the origins and development of a subject, students can gain a deeper appreciation of its principles and concepts, and be better equipped to apply them in new and innovative ways.

In a broader sense, "温故而知新" can also be applied to life in general. By studying the past and understanding the traditions and cultural heritage of one's community, individuals can gain a deeper understanding of the present and be better prepared to face the challenges of the future.

In short, "温故而知新" is a reminder that understanding the past is essential to moving forward and making progress in any field or aspect of life.
```

2. You can also follow the instructions in this [document](../../../legacy/examples/benchmark/wiki_lambada/README.md) to validate inference accuracy with the wikitext dataset.
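The 13B weights fit comfortably in a single C550's 64 GB at bfloat16, but the demo can also be sharded across several cards with Paddle's standard distributed launcher, which runs unchanged on MXMACA. A sketch, assuming predictor.py's tensor-parallel path behaves the same here as on other GPUs:

```
# shard the model across four cards; adjust --gpus to the devices you exposed to the container
python -m paddle.distributed.launch --gpus "0,1,2,3" predictor.py \
    --model_name_or_path meta-llama/llama-13b-chat --dtype bfloat16 \
    --output_file "infer.json" --batch_size 1
```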
From ad594aba744bf934cb500c2ac75d0cdf38c62a9f Mon Sep 17 00:00:00 2001
From: idontkonwher
Date: Thu, 10 Oct 2024 18:17:16 +0800
Subject: [PATCH 2/3] fix typo, add paddlenlp branch, add greedy strategy for demo

---
 llm/metax/llama/README.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/llm/metax/llama/README.md b/llm/metax/llama/README.md
index 16c3f4f85ae2..51550b3a2530 100644
--- a/llm/metax/llama/README.md
+++ b/llm/metax/llama/README.md
@@ -99,7 +99,7 @@ git checkout release-mxmaca/2.6
 # 2. Fetch third-party dependencies
 git submodule update --init
 # 3. Set environment variables
-export MACA_PATH=/real?maca/install/path
+export MACA_PATH=/real/maca/install/path
 export CUDA_PATH=/real/cuda/install/path
 export CUCC_PATH=${MACA_PATH}/tools/cu-bridge
 export PATH=${CUDA_PATH}/bin:${CUCC_PATH}/bin:${CUCC_PATH}/tools:${MACA_PATH}/bin:$PATH
@@ -120,6 +120,7 @@ pip install python/dist/paddlepaddle_gpu*.whl
 # PaddleNLP is a natural language processing and large language model (LLM) development library built on the PaddlePaddle (『飞桨』) framework. It hosts PaddlePaddle implementations of many large models, llama2-13B among them; clone the whole repository to get the most out of PaddleNLP.
 git clone https://github.com/PaddlePaddle/PaddleNLP.git
 cd PaddleNLP
+git checkout origin/release/3.0-beta1
 python -m pip install -r requirements.txt
 python -m pip install -e .
 ```
@@ -129,8 +130,8 @@ python -m pip install -e .
 1. Run the inference demo

 ```
-cd llm
-python predictor.py --model_name_or_path meta-llama/llama-13b-chat --dtype bfloat16 --output_file "infer.json" --batch_size 1
+cd llm/predict
+python predictor.py --model_name_or_path meta-llama/Llama-2-13b-chat --dtype bfloat16 --output_file "infer.json" --batch_size 1 --decode_strategy "greedy_search"
 ```

 On a successful run, you can see the generated inference results; a sample follows:
@@ -141,13 +142,13 @@ python predictor.py --model_name_or_path meta-llama/llama-13b-chat --dtype bfloa
 ***********Target**********

 ***********Output**********
- "温故而知新" (wēn gù er zhī xīn) is a Chinese idiom that means "to know the old in order to discover the new." It is often used to describe the idea that one can gain a deeper understanding of something by studying its history and roots, rather than just focusing on the present moment.
+ "温故而知新" (wēn gù er zhī xīn) is a Chinese idiom that means "to know the old and appreciate the new." It is often used to describe the idea that one can gain a deeper understanding and appreciation of something by studying its history and traditions, and then applying that knowledge to new situations and challenges.

-The phrase is often used in the context of learning and education, where it suggests that students should study the classics and the history of their subject in order to gain a more profound understanding of it. By understanding the origins and development of a subject, students can gain a deeper appreciation of its principles and concepts, and be better equipped to apply them in new and innovative ways.
+The word "温" (wēn) in this idiom means "old" or "ancient," and "故" (gù) means "former" or "past." The word "知" (zhī) means "to know" or "to understand," and "新" (xīn) means "new."

-In a broader sense, "温故而知新" can also be applied to life in general. By studying the past and understanding the traditions and cultural heritage of one's community, individuals can gain a deeper understanding of the present and be better prepared to face the challenges of the future.
+This idiom is often used in the context of education, where it is believed that students should be taught the traditional methods and theories of a subject before being introduced to new and innovative ideas. By understanding the history and foundations of a subject, students can better appreciate and apply the new ideas and techniques that they are learning.

-In short, "温故而知新" is a reminder that understanding the past is essential to moving forward and making progress in any field or aspect of life.
+In addition to education, "温故而知新" can also be applied to other areas of life, such as business, where it is important to understand the traditions and practices of the industry before introducing new products or services. By understanding the past and the foundations of a particular field, one can gain a deeper appreciation of the present and make more informed decisions about the future.
 ```

 2. You can also follow the instructions in this [document](../../../legacy/examples/benchmark/wiki_lambada/README.md) to validate inference accuracy with the wikitext dataset.
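Patch 2 pins the demo to greedy search, which makes the generation deterministic and therefore directly comparable across runs and devices. For more varied output, the same entry point can be switched to sampling; this is a sketch, assuming predictor.py exposes the usual sampling knobs alongside --decode_strategy:

```
cd llm/predict
python predictor.py --model_name_or_path meta-llama/Llama-2-13b-chat --dtype bfloat16 \
    --output_file "infer.json" --batch_size 1 --decode_strategy "sampling" --top_p 0.7
```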
From 4a9b2eae3d68ecbcc40cf1dd9f165a7e2903d67c Mon Sep 17 00:00:00 2001
From: idontkonwher
Date: Mon, 28 Oct 2024 00:00:40 +0800
Subject: [PATCH 3/3] add MXMACA software stack acquisition path

---
 llm/metax/llama/README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/llm/metax/llama/README.md b/llm/metax/llama/README.md
index 51550b3a2530..a1ed77f3dda6 100644
--- a/llm/metax/llama/README.md
+++ b/llm/metax/llama/README.md
@@ -72,15 +72,17 @@
 ```
 # You can use --device=/dev/dri/card0 to make only GPU 0 visible inside the container (likewise for the other cards); --device=/dev/dri makes all GPUs visible
 docker run -it --rm --device=/dev/dri \
-  --device=/dev/mxcd --group-add video --network=host --uts=host --ipc=host --privileged=true --shm-size 128g {image id}
+  --device=/dev/mxcd --group-add video --network=host --uts=host --ipc=host --privileged=true --shm-size 128g registry.baidubce.com/paddlepaddle/paddle:2.6.1-gpu-cuda11.7-cudnn8.4-trt8.4
 ```

 2. Install the MXMACA software stack

+> You can contact fae_support@metax-tech.com for the MXMACA packages and technical support; authorized users can download them from the [MetaX software center](https://sw-download.metax-tech.com/login).
+>
+
 ```
 # Assuming you have already downloaded and unpacked the MXMACA driver package
 sudo bash /path/to/maca_package/mxmaca-sdk-install.sh
-You can contact MetaX or visit https://sw-download.metax-tech.com to obtain the installation package.
 ```

 3. Install PaddlePaddle
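Once the installer from the hunk above has run, you can confirm the stack against the requirement from section (0): the MACA version that mx-smi reports should satisfy ≥ 2.23.0.1018.

```
mx-smi | grep "MACA Version"   # should report 2.23.0.1018 or newer
```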