add config file for model chatglm2,gemma,yuan #9139
Conversation
Thanks for your contribution!
huxinye seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #9139      +/-   ##
===========================================
+ Coverage    53.06%   53.08%   +0.02%
===========================================
  Files          656      657       +1
  Lines       106147   106521     +374
===========================================
+ Hits         56324    56548     +224
- Misses       49823    49973     +150

☔ View full report in Codecov by Sentry.
fix some dpo code and add a dpo doc
@@ -0,0 +1,174 @@
# PaddlePaddle LLM Suite DPO Documentation
Also mention that a number of derivative algorithms have appeared on top of DPO, such as SimPO, etc., and that we can switch between algorithms simply by changing the config.
llm/docs/dpo.md
Outdated
...
```

For easy testing, we also provide an advertisement-generation dataset that can be used directly:
Change the advertisement-generation dataset to the ultrafeedback_binarized demo dataset.
Updated as requested.
# Reference DPO launch command
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_argument.json
```
Add a new section ### 2.4 DPO LoRA.
Updated as requested.
```bash
# Reference DPO launch command
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_argument.json
```
Add a Note below stating that by changing loss_type we currently support algorithms such as SimPO, ORPO, etc., and attach links to the corresponding papers.
Updated as requested.
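For readers of this thread, here is a minimal sketch of what such a switch could look like. Only `loss_type`, `beta`, and `simpo_gamma` are field names quoted elsewhere in this PR; the concrete values and the idea of writing the file from Python are illustrative only, not the final doc content.

```python
import json

# Hypothetical excerpt of ./config/llama/dpo_argument.json: the preference
# loss variant is selected by a single field. Only loss_type, beta and
# simpo_gamma appear in the reviewed doc; the values here are illustrative.
dpo_arguments = {
    "loss_type": "simpo",  # e.g. "sigmoid" (vanilla DPO), "simpo", "orpo", ...
    "beta": 0.1,           # beta of the DPO loss (default 0.1 per the doc)
    "simpo_gamma": 0.5,    # gamma of the SimPO loss (default 0.5 per the doc)
}

with open("dpo_argument.json", "w") as f:
    json.dump(dpo_arguments, f, indent=2)
```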
llm/docs/dpo.md
Outdated
Next, we use **Llama 3** as an example to show how to run DPO with the unified script.
### 2.1 Environment Setup
- PaddlePaddle 3.0-beta
- PaddleNLP 3.0.0b1
PaddleNLP develop
Updated as requested.
llm/docs/dpo.md
Outdated
### DPO Parameters (DPOArguments)
- `beta`: the beta parameter of the DPO loss, default 0.1.
- `simpo_gamma`: the gamma parameter of the SimPO loss, default 0.5.
- `normalize_logps`: whether to apply log-probability normalization, default True.
Remove this, and also remove it from the arguments; it is not used.
Updated as requested.
llm/docs/dpo.md
Outdated
</font>
</div>

After sequence construction, we need to concatenate multiple sequences into a batch and apply padding (zero padding) so that every batch has the same length.
It is multiple sequences being constructed into one sequence; there is no need to mention batching here.
Updated as requested.
llm/docs/dpo.md
Outdated
After the sequences are concatenated, the masks of the individual sequences are concatenated to form the batch mask.

<div align="center">
<img width="500" alt="llm" src="https://github.com/user-attachments/assets/37203f1c-b213-4521-8b70-980f4814cdbc">
This figure duplicates the one further down; one is enough, and I suggest keeping the one below.
Updated as requested.
llm/docs/dpo.md
Outdated
During training, we must prevent the model from seeing future information while generating a sequence. Concretely, when processing sequence data, the model may only see the current token and the tokens before it, never the tokens after it, and we use a mask to enforce this. For a DPO data sequence, each example consists of a prompt, a chosen response, and a rejected response, and we need the model to read **prompt + chosen** and **prompt + rejected**, which is likewise implemented with masks.

<div align="center">
<img width="500" alt="llm" src="https://github.com/user-attachments/assets/20a3622f-c33d-48d4-af5e-777464a83dfb">
The two figures could be combined into one.
Updated as requested.
llm/docs/dpo.md
Outdated
</font>
</div>

During training, we must prevent the model from seeing future information while generating a sequence. Concretely, when processing sequence data, the model may only see the current token and the tokens before it, never the tokens after it, and we use a mask to enforce this. For a DPO data sequence, each example consists of a prompt, a chosen response, and a rejected response, and we need the model to read **prompt + chosen** and **prompt + rejected**, which is likewise implemented with masks.
During training, we reconstruct the attention_mask, so sequence boundaries need no special handling in the attention computation. https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/docs/algorithm_overview.md#11-greedy-zero-padding
Updated as requested.
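For readers of this thread, a minimal sketch (not PaddleNLP code) of the masking idea being discussed: within one packed sequence laid out as prompt + chosen + rejected, attention is causal, and the rejected branch is cut off from the chosen branch, so the model effectively reads prompt+chosen and prompt+rejected without any special boundary handling. The function name and layout below are assumptions for illustration only.

```python
import numpy as np

def build_dpo_attention_mask(prompt_len: int, chosen_len: int, rejected_len: int) -> np.ndarray:
    """Boolean (seq_len, seq_len) mask for one packed example laid out as
    [prompt | chosen | rejected]: causal attention everywhere, except that
    rejected tokens never attend to chosen tokens."""
    seq_len = prompt_len + chosen_len + rejected_len
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal part
    chosen_start, chosen_end = prompt_len, prompt_len + chosen_len
    # cut the link from the rejected branch back to the chosen branch
    mask[chosen_end:, chosen_start:chosen_end] = False
    return mask

if __name__ == "__main__":
    print(build_dpo_attention_mask(prompt_len=3, chosen_len=2, rejected_len=2).astype(int))
```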
We also need to consider symlinking this into the PaddleNLP/docs directory.
@@ -57,6 +57,7 @@
Unified Checkpoint Storage for Large Models <llm/docs/unified_checkpoint.md>
Hybrid Parallel Training Tutorial <llm/docs/llm_trainer.rst>
Model Weight Conversion Tutorial <llm/docs/torch2paddle.md>
Large Model DPO Documentation <llm/docs/dpo.md>
Add the symlink.
llm/docs/dpo.md
Outdated
# change to the working directory
```
### 2.2 Data Preparation
The fine-tuning data format we support is a json file with one dictionary per line, where each dictionary contains the following fields:
This is not the fine-tuning data format; it is the preference data format.
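To make the distinction concrete: a preference sample pairs one prompt with a chosen and a rejected response, one JSON dict per line. The field names below are illustrative placeholders, not confirmed PaddleNLP field names; the authoritative format is whatever llm/docs/dpo.md specifies.

```python
import json

# Hypothetical one-line preference sample (jsonl style: one dict per line).
# Field names (prompt / chosen / rejected) are placeholders for illustration.
sample = {
    "prompt": "Write a short slogan for a reusable water bottle.",
    "chosen": "Refill, not landfill.",
    "rejected": "Just buy a new bottle every day.",
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```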
llm/docs/dpo.md
Outdated
...
```

For easy testing, we processed the [ultrafeedback_binarized demo](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset into an advertisement-generation dataset; it can be used as follows:
We processed the ultrafeedback_binarized demo dataset into the corresponding dataset format.
llm/docs/dpo.md
Outdated
simpo ([SimPO](https://arxiv.org/abs/2405.14734)); default is `sigmoid`.
- `pref_loss_ratio`: the DPO loss ratio, default 1.0.
- `sft_loss_ratio`: the SFT loss ratio, default 0.0.
- `dpop_lambda`: dpop_lambda, default 50; for details see the document [DPOP](https://arxiv.org/pdf/2402.13228)
Change "document" to "paper".
llm/docs/dpo.md
Outdated
</font>
</div>

After sequence construction, we need to combine multiple sequences into a single sequence and apply padding (zero padding) so that every constructed sequence has the same length.
The padding here is not zero-filling; "zero padding" is the name of this data-flow algorithm. Write that pad tokens are filled in, and remember to update the figure accordingly.
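A tiny sketch of the step as the reviewer describes it (illustrative only, not the actual data-flow code): after packing, the sequence is topped up with pad tokens, rather than literal zeros, to the target length. The pad_token_id below is an arbitrary made-up value.

```python
def pad_packed_sequence(token_ids, max_len, pad_token_id=151643):
    """Pad one packed sequence up to max_len by appending pad tokens.
    pad_token_id is an illustrative value, not a real vocabulary id."""
    if len(token_ids) > max_len:
        raise ValueError("packed sequence is longer than max_len")
    return token_ids + [pad_token_id] * (max_len - len(token_ids))

# toy packed sequence: prompt + chosen + rejected token ids
packed = [11, 12, 13, 21, 22, 31, 32]
print(pad_packed_sequence(packed, max_len=10))
```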
| [Bloom](./config/bloom) | ❌ | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | ✅ |
| [GPT-3](./config/gpt-3) | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | ✅ |
| [OPT](./config/opt) | 🚧 | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | ✅ |
| [Gemma](./config/gemma) | 🚧 | ✅ | 🚧 | 🚧 | ✅ | 🚧 | 🚧 | 🚧 |
| [Yuan](./config/yuan) | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | 🚧 |
llm/docs/dpo.md
Outdated
</font>
</div>

After sequence construction, we need to combine multiple sequences into a single sequence and pad, writing pad tokens in, so that every constructed sequence has the same length.
The word order of this sentence is a bit off.
llm/docs/dpo.md
Outdated
- `tensor_parallel_degree`: the number of shards into which one transformer layer is split. This approach has a relatively high communication cost but saves GPU memory; tensor_parallel_degree <= 8 is recommended so that communication stays within a single machine whenever possible.
- `sharding_parallel_degree`: the data-parallel size for grouped parameter sharding.
- `sharding`: whether to use Sharding data parallelism; here it is `"stage1"`.
- `recompute`: recomputation; currently only the full strategy is supported. Enabling it reduces GPU memory usage so a larger batch size can be used.
This item is not written!!!!
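For context in this thread, the four knobs listed above might appear in dpo_argument.json roughly as sketched below. The field names are the ones quoted in the reviewed doc; the values are illustrative only.

```python
import json

# Illustrative values only; field names are those listed in the reviewed doc.
parallel_settings = {
    "tensor_parallel_degree": 8,    # keep <= 8 so tensor parallelism stays within one machine
    "sharding_parallel_degree": 1,  # data-parallel size for grouped parameter sharding
    "sharding": "stage1",           # sharding stage used in the doc's example
    "recompute": True,              # full recompute: lower memory, larger batch size
}

print(json.dumps(parallel_settings, indent=2))
```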
llm/docs/dpo.md
Outdated
wget https://bj.bcebos.com/paddlenlp/datasets/examples/ultrafeedback_binarized.tar.gz
tar -zxvf ultrafeedback_binarized.tar.gz
```
### 2.3 Full-Parameter DPO
You describe both full-parameter DPO and LoRA DPO here, so consider using a different heading, e.g. "DPO Training".
LGTM
PR types
PR changes
Description