
add config file for models chatglm2, gemma, yuan #9139

Merged: 10 commits, Oct 16, 2024

Conversation

Mangodadada
Contributor

PR types

PR changes

Description


paddle-bot bot commented Sep 13, 2024

Thanks for your contribution!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


huxinye seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


codecov bot commented Sep 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.08%. Comparing base (cd4e816) to head (2356563).
Report is 300 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9139      +/-   ##
===========================================
+ Coverage    53.06%   53.08%   +0.02%     
===========================================
  Files          656      657       +1     
  Lines       106147   106521     +374     
===========================================
+ Hits         56324    56548     +224     
- Misses       49823    49973     +150     


@Mangodadada
Contributor Author

fix some DPO code and add a DPO doc

@@ -0,0 +1,174 @@
# PaddlePaddle Large Model Suite DPO Documentation
Contributor

Please update the model support list, and suggest changing DPO to DPO/SimPO/ORPO: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm#%EF%B8%8F-%E6%94%AF%E6%8C%81%E6%A8%A1%E5%9E%8B%E5%88%97%E8%A1%A8-%EF%B8%8F
[image]

Contributor

Also mention that several derivative algorithms have appeared on top of DPO, such as SimPO and so on, and that we can switch between the different algorithms simply by modifying the config.

llm/docs/dpo.md Outdated
...
```

For convenient testing, we also provide an advertisement-generation dataset that can be used directly:
Contributor

Change "advertisement-generation dataset" to "ultrafeedback_binarized demo dataset".

Contributor Author

Modified as requested.

```bash
# DPO launch command reference
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_argument.json
```

Contributor

Add a new "### 2.4 DPO LoRA" section.

Contributor Author

Modified as requested.

```bash
# DPO launch command reference
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_argument.json
```
Contributor

Add a Note below stating that by modifying loss_type, algorithms such as SimPO, ORPO, and so on are currently supported, and attach links to the corresponding papers.
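As an illustration of what such a note covers, here is a minimal sketch of switching the algorithm through the config. Only `loss_type`, `beta`, and `simpo_gamma` come from the arguments quoted in this review; the remaining keys, all values, and the accepted strings for `loss_type` are assumptions and should be checked against DPOArguments, not taken as the PR's actual `dpo_argument.json`.

```python
import json

# Illustrative sketch only: switch the preference-optimization algorithm by
# editing loss_type in the training config. beta / simpo_gamma / loss_type
# come from the DPOArguments quoted in this review; the other keys and all
# values are assumptions, not the PR's dpo_argument.json.
config = {
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B",  # assumed example
    "train_dataset_path": "./data/train.jsonl",          # assumed example
    "loss_type": "simpo",  # e.g. "sigmoid" (DPO), "simpo", "orpo"; verify against DPOArguments
    "beta": 0.1,           # DPO beta, default per the doc
    "simpo_gamma": 0.5,    # SimPO gamma, default per the doc
}

with open("dpo_argument.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)
```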

Contributor Author

Modified as requested.

llm/docs/dpo.md Outdated
Next, we take **Llama 3** as an example to show how to run DPO with the unified script.
### 2.1 Environment Preparation
- PaddlePaddle 3.0-beta
- PaddleNLP 3.0.0b1
Contributor

PaddleNLP develop

Contributor Author

Modified as requested.

llm/docs/dpo.md Outdated
### DPO Parameters (DPOArguments)
- `beta`: beta parameter of the DPO loss function, default 0.1.
- `simpo_gamma`: gamma parameter of the SimPO loss function, default 0.5.
- `normalize_logps`: whether to apply log-probability normalization, default True.
Contributor

Remove this one, and remove it from the arguments as well; it is not used.

Contributor Author

Modified as requested.

llm/docs/dpo.md Outdated
</font>
</div>

After the sequences are constructed, we concatenate multiple sequences into a batch and apply padding (zero padding) so that every batch has the same length.
Contributor

It should be multiple sequences constructed into a single sequence; batching does not need to be mentioned here.

Contributor Author

Modified as requested.

llm/docs/dpo.md Outdated
After the sequences are concatenated, the masks of the individual sequences are concatenated to form the batch mask.

<div align="center">
<img width="500" alt="llm" src="https://github.com/user-attachments/assets/37203f1c-b213-4521-8b70-980f4814cdbc">
Contributor

This figure duplicates the one below; only one is needed. Suggest keeping the one below.

Contributor Author

Modified as requested.

llm/docs/dpo.md Outdated
During training, we need to prevent the model from seeing future information while generating a sequence. Concretely, when processing sequence data, the model may only attend to the current token and the tokens before it, never the tokens after it; we use a mask to enforce this. For a DPO data sequence, each sample contains a prompt, a chosen response, and a rejected response, and the model should read **prompt+chosen** and **prompt+rejected**, which is likewise enforced with a mask.

<div align="center">
<img width="500" alt="llm" src="https://github.com/user-attachments/assets/20a3622f-c33d-48d4-af5e-777464a83dfb">
Contributor

The two figures can be combined into one.

Contributor Author

Modified as requested.

llm/docs/dpo.md Outdated
</font>
</div>

During training, we need to prevent the model from seeing future information while generating a sequence. Concretely, when processing sequence data, the model may only attend to the current token and the tokens before it, never the tokens after it; we use a mask to enforce this. For a DPO data sequence, each sample contains a prompt, a chosen response, and a rejected response, and the model should read **prompt+chosen** and **prompt+rejected**, which is likewise enforced with a mask.
Contributor

During training we reconstruct the attention_mask, so there is no need to worry about sequence boundaries during the attention computation: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/docs/algorithm_overview.md#11-greedy-zero-padding
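As a rough illustration of that idea (my own sketch, not code from this PR or from PaddleNLP): when prompt+chosen and prompt+rejected are packed into one sequence, the reconstructed attention mask is causal within each packed segment and blocks attention across segment boundaries, so packing leaks no information between segments.

```python
import numpy as np

def packed_causal_mask(segment_lengths):
    """Causal attention mask for several sequences packed into one.

    Each packed segment (e.g. prompt+chosen, then prompt+rejected) may only
    attend to earlier positions inside the same segment, never across the
    segment boundary.
    """
    total = sum(segment_lengths)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for length in segment_lengths:
        end = start + length
        # lower-triangular (causal) block restricted to this segment
        mask[start:end, start:end] = np.tril(np.ones((length, length), dtype=bool))
        start = end
    return mask

# Example: prompt+chosen of length 5 packed with prompt+rejected of length 4
print(packed_causal_mask([5, 4]).astype(int))
```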

Contributor Author

Modified as requested.

@lugimzzz
Contributor

Need to consider adding a symlink under the PaddleNLP/docs directory.

@@ -57,6 +57,7 @@
Large Model Unified Checkpoint Documentation <llm/docs/unified_checkpoint.md>
Hybrid Parallel Training Tutorial <llm/docs/llm_trainer.rst>
Model Weight Conversion Tutorial <llm/docs/torch2paddle.md>
Large Model DPO Documentation <llm/docs/dpo.md>
Contributor

Add the symlink.

llm/docs/dpo.md Outdated
# change to the run directory
```
### 2.2 Data Preparation
The fine-tuning data format we support is a json file in which each line is a dictionary containing the following fields:
Contributor

It is not the fine-tuning data format; it is the preference data format.
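To make the comment concrete, here is a hedged sketch of what one line of such a preference data file could look like. The field names are illustrative placeholders only (the actual fields are not shown in the quoted hunk) and must be checked against the final dpo.md.

```python
import json

# Hypothetical example of one preference record (one json dict per line).
# The real field names are not shown in the quoted hunk; the names below are
# placeholders for illustration only.
record = {
    "prompt": "Write a short slogan for a reusable water bottle.",
    "chosen": "Refill it, don't landfill it.",
    "rejected": "Buy our bottle.",
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```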

llm/docs/dpo.md Outdated
...
```

For convenient testing, we processed the [ultrafeedback_binarized demo](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset into an advertisement-generation dataset; usage is as follows:
Contributor

We processed the ultrafeedback_binarized demo dataset into the corresponding dataset format.

llm/docs/dpo.md Outdated
simpo ([SimPO](https://arxiv.org/abs/2405.14734)); default is `sigmoid`.
- `pref_loss_ratio`: DPO loss ratio, default 1.0.
- `sft_loss_ratio`: SFT loss ratio, default 0.0.
- `dpop_lambda`: dpop_lambda, default 50; for details see the documentation [DPOP](https://arxiv.org/pdf/2402.13228)
Contributor

Change "documentation" to "paper".

llm/docs/dpo.md Outdated
</font>
</div>

After the sequences are constructed, we construct multiple sequences into a single sequence and apply padding (zero padding) so that each constructed sequence has the same length.
Contributor

The padding here is not "zero padding"; "zero padding" is the name of this data-flow algorithm. Write "filled with pad tokens" instead, and remember to update the figure accordingly.
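A small sketch of the wording the reviewer is asking for (illustrative only; the pad token id below is an assumed placeholder): the packed sequence is brought up to the target length by appending pad tokens.

```python
# Illustrative sketch only: bring a packed sequence up to a fixed length by
# appending pad tokens. PAD_TOKEN_ID is an assumed placeholder; in practice
# use the tokenizer's actual pad token id.
PAD_TOKEN_ID = 0

def pad_to_length(token_ids, max_length, pad_token_id=PAD_TOKEN_ID):
    """Append pad tokens so the packed sequence reaches max_length."""
    if len(token_ids) > max_length:
        raise ValueError("packed sequence is longer than max_length")
    return token_ids + [pad_token_id] * (max_length - len(token_ids))

print(pad_to_length([101, 7592, 2088, 102], max_length=8))
# -> [101, 7592, 2088, 102, 0, 0, 0, 0]
```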

| [Bloom](./config/bloom) | ❌ | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | ✅ |
| [GPT-3](./config/gpt-3) | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | ✅ |
| [OPT](./config/opt) | 🚧 | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | ✅ |
| [Gemma](./config/gemma) | 🚧 | ✅ |🚧 | 🚧 | ✅ | 🚧 | 🚧 | 🚧 |
| [Yuan](./config/yuan) | ✅ | ✅ |✅ | 🚧 | ✅ | 🚧 | 🚧 | 🚧 |


llm/docs/dpo.md Outdated
</font>
</div>

After the sequences are constructed, we construct multiple sequences into a single sequence and fill write fill in pad tokens so that each constructed sequence has the same length.
Contributor

The word order of this sentence is a bit off.

llm/docs/dpo.md Outdated
- `tensor_parallel_degree`: the number of shards a transformer layer is split into. This method has relatively high communication overhead but saves GPU memory; tensor_parallel_degree<=8 is recommended, preferably using intra-machine communication.
- `sharding_parallel_degree`: data-parallel size for grouped parameter sharding.
- `sharding`: whether to use the Sharding data-parallel feature, here `"stage1"`.
- `recompute`: recomputation; currently only the full strategy is supported. Enabling it reduces GPU memory usage so the batch size can be increased.
Contributor

This one was not written up!!!!
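To make the distributed-training switches in the hunk above concrete, here is a hedged sketch of the corresponding fragment of a training config. The parameter names follow the list quoted above; the values are illustrative examples, not this PR's config.

```python
import json

# Illustrative fragment of a distributed-training config. Parameter names
# follow the list quoted above; values are examples, not the PR's config.
parallel_config = {
    "tensor_parallel_degree": 8,    # shards per transformer layer (<=8 recommended)
    "sharding_parallel_degree": 1,  # data-parallel size for grouped parameter sharding
    "sharding": "stage1",           # sharding stage named in the doc
    "recompute": True,              # trade compute for memory to allow larger batches
}

print(json.dumps(parallel_config, indent=2))
```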

llm/docs/dpo.md Outdated
wget https://bj.bcebos.com/paddlenlp/datasets/examples/ultrafeedback_binarized.tar.gz
tar -zxvf ultrafeedback_binarized.tar.gz
```
### 2.3 Full-Parameter DPO
Contributor

You describe both full-parameter and LoRA DPO here; consider changing the heading to something like "DPO Training".

Contributor
@lugimzzz left a comment

LGTM

@ZHUI merged commit cf0f478 into PaddlePaddle:develop Oct 16, 2024
9 of 12 checks passed
4 participants