add config file for model chatglm2,gemma,yuan #9139
Conversation
Thanks for your contribution!
huxinye seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #9139      +/-   ##
===========================================
+ Coverage    53.06%   53.08%   +0.02%
===========================================
  Files          656      657       +1
  Lines       106147   106521     +374
===========================================
+ Hits         56324    56548     +224
- Misses       49823    49973     +150

☔ View full report in Codecov by Sentry.
fix some dpo code and add a dpo doc
@@ -0,0 +1,174 @@
# PaddlePaddle LLM Suite DPO Documentation
Also mention that a number of derivative algorithms have appeared on top of DPO, such as SimPO, etc., and that we can switch between algorithms simply by changing the config.
llm/docs/dpo.md
Outdated
...
```

For easy testing, we also provide an advertisement-generation dataset that can be used directly:
Change the advertisement-generation dataset to the ultrafeedback_binarized demo dataset.
Updated as requested.
# Reference DPO launch command
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_argument.json
```
Add a new section ### 2.4 DPO LoRA.
Updated as requested.
```bash
# Reference DPO launch command
python -u -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./alignment/dpo/run_dpo.py ./config/llama/dpo_argument.json
```
Add a Note below stating that by changing loss_type we currently support algorithms such as SimPO, ORPO, etc., and attach links to the corresponding papers.
Updated as requested.
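For readers of this thread, here is a minimal sketch of what such a switch could look like. Only `loss_type`, `beta`, and `simpo_gamma` are field names quoted elsewhere in this PR; the concrete values and the idea of writing the file from Python are illustrative only, not the final doc content.

```python
import json

# Hypothetical excerpt of ./config/llama/dpo_argument.json: the preference
# loss variant is selected by a single field. Only loss_type, beta and
# simpo_gamma appear in the reviewed doc; the values here are illustrative.
dpo_arguments = {
    "loss_type": "simpo",  # e.g. "sigmoid" (vanilla DPO), "simpo", "orpo", ...
    "beta": 0.1,           # beta of the DPO loss (default 0.1 per the doc)
    "simpo_gamma": 0.5,    # gamma of the SimPO loss (default 0.5 per the doc)
}

with open("dpo_argument.json", "w") as f:
    json.dump(dpo_arguments, f, indent=2)
```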
llm/docs/dpo.md
Outdated
Next, we use **Llama 3** as an example to show how to run DPO with the unified script.
### 2.1 Environment Setup
- PaddlePaddle 3.0-beta
- PaddleNLP 3.0.0b1
PaddleNLP develop
Updated as requested.
llm/docs/dpo.md
Outdated
### DPO Parameters (DPOArguments)
- `beta`: the beta parameter of the DPO loss, default 0.1.
- `simpo_gamma`: the gamma parameter of the SimPO loss, default 0.5.
- `normalize_logps`: whether to apply log-probability normalization, default True.
Remove this, and also remove it from the arguments; it is not used.
Updated as requested.
llm/docs/dpo.md
Outdated
</font>
</div>

After sequence construction, we need to concatenate multiple sequences into a batch and apply padding (zero padding) so that every batch has the same length.
It is multiple sequences being constructed into one sequence; there is no need to mention batching here.
Updated as requested.
llm/docs/dpo.md
Outdated
After the sequences are concatenated, the masks of the individual sequences are concatenated to form the batch mask.

<div align="center">
<img width="500" alt="llm" src="https://github.com/user-attachments/assets/37203f1c-b213-4521-8b70-980f4814cdbc">
This figure duplicates the one further down; one is enough, and I suggest keeping the one below.
Updated as requested.
llm/docs/dpo.md
Outdated
During training, we must prevent the model from seeing future information while generating a sequence. Concretely, when processing sequence data, the model may only see the current token and the tokens before it, never the tokens after it, and we use a mask to enforce this. For a DPO data sequence, each example consists of a prompt, a chosen response, and a rejected response, and we need the model to read **prompt + chosen** and **prompt + rejected**, which is likewise implemented with masks.

<div align="center">
<img width="500" alt="llm" src="https://github.com/user-attachments/assets/20a3622f-c33d-48d4-af5e-777464a83dfb">
The two figures could be combined into one.
Updated as requested.
llm/docs/dpo.md
Outdated
</font>
</div>

During training, we must prevent the model from seeing future information while generating a sequence. Concretely, when processing sequence data, the model may only see the current token and the tokens before it, never the tokens after it, and we use a mask to enforce this. For a DPO data sequence, each example consists of a prompt, a chosen response, and a rejected response, and we need the model to read **prompt + chosen** and **prompt + rejected**, which is likewise implemented with masks.
During training, we reconstruct the attention_mask, so sequence boundaries need no special handling in the attention computation. https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/docs/algorithm_overview.md#11-greedy-zero-padding
Updated as requested.
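For readers of this thread, a minimal sketch (not PaddleNLP code) of the masking idea being discussed: within one packed sequence laid out as prompt + chosen + rejected, attention is causal, and the rejected branch is cut off from the chosen branch, so the model effectively reads prompt+chosen and prompt+rejected without any special boundary handling. The function name and layout below are assumptions for illustration only.

```python
import numpy as np

def build_dpo_attention_mask(prompt_len: int, chosen_len: int, rejected_len: int) -> np.ndarray:
    """Boolean (seq_len, seq_len) mask for one packed example laid out as
    [prompt | chosen | rejected]: causal attention everywhere, except that
    rejected tokens never attend to chosen tokens."""
    seq_len = prompt_len + chosen_len + rejected_len
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # causal part
    chosen_start, chosen_end = prompt_len, prompt_len + chosen_len
    # cut the link from the rejected branch back to the chosen branch
    mask[chosen_end:, chosen_start:chosen_end] = False
    return mask

if __name__ == "__main__":
    print(build_dpo_attention_mask(prompt_len=3, chosen_len=2, rejected_len=2).astype(int))
```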
We also need to consider symlinking this into the PaddleNLP/docs directory.
@@ -57,6 +57,7 @@
Unified Checkpoint Storage for Large Models <llm/docs/unified_checkpoint.md>
Hybrid Parallel Training Tutorial <llm/docs/llm_trainer.rst>
Model Weight Conversion Tutorial <llm/docs/torch2paddle.md>
Large Model DPO Documentation <llm/docs/dpo.md>
Add the symlink.
llm/docs/dpo.md
Outdated
# change to the working directory
```
### 2.2 Data Preparation
The fine-tuning data format we support is a json file with one dictionary per line, where each dictionary contains the following fields:
This is not the fine-tuning data format; it is the preference data format.
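To make the distinction concrete: a preference sample pairs one prompt with a chosen and a rejected response, one JSON dict per line. The field names below are illustrative placeholders, not confirmed PaddleNLP field names; the authoritative format is whatever llm/docs/dpo.md specifies.

```python
import json

# Hypothetical one-line preference sample (jsonl style: one dict per line).
# Field names (prompt / chosen / rejected) are placeholders for illustration.
sample = {
    "prompt": "Write a short slogan for a reusable water bottle.",
    "chosen": "Refill, not landfill.",
    "rejected": "Just buy a new bottle every day.",
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```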
llm/docs/dpo.md
Outdated
...
```

For easy testing, we processed the [ultrafeedback_binarized demo](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset into an advertisement-generation dataset; it can be used as follows:
We processed the ultrafeedback_binarized demo dataset into the corresponding dataset format.
llm/docs/dpo.md
Outdated
simpo ([SimPO](https://arxiv.org/abs/2405.14734)); default is `sigmoid`.
- `pref_loss_ratio`: the DPO loss ratio, default 1.0.
- `sft_loss_ratio`: the SFT loss ratio, default 0.0.
- `dpop_lambda`: dpop_lambda, default 50; for details see the document [DPOP](https://arxiv.org/pdf/2402.13228)
Change "document" to "paper".
llm/docs/dpo.md
Outdated
</font>
</div>

After sequence construction, we need to combine multiple sequences into a single sequence and apply padding (zero padding) so that every constructed sequence has the same length.
The padding here is not zero-filling; "zero padding" is the name of this data-flow algorithm. Write that pad tokens are filled in, and remember to update the figure accordingly.
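A tiny sketch of the step as the reviewer describes it (illustrative only, not the actual data-flow code): after packing, the sequence is topped up with pad tokens, rather than literal zeros, to the target length. The pad_token_id below is an arbitrary made-up value.

```python
def pad_packed_sequence(token_ids, max_len, pad_token_id=151643):
    """Pad one packed sequence up to max_len by appending pad tokens.
    pad_token_id is an illustrative value, not a real vocabulary id."""
    if len(token_ids) > max_len:
        raise ValueError("packed sequence is longer than max_len")
    return token_ids + [pad_token_id] * (max_len - len(token_ids))

# toy packed sequence: prompt + chosen + rejected token ids
packed = [11, 12, 13, 21, 22, 31, 32]
print(pad_packed_sequence(packed, max_len=10))
```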
| [Bloom](./config/bloom) | ❌ | ✅ | ✅ | ✅ | 🚧 | 🚧 | ✅ | ✅ |
| [GPT-3](./config/gpt-3) | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | 🚧 | ✅ |
| [OPT](./config/opt) | 🚧 | ✅ | ✅ | 🚧 | 🚧 | 🚧 | 🚧 | ✅ |
| [Gemma](./config/gemma) | 🚧 | ✅ | 🚧 | 🚧 | ✅ | 🚧 | 🚧 | 🚧 |
| [Yuan](./config/yuan) | ✅ | ✅ | ✅ | 🚧 | ✅ | 🚧 | 🚧 | 🚧 |
llm/docs/dpo.md
Outdated
</font>
</div>

After sequence construction, we need to combine multiple sequences into a single sequence and pad, writing pad tokens in, so that every constructed sequence has the same length.
The word order of this sentence is a bit off.
llm/docs/dpo.md
Outdated
- `tensor_parallel_degree`: the number of shards into which one transformer layer is split. This approach has a relatively high communication cost but saves GPU memory; tensor_parallel_degree <= 8 is recommended so that communication stays within a single machine whenever possible.
- `sharding_parallel_degree`: the data-parallel size for grouped parameter sharding.
- `sharding`: whether to use Sharding data parallelism; here it is `"stage1"`.
- `recompute`: recomputation; currently only the full strategy is supported. Enabling it reduces GPU memory usage so a larger batch size can be used.
This item is not written!!!!
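For context in this thread, the four knobs listed above might appear in dpo_argument.json roughly as sketched below. The field names are the ones quoted in the reviewed doc; the values are illustrative only.

```python
import json

# Illustrative values only; field names are those listed in the reviewed doc.
parallel_settings = {
    "tensor_parallel_degree": 8,    # keep <= 8 so tensor parallelism stays within one machine
    "sharding_parallel_degree": 1,  # data-parallel size for grouped parameter sharding
    "sharding": "stage1",           # sharding stage used in the doc's example
    "recompute": True,              # full recompute: lower memory, larger batch size
}

print(json.dumps(parallel_settings, indent=2))
```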
llm/docs/dpo.md
Outdated
wget https://bj.bcebos.com/paddlenlp/datasets/examples/ultrafeedback_binarized.tar.gz
tar -zxvf ultrafeedback_binarized.tar.gz
```
### 2.3 Full-Parameter DPO
You describe both full-parameter DPO and LoRA DPO here, so consider using a different heading, e.g. "DPO Training".
LGTM
PR types
PR changes
Description