Updates for readme and demo ipynb and a small update for deprecated function #360

Merged: 1 commit, Apr 18, 2024
README.md (32 changes: 16 additions & 16 deletions)
@@ -1,4 +1,4 @@
[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)

<div align="center">
<a href="https://github.com/shibing624/MedicalGPT">
@@ -19,7 +19,7 @@

## 📖 Introduction

**MedicalGPT** trains a medical GPT model with the ChatGPT training pipeline, implementing Pretraining,
Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(Direct Preference Optimization).

**MedicalGPT** 训练医疗大模型,实现了包括增量预训练、有监督微调、RLHF(奖励建模、强化学习训练)和DPO(直接偏好优化)。
@@ -60,7 +60,7 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(

- Stage 1: PT (Continue PreTraining), incremental pretraining on large-scale domain documents so the GPT model adapts to the domain data distribution (optional)
- Stage 2: SFT (Supervised Fine-tuning): build an instruction-tuning dataset and fine-tune the pretrained model on it, aligning it with instruction intent and injecting domain knowledge
- Stage 3:
  - RLHF (Reinforcement Learning from Human Feedback): reinforcement learning on the language model driven by human feedback, in two steps:
    - RM (Reward Model): build a human-preference ranking dataset and train a reward model that captures human preferences, chiefly the "HHH" principle (helpful, honest, harmless); a minimal loss sketch follows this list
    - RL (Reinforcement Learning): use the reward model to train the SFT model; the generator updates its policy from rewards and penalties so it produces higher-quality text that better matches human preferences
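
For reference (this sketch is not taken from the repository's reward_modeling.py), a minimal version of the pairwise ranking loss a reward model of this kind is typically trained with, assuming the model already produces a scalar reward for the chosen and the rejected response of each preference pair:

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_rewards: torch.Tensor,
                     rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: push the chosen reward above the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: scalar rewards for a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.4, 0.9])
rejected = torch.tensor([0.3, 0.5, -0.2])
print(pairwise_rm_loss(chosen, rejected))
```
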
@@ -71,7 +71,7 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(
### Release Models


| Model | Base Model | Introduction |
|:------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | Ziya-LLaMA-13B fine-tuned with SFT on the 2.4M-sample Chinese and English medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical), with improved medical QA; released as fine-tuned LoRA weights (single-turn dialogue) |
| [shibing624/ziya-llama-13b-medical-merged](https://huggingface.co/shibing624/ziya-llama-13b-medical-merged) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | Ziya-LLaMA-13B fine-tuned with SFT on the 2.4M-sample Chinese and English medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical), with improved medical QA; released as full merged model weights (single-turn dialogue) |
@@ -105,15 +105,15 @@ CUDA_VISIBLE_DEVICES=0 python gradio_demo.py --model_type base_model_type --base

## 💾 Install
#### Updating the requirements
From time to time, the `requirements.txt` changes. To update the dependencies, use this command:

```shell
git clone https://github.com/shibing624/MedicalGPT
cd MedicalGPT
pip install -r requirements.txt --upgrade
```

#### Hardware Requirement (VRAM)


| Training method | Precision | 7B | 13B | 30B | 65B | 8x7B |
@@ -127,14 +127,14 @@

Training Stage:

| Stage | Introduction | Python script | Shell script |
|:-------------------------------|:-------------|:--------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
| Continue Pretraining | Incremental pretraining | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) |
| Supervised Fine-tuning | Supervised fine-tuning | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |
| Direct Preference Optimization | Direct preference optimization | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py) | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh) |
| Reward Modeling | Reward model training | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) |
| Reinforcement Learning | Reinforcement learning | [ppo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/ppo_training.py) | [run_ppo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_ppo.sh) |
| ORPO | Odds-ratio preference optimization | [orpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/orpo_training.py) | [run_orpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_orpo.sh) |

- A pipeline chaining the full PT+SFT+DPO stages is provided: [run_training_dpo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb), with the corresponding Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb); a full run takes about 15 minutes, and a copy of my successful run is here: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kMIe3pTec2snQvLBA00Br8ND1_zwy3Gr?usp=sharing) (a minimal DPO loss sketch follows this list)
- A pipeline chaining the full PT+SFT+RLHF stages is provided: [run_training_ppo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb), with the corresponding Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb); a full run takes about 20 minutes, and a copy of my successful run is here: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RGkbev8D85gR33HJYxqNdnEThODvGUsS?usp=sharing)
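
As background (not part of this PR), a minimal sketch of the DPO objective these pipelines optimize, assuming per-sequence log-probabilities have already been computed for the chosen and rejected responses under both the trained policy and the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss over a batch of preference pairs."""
    # Margin between chosen and rejected under the policy and under the reference.
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # DPO pushes the policy's margin above the reference model's margin.
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()

# Toy usage with random log-probabilities for a batch of four pairs.
batch = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*batch))
```
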
@@ -209,7 +209,7 @@ yi:
- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
- [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B)

## 💻 Inference
After training is complete, we load the trained model and check the quality of the text it generates.

```shell
@@ -267,7 +267,7 @@ CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 inference_multigpu_demo.py
</details>


## 📚 Dataset
### Medical Datasets

- 2.4 million Chinese medical samples (covering pretraining, instruction fine-tuning, and reward datasets): [shibing624/medical](https://huggingface.co/datasets/shibing624/medical) (a loading sketch follows)
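
Not part of this PR: a minimal sketch of pulling this dataset with the `datasets` library. The subset name "finetune" is an assumption for illustration; check the dataset card for the actual configurations.

```python
from datasets import load_dataset

# "finetune" is a hypothetical subset name; see the dataset card for real ones.
ds = load_dataset("shibing624/medical", "finetune")
print(ds)
print(ds["train"][0])
```
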
@@ -342,7 +342,7 @@ The MedicalGPT project code is licensed under [The Apache License 2.0](/LICENSE),

After that, you can submit a PR.

## 💕 Acknowledgements

- [Direct Preference Optimization:Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf)
- [tloen/alpaca-lora](https://github.com/tloen/alpaca-lora/blob/main/finetune.py)
README_EN.md (22 changes: 16 additions & 6 deletions)
@@ -1,4 +1,4 @@
[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)

<div align="center">
<a href="https://github.com/shibing624/MedicalGPT">
@@ -19,7 +19,7 @@

## 📖 Introduction

**MedicalGPT** trains a medical GPT model with the ChatGPT training pipeline, implementing Pretraining,
Supervised Finetuning, Reward Modeling and Reinforcement Learning.


@@ -117,7 +117,17 @@ sh run_ppo.sh
[Training Detail wiki](https://github.com/shibing624/MedicalGPT/wiki/Training-Details)


### Hardware Requirement(VRAM)
## 💾 Install
#### Updating the requirements
From time to time, the `requirements.txt` changes. To update, use this command:

```shell
git clone https://github.com/shibing624/MedicalGPT
cd MedicalGPT
pip install -r requirements.txt --upgrade
```

### Hardware Requirement (VRAM)

| Method | Bits | 7B | 13B | 30B | 65B | 8x7B |
| ------ | ---- | ----- | ----- | ----- | ------ | ------ |
@@ -126,7 +136,7 @@
| QLoRA | 8 | 10GB | 16GB | 40GB | 80GB | 80GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 32GB |

## 🔥 Inference
After training is complete, we load the trained model and check the quality of the text it generates.

```shell
@@ -160,7 +170,7 @@ Parameter Description:
<br/>


## 📚 Dataset

- 2.4 million Chinese medical datasets (including pre-training, instruction fine-tuning and reward datasets): [shibing624/medical](https://huggingface.co/datasets/shibing624/medical)

@@ -208,7 +218,7 @@ The project code is still very rough. If you have improved the code, you are wel

Then you can submit a PR.

## 💕 Acknowledgements

- [tloen/alpaca-lora](https://github.com/tloen/alpaca-lora/blob/main/finetune.py)
- [ymcui/Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
requirements.txt (10 changes: 5 additions & 5 deletions)
@@ -1,10 +1,10 @@
accelerate~=0.27.2
datasets>=2.14.6
loguru
transformers>=4.39.3
peft~=0.10.0
sentencepiece
datasets>=2.14.6
tqdm
scikit-learn
tensorboard
tqdm>=4.47.0
peft~=0.10.0
accelerate~=0.27.2
transformers>=4.39.3
trl~=0.8.3
reward_modeling.py (4 changes: 2 additions & 2 deletions)
@@ -13,7 +13,7 @@
import torch
from datasets import load_dataset
from loguru import logger
from peft import LoraConfig, TaskType, get_peft_model, PeftModel, prepare_model_for_int8_training
from peft import LoraConfig, TaskType, get_peft_model, PeftModel, prepare_model_for_kbit_training
from sklearn.metrics import mean_squared_error, mean_absolute_error
from torch.utils.data import Dataset
from transformers import (
@@ -425,7 +425,7 @@ def main():
else:
logger.info("Init new peft model")
if model_args.load_in_8bit:
model = prepare_model_for_int8_training(model)
model = prepare_model_for_kbit_training(model)
target_modules = script_args.target_modules.split(',') if script_args.target_modules else None
if target_modules and 'all' in target_modules:
target_modules = find_all_linear_names(model, int4=False, int8=model_args.load_in_8bit)
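
For context on the deprecated-function change above: recent peft releases drop `prepare_model_for_int8_training` in favor of `prepare_model_for_kbit_training`, which covers both 8-bit and 4-bit quantized models. A minimal sketch of the replacement call in isolation, using an assumed small base model and LoRA settings chosen only for illustration:

```python
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForSequenceClassification

# Assumed base model and hyperparameters, for illustration only.
model = AutoModelForSequenceClassification.from_pretrained(
    "bigscience/bloomz-560m",
    num_labels=1,          # single scalar head, as used for reward modeling
    load_in_8bit=True,     # requires bitsandbytes
    device_map="auto",
)
# Replaces the removed prepare_model_for_int8_training; also handles 4-bit models.
model = prepare_model_for_kbit_training(model)

peft_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```
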