Add: zh/dpo-trl.md (#1390)
* update soc3-zn

* Update _blog.yml

Try to resolve conflicts

* Update: proofreading zh/ethics-soc-3.md

* add how-to-generate cn version

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* unity game in hf space translation completed

* Update: punctuations of how-to-generate.md

* hf-bitsandbytes-integration cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Proofread hf-bitsandbytes-integration.md

* Proofread: red-teaming.md

* Update: add red-teaming to zh/_blog.yml

* Update _blog.yml

* Update: add red-teaming to zh/_blog.yml

Fix: red-teaming title in zh/_blog.yml

* Fix: red-teaming PPLM translation

* deep-learning-with-proteins cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Add: stackllama.md

* if blog translation completed

* Update unity-in-spaces.md

Add a link for AI game

* Update if.md

Fix “普罗大众” to “普惠大众”

* deep-learning-with-proteins cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* add starcoder cn

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: formatting and punctuations of starcoder.md

* add starcoder cn

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update: proofreading zh/unity-in-spaces.md

* fix(annotated-diffusion.md): fix image shape desc in PIL and Tensor (#1080)

modify the comment after ToTensor with the correct image shape CHW

* Add text-to-video blog (#1058)

Adds an overview of text-to-video generative models, task specific challenges, datasets, and more.

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix broken link in text-to-video.md (#1083)

* Update: proofreading zh/unity-in-spaces.md

Fix: incorrect _blog.yml format

* Update: proofreading zh/deep-learning-with-proteins.md

* update ethics-diffusers-cn (#6)

* update ethics-diffusers

* update ethics-diffusers

---------

Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: proofreading zh/ethics-diffusers.md

* 1. introducing-csearch done (#11)

2. text-to-video done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update: proofread zh/text-to-video.md

* Update: proofreading zh/introducing-csearch.md

* generative-ai-models-on-intel-cpu cn done (#13)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: proofread zh/generative-ai-models-on-intel-cpu.md
Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com>

* add starchat-alpha zh translation (#10)

* Preparing blogpost announcing `safetensors` security audit + official support. (#1096)

* Preparing blogpost announcing `safetensors` security audit + official
support.

* Taking into account comments + Grammarly.

* Update safetensors-official.md

* Apply suggestions from code review

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Update safetensors-official.md

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Apply suggestions from code review

* Update safetensors-official.md

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Apply suggestions from code review

* Adding thumbnail.

* Include changes from Stella.

* Update safetensors-official.md

* Update with Stella's comments.

* Remove problematic sentence.

* Rename + some rephrasing.

* Apply suggestions from code review

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Update safetensors-security-audit.md

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Last fixes.

* Apply suggestions from code review

Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

---------

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>

* Hotfixing safetensors. (#1131)

* Removing the checklist formatting is busted. (#1132)

* Update safetensors-security-audit.md (#1134)

* [time series transformers] update dataloader API (#1135)

* update dataloader API

* revert comment

* add back Cached transform

* New post: Hugging Face and IBM (#1130)

* Initial version

* Minor fixes

* Update huggingface-and-ibm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update huggingface-and-ibm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Resize image

* Update blog index

---------

Co-authored-by: Julien Simon <julsimon@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Show authors of safetensors blog post (#1137)

Update: proofread zh/starchat-alpha.md

* add megatron-training & assisted-generation (#8)

* add megatron-training

* add megatron-training

* add megatron-training

* add megatron-training

* add assisted-generation

* add assisted-generation

* add assisted-generation

* Update: proofreading zh/assisted-generation

* Update: proofread zh/megatron-training.md

* rwkv model blog translation completed (#12)

* rwkv model blog translation completed

* add 3 additional parts in the blog tail

* Update: proofread zh/rwkv.md

* Fix: missing subtitle/notes for image references.

* encoder-decoder cn done (#14)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: proofread zh/encoder-decoder.md

* constrained-beam-search cn done (#15)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: proofread zh/constrained-beam-search.md

* Update: zh/unity-api.md + zh/unity-asr.md

* unity ai speech recognition blog translation completed

* add (GameObject) to attach its Chinese translation

* finish unity-api translation

* add unity series entry to zh/_blog.yml

* Update: proofread zh/unity-{api,asr}.md

* Update zh/falcon.md

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: zh/falcon.md

* instruction-tuning-sd cn done (#21)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update: zh/instruction-tuning-sd.md

* fine-tune-whisper cn done (#23)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update: zh/fine-tune-whisper.md

* add mms_adapters and policy (#22)

Update: zh/policy-ntia-rfc.md

* Update: refine zh/mms_adapters.md

Update: remove incomplete file

* Update: zh/llm-leaderboard.md, zh/autoformer.md

* add llm-leaderboard CN translation

* add CN translation for autoformer

* Update: proofreading zh/autoformer.md

* BridgeTower blog post (#1118)

* Update BridgeTower blog post (#1277)

* LLM Eval: minor typos and nits (#1263)

* Fix anchor link to custom pipeline section. (#485)

* Update: zh/llm-leaderboard.md, zh/autoformer.md

* add llm-leaderboard CN translation

* add CN translation for autoformer

Update: proofreading zh/autoformer.md

Update: proofreading zh/llm-leaderboard.md

* Update: proofreading zh/ethics-soc-4.md

* Update "How to deploy LLM" blog post to use `huggingface_hub` in example  (#1290)

* Use InferenceClient from huggingface_hub

* Update inference-endpoints-llm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update BridgeTower blog post (#1295)

* Removed duplicate numbering (#1171)

* Update: zh/evaluating-mmlu-leaderboard.md

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

Update: proofreading zh/evaluating-mmlu-leaderboard.md

* Translate train-optimize-sd-intel.md to zh (#16)

* Translate "stackllama" into Chinese

* Create train-optimize-sd-intel.md

Add new

Update: zh/train-optimize-sd-intel.md

* Update: zh/dedup.md & zh/stable-diffusion-finetuning-intel.md

* dedup cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* stable-diffusion-finetuning-intel cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: proofread zh/stable-diffusion-finetuning-intel.md

* Update: proofread zh/dedup.md

* Update: zh/inference-endpoints-llm.md

Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

Update: proofread zh/inference-endpoints-llm.md

* Update: zh/llama2.md

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Proofread: zh/llama2.md

* Update: zh/diffusers-turns-1.md

Proofread: zh/diffusers-turns-1.md

* Fix: zh/diffusers-turns-1.md wrong meta data format

Policy blog: Open ML Considerations in the EU AI Act (#1342)

* Create .gitignore

* Add files via upload

* Create eu-ai-act-oss.md

* Delete .gitignore

* Update eu-ai-act-oss.md

* Update eu-ai-act-oss.md

* Update eu-ai-act-oss.md

* Update _blog.yml

* Update eu-ai-act-oss.md

* Update: zh/game-jam-first-edition-results.md

Update: zh/game-jam-first-edition-results.md

* Add: zh/bridgetower.md, zh/getting-started-habana.md, zh/habana-gaudi-2-benchmark.md

* 3 Gaudi posts cn done:
 - bridgetower.md
 - getting-started-habana.md
 - habana-gaudi-2-benchmark.md

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/bridgetower.md, zh/getting-started-habana.md, zh/habana-gaudi-2-benchmark.md

* Add: zh/transformers-design-philosophy.md

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

Update: proofread zh/transformers-design-philosophy.md

* Add: zh/os-llms.md

* Translate os-llms.md
* Update _blog.yml

Update: proofread zh/os-llms.md

* Add: zh/dpo-trl.md

* dpo-trl cn done

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com>
Co-authored-by: innovation64 <liyang19991126@126.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>
Co-authored-by: SuSung-boy <872414318@qq.com>
Co-authored-by: Zhongdong Yang <yangzd1996@outlook.com>
Co-authored-by: Luke Cheng <2258420+chenglu@users.noreply.github.com>
Co-authored-by: yaoqih <40328311+yaoqih@users.noreply.github.com>
Co-authored-by: Shiliang Chen <36809537+csl122@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: 李洋 <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Hoi2022 <120370631+Hoi2022@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>
Co-authored-by: Victor Muštar <victor.mustar@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Julien Simon <3436143+juliensimon@users.noreply.github.com>
Co-authored-by: Julien Simon <julsimon@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: gxy-gxy <57594446+gxy-gxy@users.noreply.github.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Eswar Divi <76403422+EswarDivi@users.noreply.github.com>
Co-authored-by: Qi Zhang <82949744+Vermillion-de@users.noreply.github.com>

* Update: proofread zh/dpo-trl.md

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: Yang, Zhongdong <zhongdong_y@outlook.com>
Co-authored-by: innovation64 <liyang19991126@126.com>
Co-authored-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: SuSung-boy <872414318@qq.com>
Co-authored-by: Luke Cheng <2258420+chenglu@users.noreply.github.com>
Co-authored-by: yaoqih <40328311+yaoqih@users.noreply.github.com>
Co-authored-by: Shiliang Chen <36809537+csl122@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: 李洋 <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: Hoi2022 <120370631+Hoi2022@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: DeltaPenrose <128761972+DeltaPenrose@users.noreply.github.com>
Co-authored-by: Victor Muštar <victor.mustar@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Julien Simon <3436143+juliensimon@users.noreply.github.com>
Co-authored-by: Julien Simon <julsimon@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: gxy-gxy <57594446+gxy-gxy@users.noreply.github.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Eswar Divi <76403422+EswarDivi@users.noreply.github.com>
Co-authored-by: Qi Zhang <82949744+Vermillion-de@users.noreply.github.com>
1 parent 5baf661 commit 970dc3b
Showing 2 changed files with 206 additions and 1 deletion.
13 changes: 12 additions & 1 deletion zh/_blog.yml
@@ -807,4 +807,15 @@
    - guide
    - cv
    - diffusion
    - game-dev
    - game-dev

- local: dpo-trl
  title: "Fine-tune Llama 2 with DPO"
  author: kashif
  thumbnail: /blog/assets/157_dpo_trl/dpo_thumbnail.png
  date: August 8, 2023
  tags:
    - rl
    - rlhf
    - nlp

194 changes: 194 additions & 0 deletions zh/dpo-trl.md
@@ -0,0 +1,194 @@
---
title: "Fine-tune Llama 2 with DPO"
thumbnail: /blog/assets/157_dpo_trl/dpo_thumbnail.png
authors:
- user: kashif
- user: ybelkada
- user: lvwerra
translators:
- user: MatrixYao
- user: zhongdongy
proofreader: true
---

# Fine-tune Llama 2 with DPO

<!-- {blog_metadata} -->
<!-- {authors} -->

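## Introduction

Reinforcement Learning from Human Feedback (RLHF) has become the de facto last training step of LLMs such as GPT-4 or Claude, ensuring that the language model's outputs match human expectations in areas such as chattiness or safety. However, it brings some of the complexity of RL into NLP: we need to build a good reward function and train a model to estimate the value of each state, while also taking care that the final LLM does not drift too far from the original model, because too much drift makes it prone to producing gibberish instead of sensible text. The process is involved, with many intricate components that themselves change dynamically during training, so getting it right is not easy.

A recent paper by Rafailov, Sharma, Mitchell et al., [Direct Preference Optimization](https://arxiv.org/abs/2305.18290), proposes to cast the RL-based objective used by existing methods into an objective that can be optimized directly with a simple binary cross-entropy loss, which greatly simplifies the process of refining LLMs.

This post introduces the Direct Preference Optimization (DPO) method, which is now available in the [TRL library](https://github.com/lvwerra/trl). We also show how to fine-tune the recent Llama v2 7B model on the [stack-exchange preference](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) dataset, which contains questions from the various `stack-exchange` portals together with their ranked answers.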

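## DPO vs PPO

In the traditional approach to optimizing human-derived preferences via RL, the go-to method has been to use an auxiliary reward model and fine-tune the model of interest so that it maximizes the reward it obtains through the RL machinery. Intuitively, we use the reward model to give feedback to the model being optimized, so that it generates high-reward outputs more often and low-reward outputs less often. At the same time, we use a frozen reference model to make sure that the outputs do not deviate too much and that output diversity is preserved. This is typically done by adding a KL penalty with respect to the reference model to the reward-maximization objective, which helps prevent the model from learning to cheat or exploit the reward model.

DPO bypasses the reward-modeling step thanks to a key insight: an analytical mapping from the reward function to the optimal RL policy. This mapping intuitively measures how well a given reward function agrees with the given preference data. With it, the authors can transform the RL loss over the reward and reference models directly into a loss over the reference model alone, which lets us optimize the language model directly on preference data. In other words, DPO starts from the optimal solution to the RLHF loss and, via a change of variables, derives a loss that needs _only_ the reference model and no reward model.

As a result, we can optimize this likelihood objective directly, without a reward model or a cumbersome RL optimization procedure.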
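For reference, the objective derived in the DPO paper can be written as below. The notation comes from the paper rather than from this post: $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ the frozen reference model, $y_w$ and $y_l$ the preferred and dispreferred responses to a prompt $x$, $\sigma$ the logistic function, and $\beta$ the temperature hyperparameter.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Each $\beta$-scaled log-ratio acts as an implicit reward, so the argument of $\sigma$ is a reward difference between the two responses. Minimizing the loss therefore amounts to a binary classification of which response is preferred, which is why no separate reward model is required.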

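## How to train with TRL

As mentioned above, a typical RLHF pipeline consists of the following stages:

1. Supervised fine-tuning (SFT)
2. Annotating data with preference labels
3. Training a reward model on the preference data
4. RL optimization

The TRL library provides tools for all of these stages. DPO training, however, eliminates the reward-modeling and RL stages (stages 3 and 4) and optimizes the DPO objective directly on the annotated preference data.

With DPO we still need to perform stage 1, but instead of stages 3 and 4 we only need to pass the preference data prepared in stage 2 to TRL's `DPOTrainer`. The annotated preference data must follow a specific format: a dictionary with the following three keys:

- `prompt`: the prompt that is given to the model at inference time
- `chosen`: the preferred response to the given prompt
- `rejected`: the dispreferred response to the given prompt, or a response that does not belong to the given prompt

For example, for the `stack-exchange preference` dataset, the following helper function maps the dataset samples to this dictionary format and removes all the original columns: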

```python
from typing import Dict, List

from datasets import load_dataset


def return_prompt_and_responses(samples) -> Dict[str, List[str]]:
    # `samples` is a batch (batched=True), so every value is a list.
    return {
        "prompt": [
            "Question: " + question + "\n\nAnswer: "
            for question in samples["question"]
        ],
        "chosen": samples["response_j"],  # rated better than k
        "rejected": samples["response_k"],  # rated worse than j
    }


dataset = load_dataset(
    "lvwerra/stack-exchange-paired",
    split="train",
    data_dir="data/rl",
)
original_columns = dataset.column_names

# `map` returns a new dataset, so keep the result.
dataset = dataset.map(
    return_prompt_and_responses,
    batched=True,
    remove_columns=original_columns,
)
```
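To make the target format concrete, here is a minimal sketch that applies the same mapping to a hand-written batch, without downloading the dataset. The two rows in `toy_batch` are made up for illustration, and the real dataset contains more columns than the three shown here.

```python
from typing import Dict, List


def return_prompt_and_responses(samples: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Same mapping as above, applied to a batch given as a dict of lists."""
    return {
        "prompt": [
            "Question: " + question + "\n\nAnswer: "
            for question in samples["question"]
        ],
        "chosen": samples["response_j"],
        "rejected": samples["response_k"],
    }


# Hypothetical two-row batch, written by hand for illustration only.
toy_batch = {
    "question": [
        "How do I reverse a list in Python?",
        "What does `git stash` do?",
    ],
    "response_j": [
        "Use `my_list[::-1]` or `my_list.reverse()`.",
        "It shelves uncommitted changes.",
    ],
    "response_k": [
        "You cannot.",
        "It deletes the repository.",
    ],
}

mapped = return_prompt_and_responses(toy_batch)

# Pick the first example out of the batch to see one training record.
first_example = {key: values[0] for key, values in mapped.items()}
print(first_example)
```

Each record ends up with exactly the three keys `prompt`, `chosen` and `rejected`, and the prompt carries the `Question: ... Answer: ` template around the raw question.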

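Once we have the ranked dataset, the DPO loss is essentially a supervised loss that obtains an implicit reward via the reference model. At a high level, therefore, `DPOTrainer` needs the base model we wish to optimize as well as a reference model: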

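```python
from trl import DPOTrainer

dpo_trainer = DPOTrainer(
    model,                  # base model from the SFT pipeline
    model_ref,              # typically a copy of the SFT-trained base model
    beta=0.1,               # temperature hyperparameter of DPO
    train_dataset=dataset,  # dataset prepared above
    tokenizer=tokenizer,    # tokenizer
    args=training_args,     # training arguments, e.g. batch size, learning rate
)
```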

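Here, the `beta` hyperparameter is the temperature of the DPO loss, typically in the range `0.1` to `0.5`. It controls how much attention we pay to the reference model: the smaller `beta` is, the more we ignore the reference model. Once the trainer is initialized, we can train on the dataset with the given `training_args` by simply calling: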

```python
dpo_trainer.train()
```
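To get a feel for what `beta` does, the following sketch evaluates the DPO loss for a single preference pair in plain Python. It is an illustration under simplifying assumptions, not TRL's implementation (which works on batched tensors): the function name `dpo_pair_loss` and the log-probability values are made up, and each argument stands for the summed log-probability of a whole response.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities: relative to the reference, the policy has
# moved toward the chosen answer and away from the rejected one.
logps = dict(
    policy_chosen_logp=-10.0,
    policy_rejected_logp=-14.0,
    ref_chosen_logp=-12.0,
    ref_rejected_logp=-12.0,
)

for beta in (0.1, 0.5):
    print(f"beta={beta}: loss={dpo_pair_loss(**logps, beta=beta):.4f}")
```

For the same shift away from the reference, the larger `beta` yields the smaller loss. Put differently, with a small `beta` the policy has to move further from the reference model before the loss becomes small, which is the sense in which a small `beta` pays less attention to the reference model.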

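## Experiment with Llama v2

The benefit of implementing the DPO trainer in TRL is that one can take advantage of the LLM-related features already available in TRL and its dependent libraries, such as Peft and Accelerate. With these libraries we are even able to train a Llama v2 model using the [QLoRA technique](https://huggingface.co/blog/4bit-transformers-bitsandbytes) provided by the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library.

### Supervised fine-tuning

As outlined above, we first perform supervised fine-tuning of the 7B Llama v2 model on the SFT data split with [QLoRA](https://arxiv.org/abs/2305.14314), using TRL's `SFTTrainer`: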

```python
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

# `script_args`, the datasets, `tokenizer` and `training_args` come from
# the parts of the training script that are elided here.

# load the base model in 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    script_args.model_name,  # "meta-llama/Llama-2-7b-hf"
    quantization_config=bnb_config,
    device_map={"": 0},
    trust_remote_code=True,
    use_auth_token=True,
)
base_model.config.use_cache = False

# add LoRA layers on top of the quantized base model
peft_config = LoraConfig(
    r=script_args.lora_r,
    lora_alpha=script_args.lora_alpha,
    lora_dropout=script_args.lora_dropout,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
...
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    packing=True,
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_args,  # HF Trainer arguments
)
trainer.train()
```

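### DPO training

Once SFT has finished, we save the resulting model. We then move on to DPO training: we use the saved SFT model as both the base model and the reference model for DPO, and train it with the DPO objective on the `stack-exchange preference` data prepared above. Since we chose to fine-tune the model with LoRA, we load it with Peft's `AutoPeftModelForCausalLM` helper class: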

```python
import torch
from peft import AutoPeftModelForCausalLM
from trl import DPOTrainer

# `script_args`, the datasets, `tokenizer`, `training_args` and `peft_config`
# come from the parts of the training script that are elided here.
model = AutoPeftModelForCausalLM.from_pretrained(
    script_args.model_name_or_path,  # location of saved SFT model
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
    is_trainable=True,
)
model_ref = AutoPeftModelForCausalLM.from_pretrained(
    script_args.model_name_or_path,  # same model as the main one
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)
...
dpo_trainer = DPOTrainer(
    model,
    model_ref,
    args=training_args,
    beta=script_args.beta,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
dpo_trainer.train()
dpo_trainer.save_model()
```

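As can be seen, we load the model in 4-bit and then train it with the QLoRA method, selected via the `peft_config` argument. The trainer also evaluates training progress on the evaluation dataset and reports a number of key metrics, such as the implicit reward, which can optionally be logged and displayed via WandB. Finally, we can push the trained model to the Hugging Face Hub.

## Conclusion

The full source code of the SFT and DPO training scripts can be found in the [examples/stack_llama_2](https://github.com/lvwerra/trl/tree/main/examples/research_projects/stack_llama_2) directory, and the trained, merged model has been uploaded to the HF Hub (see [here](https://huggingface.co/kashif/stack-llama-2)).

The WandB logs of our training run can be found [here](https://wandb.ai/krasul/huggingface/runs/c54lmder). They include the following reward metrics, which `DPOTrainer` logs during training and evaluation: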

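- `rewards/chosen`: the mean difference between the log probabilities of the policy model and the reference model for the chosen responses, scaled by `beta`.
- `rewards/rejected`: the mean difference between the log probabilities of the policy model and the reference model for the rejected responses, scaled by `beta`.
- `rewards/accuracies`: the mean of how often the chosen rewards are greater than the corresponding rejected rewards.
- `rewards/margins`: the mean difference between the chosen rewards and the corresponding rejected rewards.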

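Intuitively, during training we want the margins to increase and the accuracy to reach 1.0. In other words, the chosen reward should be higher than the rejected reward (that is, the margin should be greater than zero). These metrics can then also be computed on an evaluation dataset.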
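The four metrics can be sketched in a few lines of plain Python. This is only an illustration of the definitions above, not TRL's code: the helper name `reward_metrics` and the log-probability values are made up, and each list entry stands for the summed log-probability of one response. The accuracy key is spelled `rewards/accuracies`, which is how TRL's logs name it.

```python
from statistics import mean
from typing import Dict, List


def reward_metrics(
    policy_chosen_logps: List[float],
    policy_rejected_logps: List[float],
    ref_chosen_logps: List[float],
    ref_rejected_logps: List[float],
    beta: float = 0.1,
) -> Dict[str, float]:
    """Compute DPO-style reward metrics for a batch of preference pairs."""
    # Implicit reward of a response: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        # Fraction of pairs where the chosen reward beats the rejected reward.
        "rewards/accuracies": mean([float(c > r) for c, r in zip(chosen, rejected)]),
        "rewards/margins": mean([c - r for c, r in zip(chosen, rejected)]),
    }


# Hypothetical batch of three pairs; the second pair is ranked the wrong way.
metrics = reward_metrics(
    policy_chosen_logps=[-10.0, -11.0, -9.0],
    policy_rejected_logps=[-14.0, -9.0, -13.0],
    ref_chosen_logps=[-12.0, -11.5, -9.5],
    ref_rejected_logps=[-12.0, -10.5, -12.0],
    beta=0.1,
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy batch two of the three pairs are ranked correctly, so the accuracy is below 1.0 while the mean margin is still positive; as training progresses one would hope to see both numbers rise.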

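We hope that releasing this code lowers the barrier to entry, so that you can try this method of aligning large language models on your own datasets, and we can't wait to see what you build with it! If you want to try out the model we trained, you can play with it in this Space: [trl-lib/stack-llama](https://huggingface.co/spaces/trl-lib/stack-llama)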
