Skip to content

Releases: huggingface/trl

v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models

16 Feb 08:22
0f13e51
Compare
Choose a tag to compare

DPO important fixes

We fixed issues with respect to IPO loss, leading to consistent results according to newest experiements:

  • [DPO] average_log_prob when loss is IPO by @kashif in #1265

We also fixed important bugs with respect to DPO / PEFT and Flash Attention

Data processing is now faster for multi-GPU envs

Other DPO bugfixes:

  • [PEFT + DPO] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
  • Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
  • fix padding in dpo trainer by @pacman100 in #1284
  • Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
  • [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307

Faster data processing and other enhancements:

Automatic tagging for all models

Models now gets tagged correctly even if users do not call trainer.push_to_hub()

What's Changed

New Contributors

Full Changelog: v0.7.10...v0.7.11

v0.7.10: Automatic templating, `setup_chat_format` API, stronger tests

19 Jan 10:58
09ca760
Compare
Choose a tag to compare

v0.7.10: Minor fixes, Automatic templating, setup_chat_format API, stronger tests

This Patch release adds a new feature in TRL for dealing with chat datasets - you can load a directly formatted dataset without the need of formatting it beforehand.

Read more about it here: https://huggingface.co/docs/trl/sft_trainer#dataset-format-support

The release also introduces a new API setup_chat_format to correctly resize the model embeddings with the target size when adding new tokens to comply with the chat format. Currently we only support chatml format and we can add more formats in the future

Read more about it here: https://huggingface.co/docs/trl/sft_trainer#add-special-tokens-for-chat-format

We also extensively test SFTTrainer and DPOTrainer and the example scripts, dpo.py and sft.py should be well -battletested. If you see any issue with the script, please let us know on GitHub.

What's Changed

New Contributors

Full Changelog: v0.7.9...v0.7.10

v0.7.9: Patch release for DPO & SFTTrainer

09 Jan 12:06
7a95cc8
Compare
Choose a tag to compare

v0.7.9: Patch release for DPO & SFTTrainer

This is a patch release that fixes critical issues with SFTTrainer & DPOTrainer, together with minor fixes for PPOTrainer and DataCollatorForCompletionOnlyLM

What's Changed

Full Changelog: v0.7.8...v0.7.9

v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO

09 Jan 04:17
Compare
Choose a tag to compare

v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO

Unsloth tag for xxxTrainer

If users use Unsloth library, the unsloth tag gets automatically pushed on the Hub.

DPO fixes

Some important fixes for DPO has been introduced to address: https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster

  • Allow separate devices for target/ref models. by @jondurbin in #1190
  • Allow swapping PEFT adapters for target/ref model. by @jondurbin in #1193
  • Change device access order for speedup of calculating metrics in DPOTrainer by @brcps12 in #1154

DDPO + PEFT

Now DDPO supports PEFT

Other fixes

New Contributors

Full Changelog: v0.7.7...v0.7.8

v0.7.7

26 Dec 09:27
Compare
Choose a tag to compare

v0.7.7: Patch release PPO & DDPO tags

A fix has been introduce to fix a breaking change with PPOTrainer.push_to_hub() and DDPOTrainer.push_to_hub()

What's Changed

New Contributors

Full Changelog: v0.7.6...v0.7.7

v0.7.6: Patch release - Multi-tag instead of single tags for `xxxTrainer`

22 Dec 14:10
Compare
Choose a tag to compare

Patch release: Multi-tag instead of single tags for xxxTrainer

This is a patch release to push multiple tags (e.g. trl & sft) instead of one tag

What's Changed

Full Changelog: v0.7.5...v0.7.6

v0.7.5: IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`

22 Dec 13:09
Compare
Choose a tag to compare

IPO & KTO & cDPO loss, DPOTrainer enhancements, automatic tags for xxxTrainer

Important enhancements for DPOTrainer

This release introduces many new features in TRL for DPOTrainer:

  • IPO-loss for a better generalization of DPO algorithm
  • KTO & cDPO loss
  • You can also pass pre-computed logits to DPOTrainer

Automatic xxxTrainer tagging on the Hub

Now, trainers from TRL pushes automatically tags trl-sft, trl-dpo, trl-ddpo when pushing models on the Hub

unsloth 🤝 TRL

We encourage users to try out unsloth library for faster LLM fine-tuning using PEFT & TRL's SFTTrainer and DPOTrainer

What's Changed

New Contributors

Full Changelog: v0.7.4...v0.7.5

v0.7.4: Patch Release

10 Nov 15:07
Compare
Choose a tag to compare

Patch Release

This release is a patch release that addresses an issue for users that have TRL installed without PEFT

What's Changed

Full Changelog: v0.7.3...v0.7.4

v0.7.3:`IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training

10 Nov 15:06
Compare
Choose a tag to compare

IterativeTrainer, NEFTune and major bugfixes for DPOTrainer and Distributed Training

In this release we introduce two new features, IterativeTrainer from @gaetanlop and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that enables to perform custom actions (generation and filtering for example) between optimization steps. In TRL we provide an easy-to-use API to fine-tune your models in an iterative way in just a few lines of code.

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer

NEFTune

NEFTune is a technique to boost the performance of chat models and was introduced by the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” from Jain et al. it consists of adding noise to the embedding vectors during training. According to the abstract of the paper:

Read more about it here

Major bugfixes

Major bugfixes have been addressed to tackle many issues with distributed training and gradient checkpointing.

  • [DPO] fix DPO + GC issues by @younesbelkada in #927
  • [core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO by @younesbelkada in #912

DPOTrainer enhancements and fixes

The DPOTrainer now comes with multiple enhancements and bugfixes! Check them out below

What's Changed

New Contributors

Full Changelog: v0.7.2...v0.7.3

v0.7.2

12 Oct 13:32
Compare
Choose a tag to compare

0.7.2: Flash Attention documentation and Minor bugfixes

In this release we provide minor bugfixes and smoother user experience for all public classes. We also added some clarification on the documentation on how to use Flash Attention with SFTTrainer

How to use Flash Attention with SFTTrainer:

What's Changed

New Contributors

Full Changelog: v0.7.1...v0.7.2