This section of the documentation is largely outdated and relies only on PPO.
Ideally, we should have clear documentation that shows how to use PEFT with at least SFT, DPO, and GRPO via the peft_config argument. We could also add subsections on QLoRA and prompt tuning.
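
As a starting point for that rewrite, here is a minimal sketch of what the SFT case might look like. It assumes recent versions of `trl`, `peft`, and `datasets` are installed; the model ID, dataset name, output directory, and LoRA hyperparameters are illustrative placeholders, not recommendations, and the helper name `build_sft_trainer_with_lora` is hypothetical.

```python
# Illustrative LoRA hyperparameters (assumed values, tune for your own setup).
LORA_KWARGS = dict(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")


def build_sft_trainer_with_lora(
    model_id="Qwen/Qwen2.5-0.5B",     # placeholder model ID
    dataset_name="trl-lib/Capybara",  # placeholder dataset name
):
    """Hypothetical helper: build an SFTTrainer that trains LoRA adapters."""
    # Imports are kept inside the function so that defining it does not
    # require trl, peft, or datasets to be installed.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    peft_config = LoraConfig(**LORA_KWARGS)
    return SFTTrainer(
        model=model_id,  # a model ID string is loaded by the trainer
        args=SFTConfig(output_dir="sft-lora-sketch"),
        train_dataset=load_dataset(dataset_name, split="train"),
        peft_config=peft_config,  # the trainer wraps the model with the adapter
    )
```

Calling `build_sft_trainer_with_lora().train()` would download the model and dataset and start training. The DPO and GRPO sections could follow the same pattern by passing `peft_config` to their respective trainers, alongside whatever extra arguments each trainer requires.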