Merged
2 changes: 1 addition & 1 deletion docs/source/dataset_formats.md
@@ -401,7 +401,7 @@ Choosing the right dataset type depends on the task you are working on and the s
 | [`RewardTrainer`] | [Preference (implicit prompt recommended)](#preference) |
 | [`RLOOTrainer`] | [Prompt-only](#prompt-only) |
 | [`SFTTrainer`] | [Language modeling](#language-modeling) or [Prompt-completion](#prompt-completion) |
-| [`XPOTrainer`] | [Prompt-only](#prompt-only) |
+| [`experimental.xpo.XPOTrainer`] | [Prompt-only](#prompt-only) |
 
 ## Using any dataset with TRL: preprocessing and conversion
 
2 changes: 1 addition & 1 deletion docs/source/example_overview.md
@@ -66,7 +66,7 @@ Scripts are maintained in the [`trl/scripts`](https://github.com/huggingface/trl
 | [`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Vision Language Model in a chat setting. The script has only been tested with [LLaVA 1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf), [LLaVA 1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf), and [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) models, so users may see unexpected behaviour in other model architectures. |
 | [`examples/scripts/sft_vlm_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_gemma3.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Gemma 3 model on vision to text tasks. |
 | [`examples/scripts/sft_vlm_smol_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_smol_vlm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a SmolVLM model. |
-| [`examples/scripts/xpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/xpo.py) | This script shows how to use the [`XPOTrainer`] to fine-tune a model. |
+| [`examples/scripts/xpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/xpo.py) | This script shows how to use the [`experimental.xpo.XPOTrainer`] to fine-tune a model. |
 
 ## Distributed Training (for scripts)
 
6 changes: 3 additions & 3 deletions docs/source/vllm_integration.md
@@ -135,7 +135,7 @@ trainer.train()
 
 ```python
 from datasets import load_dataset
-from trl import XPOTrainer, XPOConfig
+from trl.experimental.xpo import XPOTrainer, XPOConfig
 
 dataset = load_dataset("trl-lib/tldr", split="train")
 
@@ -392,7 +392,7 @@ training_args = NashMDConfig(
 <hfoption id="XPO">
 
 ```python
-from trl import XPOConfig
+from trl.experimental.xpo import XPOConfig
 
 training_args = XPOConfig(
     ...,
@@ -467,7 +467,7 @@ training_args = NashMDConfig(
 <hfoption id="XPO">
 
 ```python
-from trl import XPOConfig
+from trl.experimental.xpo import XPOConfig
 
 training_args = XPOConfig(
     ...,
4 changes: 2 additions & 2 deletions docs/source/xpo_trainer.md
@@ -156,11 +156,11 @@ While training and evaluating we record the following reward metrics:
 
 ## XPOTrainer
 
-[[autodoc]] XPOTrainer
+[[autodoc]] experimental.xpo.XPOTrainer
     - train
     - save_model
     - push_to_hub
 
 ## XPOConfig
 
-[[autodoc]] XPOConfig
+[[autodoc]] experimental.xpo.XPOConfig
2 changes: 1 addition & 1 deletion tests/test_xpo_trainer.py
@@ -25,7 +25,7 @@
 if is_peft_available():
     from peft import LoraConfig, get_peft_model
 
-
+@pytest.mark.low_priority
 class TestXPOTrainer(TrlTestCase):
     def setup_method(self):
         self.model_id = "trl-internal-testing/tiny-Qwen2ForCausalLM-2.5"
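
The `@pytest.mark.low_priority` marker added above is a custom marker; pytest warns about unregistered markers (and fails under `--strict-markers`) unless they are declared. TRL's actual registration is not part of this diff; a hedged sketch of how such a marker could be registered in a `conftest.py`:

```python
# conftest.py -- hypothetical sketch; the real registration (conftest.py or
# the `markers` ini option in pyproject.toml) is not shown in this PR.
def pytest_configure(config):
    # Declare the custom marker so pytest does not emit
    # PytestUnknownMarkWarning and --strict-markers accepts it.
    config.addinivalue_line(
        "markers", "low_priority: tests that can be deferred in fast CI runs"
    )
```

CI can then deselect these tests with `pytest -m "not low_priority"`.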