Merged
2 changes: 1 addition & 1 deletion docs/source/dataset_formats.md
@@ -401,7 +401,7 @@ Choosing the right dataset type depends on the task you are working on and the s
 | [`RewardTrainer`] | [Preference (implicit prompt recommended)](#preference) |
 | [`RLOOTrainer`] | [Prompt-only](#prompt-only) |
 | [`SFTTrainer`] | [Language modeling](#language-modeling) or [Prompt-completion](#prompt-completion) |
-| [`XPOTrainer`] | [Prompt-only](#prompt-only) |
+| [`experimental.xpo.XPOTrainer`] | [Prompt-only](#prompt-only) |
 
 ## Using any dataset with TRL: preprocessing and conversion
 
2 changes: 1 addition & 1 deletion docs/source/example_overview.md
@@ -66,7 +66,7 @@ Scripts are maintained in the [`trl/scripts`](https://github.com/huggingface/trl
 | [`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Vision Language Model in a chat setting. The script has only been tested with [LLaVA 1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf), [LLaVA 1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf), and [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) models, so users may see unexpected behaviour in other model architectures. |
 | [`examples/scripts/sft_vlm_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_gemma3.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Gemma 3 model on vision to text tasks. |
 | [`examples/scripts/sft_vlm_smol_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm_smol_vlm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a SmolVLM model. |
-| [`examples/scripts/xpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/xpo.py) | This script shows how to use the [`XPOTrainer`] to fine-tune a model. |
+| [`examples/scripts/xpo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/xpo.py) | This script shows how to use the [`experimental.xpo.XPOTrainer`] to fine-tune a model. |
 
 ## Distributed Training (for scripts)
 
6 changes: 3 additions & 3 deletions docs/source/vllm_integration.md
@@ -135,7 +135,7 @@ trainer.train()
 
 ```python
 from datasets import load_dataset
-from trl import XPOTrainer, XPOConfig
+from trl.experimental.xpo import XPOTrainer, XPOConfig
 
 dataset = load_dataset("trl-lib/tldr", split="train")
 
@@ -392,7 +392,7 @@ training_args = NashMDConfig(
 <hfoption id="XPO">
 
 ```python
-from trl import XPOConfig
+from trl.experimental.xpo import XPOConfig
 
 training_args = XPOConfig(
     ...,
@@ -467,7 +467,7 @@ training_args = NashMDConfig(
 <hfoption id="XPO">
 
 ```python
-from trl import XPOConfig
+from trl.experimental.xpo import XPOConfig
 
 training_args = XPOConfig(
     ...,
4 changes: 2 additions & 2 deletions docs/source/xpo_trainer.md
@@ -156,11 +156,11 @@ While training and evaluating we record the following reward metrics:
 
 ## XPOTrainer
 
-[[autodoc]] XPOTrainer
+[[autodoc]] experimental.xpo.XPOTrainer
     - train
     - save_model
     - push_to_hub
 
 ## XPOConfig
 
-[[autodoc]] XPOConfig
+[[autodoc]] experimental.xpo.XPOConfig
2 changes: 1 addition & 1 deletion tests/test_xpo_trainer.py
@@ -25,7 +25,7 @@
 if is_peft_available():
     from peft import LoraConfig, get_peft_model
 
-
+@pytest.mark.low_priority
 class TestXPOTrainer(TrlTestCase):
     def setup_method(self):
         self.model_id = "trl-internal-testing/tiny-Qwen2ForCausalLM-2.5"
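
The `@pytest.mark.low_priority` marker added above is a custom marker; pytest warns about unregistered markers (and fails under `--strict-markers`) unless they are declared. TRL's actual registration is not part of this diff; a hedged sketch of how such a marker could be registered in a `conftest.py`:

```python
# conftest.py -- hypothetical sketch; the real registration (conftest.py or
# the `markers` ini option in pyproject.toml) is not shown in this PR.
def pytest_configure(config):
    # Declare the custom marker so pytest does not emit
    # PytestUnknownMarkWarning and --strict-markers accepts it.
    config.addinivalue_line(
        "markers", "low_priority: tests that can be deferred in fast CI runs"
    )
```

CI can then deselect these tests with `pytest -m "not low_priority"`.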