docs: Unify model examples to use trl-lib namespace #4431

behroozazarkhalili · 2025-11-02T21:49:28Z

Summary

Unifies model namespace usage in documentation examples to use the common trl-lib namespace as requested in issue #4385.

Resolves #4385

Changes

Files Modified:

docs/source/peft_integration.md - 1 model reference updated
docs/source/use_model.md - 3 model references updated

Replacements:

edbeeching/gpt-neo-125M-imdb → trl-lib/Qwen2-0.5B-XPO
kashif/stack-llama-2 → trl-lib/Qwen2-0.5B-XPO

All personal developer namespace models in documentation examples now use the unified trl-lib namespace. Official organization models (meta-llama, microsoft, google, etc.) and research project references (cleanrl) are intentionally preserved as they serve specific purposes.
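For illustration, the replacements listed above could be applied mechanically with a small helper like the one below. This is a minimal sketch, not the tooling used in this PR: the function name `unify_model_ids` is hypothetical, and the mapping is copied from the "Replacements" list.

```python
import re

# Mapping copied from the "Replacements" list in this PR description.
REPLACEMENTS = {
    "edbeeching/gpt-neo-125M-imdb": "trl-lib/Qwen2-0.5B-XPO",
    "kashif/stack-llama-2": "trl-lib/Qwen2-0.5B-XPO",
}


def unify_model_ids(text, replacements=REPLACEMENTS):
    """Return (rewritten_text, number_of_references_replaced).

    Hypothetical helper: it only rewrites exact model IDs and leaves
    every other namespace (meta-llama, microsoft, ...) untouched.
    """
    total = 0
    for old, new in replacements.items():
        # The lookahead avoids rewriting a longer ID that merely starts
        # with `old`, e.g. "kashif/stack-llama-2-7b".
        pattern = re.compile(re.escape(old) + r"(?![\w-])")
        text, count = pattern.subn(new, text)
        total += count
    return text, total
```

Calling `unify_model_ids` on the contents of a docs file returns the rewritten text together with a count, which can be compared against the per-file counts under "Files Modified".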

Verification

✅ All occurrences of personal developer namespaces replaced
✅ Replacement model (trl-lib/Qwen2-0.5B-XPO) is already widely used in TRL documentation
✅ No other personal developer namespace models found in docs
✅ Official and research project namespaces appropriately preserved
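A check like the one described in this list could be approximated with a small scan for model IDs outside an allowlist. The sketch below is an assumption-laden heuristic, not the verification actually run for this PR: the allowlist contents, the regular expression, and the function name `find_personal_namespace_ids` are all hypothetical. It looks at quoted `namespace/name` strings and at values passed to `--model_name_or_path`, so quoted file paths can show up as false positives.

```python
import re

# Assumed allowlist: the namespaces this PR description says are preserved,
# plus trl-lib itself. Adjust to taste; this set is illustrative only.
ALLOWED_NAMESPACES = {"trl-lib", "meta-llama", "microsoft", "google", "cleanrl", "Qwen"}

# Heuristic: a model ID is "namespace/name" that directly follows an opening
# quote or the --model_name_or_path flag.
MODEL_ID = re.compile(
    r"""(?:["']|--model_name_or_path\s+)([A-Za-z0-9][\w.-]*)/([\w.-]+)"""
)


def find_personal_namespace_ids(text):
    """Return the sorted model IDs whose namespace is not in the allowlist."""
    hits = set()
    for match in MODEL_ID.finditer(text):
        namespace, name = match.groups()
        if namespace not in ALLOWED_NAMESPACES:
            hits.add(f"{namespace}/{name}")
    return sorted(hits)
```

Running such a scan over `examples/` as well as `docs/` widens the search beyond the documentation sources, at the cost of more false positives to review by hand.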

Resolves #4385
- Replace edbeeching/gpt-neo-125M-imdb with trl-lib/Qwen2-0.5B-XPO in peft_integration.md
- Replace kashif/stack-llama-2 with trl-lib/Qwen2-0.5B-XPO in use_model.md (3 occurrences)
- All personal developer namespace models now use common trl-lib namespace

HuggingFaceDocBuilderDev · 2025-11-02T21:52:03Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

docs/source/peft_integration.md

qgallouedec · 2025-11-04T23:38:38Z

this one is missing:

$ python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --judge_model gpt-4o-mini --num_examples 1000

and all of these:

trl/examples/scripts/evals/judge_tldr.py

Lines 34 to 50 in 6f906d5

python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/rloo_tldr --num_examples 1000
Model win rate: 31.40%

python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/rloo_tldr --judge_model gpt-3.5-turbo-0125 --num_examples 1000
Model win rate: 51.60%

python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/rloo_tldr --judge_model gpt-4o-mini --num_examples 1000
Model win rate: 51.20%

python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --num_examples 1000
Model win rate: 46.30%

python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --judge_model gpt-3.5-turbo-0125 --num_examples 1000
Model win rate: 52.50%

python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --judge_model gpt-4o-mini --num_examples 1000
Model win rate: 63.00%

but they'll require training a model and pushing it to the trl-lib org

Address reviewer feedback by replacing trl-lib/Qwen2-0.5B-XPO with the official Qwen/Qwen2.5-0.5B model in all use_model.md examples.
Changes:
- Replace model references in 3 locations to use Qwen organization model
- More consistent with rest of TRL documentation
- Less misleading than custom trl-lib namespace model

Update all model references in use_model.md to use Qwen/Qwen3-0.6B as specifically requested by qgallouedec.
Changes:
- Replace Qwen/Qwen2.5-0.5B with Qwen/Qwen3-0.6B in all 3 locations
- Simpler model reference consistent with reviewer's suggestion

behroozazarkhalili · 2025-11-05T02:02:52Z

@qgallouedec I've addressed the comments:

✅ Completed:

Updated docs/source/use_model.md to use Qwen/Qwen3-0.6B (3 locations)
Confirmed docs/source/peft_integration.md already reverted to edbeeching/gpt-neo-125M-imdb

⏳ Pending (blocked):

examples/scripts/evals/judge_tldr.py updates: As you noted, this requires training PPO/RLOO TLDR models and pushing them to the trl-lib org first. The current examples use vwxyzjn/rloo_tldr and vwxyzjn/ppo_tldr, which don't have trl-lib equivalents yet.

Should we:

Keep the current examples until trl-lib models are available?
Track this as a follow-up issue once models are trained?
Use alternative trl-lib models (though they're different architectures)?

Merge branch 'main' into docs/unify-trl-lib-namespace

91e540c

qgallouedec reviewed Nov 4, 2025

docs/source/peft_integration.md (outdated, resolved)

qgallouedec and others added 2 commits November 4, 2025 16:32

Apply suggestion from @qgallouedec

6f906d5

Merge branch 'main' into docs/unify-trl-lib-namespace

800a4d9

behroozazarkhalili and others added 3 commits November 4, 2025 17:54

Merge branch 'main' into docs/unify-trl-lib-namespace

9bf8db4
