@behroozazarkhalili
Collaborator

Summary

Unifies model namespace usage in documentation examples to use the common trl-lib namespace as requested in issue #4385.

Resolves #4385

Changes

Files Modified:

  • docs/source/peft_integration.md - 1 model reference updated
  • docs/source/use_model.md - 3 model references updated

Replacements:

  • edbeeching/gpt-neo-125M-imdb → trl-lib/Qwen2-0.5B-XPO
  • kashif/stack-llama-2 → trl-lib/Qwen2-0.5B-XPO
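
The two replacements listed here amount to a plain string substitution over the docs sources. A minimal sketch, assuming the mapping mirrors that list; `REPLACEMENTS` and `unify_namespaces` are hypothetical names used for illustration and are not part of TRL:

```python
# Hypothetical mapping from personal-namespace model IDs to their replacement.
REPLACEMENTS = {
    "edbeeching/gpt-neo-125M-imdb": "trl-lib/Qwen2-0.5B-XPO",
    "kashif/stack-llama-2": "trl-lib/Qwen2-0.5B-XPO",
}


def unify_namespaces(text, replacements=REPLACEMENTS):
    """Return the rewritten text and the number of references replaced."""
    count = 0
    for old, new in replacements.items():
        count += text.count(old)
        text = text.replace(old, new)
    return text, count


# Illustrative input line; a real run would read each Markdown file instead.
sample = 'model = AutoModelForCausalLM.from_pretrained("kashif/stack-llama-2")'
updated, n = unify_namespaces(sample)
print(updated, n)
```

Returning the count alongside the text makes it easy to compare against the per-file totals reported under "Files Modified".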

All personal developer namespace models in documentation examples now use the unified trl-lib namespace. Official organization models (meta-llama, microsoft, google, etc.) and research project references (cleanrl) are intentionally preserved as they serve specific purposes.

Verification

  • ✅ All occurrences of personal developer namespaces replaced
  • ✅ Replacement model (trl-lib/Qwen2-0.5B-XPO) is already widely used in TRL documentation
  • ✅ No other personal developer namespace models found in docs
  • ✅ Official and research project namespaces appropriately preserved
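
A check like the first and third items could be scripted roughly as follows. This is a sketch, not the procedure actually used for this PR: the allow-list, the regex, and the name `find_personal_namespaces` are assumptions, and a real scan would read the files under docs/source instead of an inline string.

```python
import re

# Assumed allow-list: the shared namespace plus the organization and
# research-project namespaces named in this description.
ALLOWED = {"trl-lib", "meta-llama", "microsoft", "google", "cleanrl"}

# Matches quoted Hub-style IDs of the form "namespace/name".
MODEL_ID = re.compile(r'["\']([A-Za-z0-9][\w.-]*)/([\w.-]+)["\']')


def find_personal_namespaces(text, allowed=ALLOWED):
    """Return quoted namespace/name IDs whose namespace is not allow-listed."""
    return [
        f"{ns}/{name}"
        for ns, name in MODEL_ID.findall(text)
        if ns not in allowed
    ]


# Illustrative docs excerpt with one personal and one shared namespace.
doc = '''
model = AutoModelForCausalLM.from_pretrained("kashif/stack-llama-2")
ref = AutoModelForCausalLM.from_pretrained("trl-lib/Qwen2-0.5B-XPO")
'''
print(find_personal_namespaces(doc))
```

The pattern also matches any other quoted `a/b` string, such as a relative file path, so hits would still need a manual review.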

Resolves #4385

- Replace edbeeching/gpt-neo-125M-imdb with trl-lib/Qwen2-0.5B-XPO in peft_integration.md
- Replace kashif/stack-llama-2 with trl-lib/Qwen2-0.5B-XPO in use_model.md (3 occurrences)
- All personal developer namespace models now use common trl-lib namespace

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec
Member

this one is missing:

$ python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --judge_model gpt-4o-mini --num_examples 1000

and all of these:

python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/rloo_tldr --num_examples 1000
Model win rate: 31.40%
python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/rloo_tldr --judge_model gpt-3.5-turbo-0125 --num_examples 1000
Model win rate: 51.60%
python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/rloo_tldr --judge_model gpt-4o-mini --num_examples 1000
Model win rate: 51.20%
python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --num_examples 1000
Model win rate: 46.30%
python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --judge_model gpt-3.5-turbo-0125 --num_examples 1000
Model win rate: 52.50%
python examples/scripts/evals/judge_tldr.py --model_name_or_path vwxyzjn/ppo_tldr --judge_model gpt-4o-mini --num_examples 1000
Model win rate: 63.00%

but they'll require training a model and pushing it to the trl-lib org
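
For reference, the six commands listed in this comment form a grid of two models and three judge settings. A small sketch that regenerates them, using only the flags that appear in those commands; `build_commands` is a hypothetical helper, and `None` stands for omitting `--judge_model` so that the script falls back to its default judge:

```python
from itertools import product

SCRIPT = "examples/scripts/evals/judge_tldr.py"
MODELS = ["vwxyzjn/rloo_tldr", "vwxyzjn/ppo_tldr"]
JUDGES = [None, "gpt-3.5-turbo-0125", "gpt-4o-mini"]  # None: flag omitted


def build_commands(models=MODELS, judges=JUDGES, num_examples=1000):
    """Build one command line per (model, judge) pair."""
    commands = []
    for model, judge in product(models, judges):
        parts = ["python", SCRIPT, "--model_name_or_path", model]
        if judge is not None:
            parts += ["--judge_model", judge]
        parts += ["--num_examples", str(num_examples)]
        commands.append(" ".join(parts))
    return commands


for command in build_commands():
    print(command)
```

Once replacement models exist under trl-lib, only `MODELS` would need to change to produce the updated commands.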

behroozazarkhalili and others added 3 commits November 4, 2025 17:54
Address reviewer feedback by replacing trl-lib/Qwen2-0.5B-XPO with the
official Qwen/Qwen2.5-0.5B model in all use_model.md examples.

Changes:
- Replace model references in 3 locations to use Qwen organization model
- More consistent with rest of TRL documentation
- Less misleading than custom trl-lib namespace model

Update all model references in use_model.md to use Qwen/Qwen3-0.6B
as specifically requested by qgallouedec.

Changes:
- Replace Qwen/Qwen2.5-0.5B with Qwen/Qwen3-0.6B in all 3 locations
- Simpler model reference consistent with reviewer's suggestion
@behroozazarkhalili
Collaborator Author

@qgallouedec I've addressed the comments:

Completed:

  • Updated docs/source/use_model.md to use Qwen/Qwen3-0.6B (3 locations)
  • Confirmed that docs/source/peft_integration.md was already reverted to edbeeching/gpt-neo-125M-imdb

Pending (blocked):

  • examples/scripts/evals/judge_tldr.py updates: As you noted, this requires training and pushing PPO/RLOO TLDR models to the trl-lib org first. The current examples use vwxyzjn/rloo_tldr and vwxyzjn/ppo_tldr, which don't have trl-lib equivalents yet.

Should we:

  1. Keep the current examples until trl-lib models are available?
  2. Track this as a follow-up issue once models are trained?
  3. Use alternative trl-lib models (though they're different architectures)?

Development

Successfully merging this pull request may close these issues.

Use a common 'trl-lib` namespace for the models/datasets/spaces