Simplify Config by pan-x-c · Pull Request #325 · agentscope-ai/Trinity-RFT

pan-x-c · 2025-10-16T11:38:41Z

Description

Remove the buffer section from the trainer config.
Add a multi_step_grpo algorithm type so that users no longer need to set the step_wise_grpo advantage function explicitly.

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist · 2025-10-16T11:39:05Z

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on simplifying the configuration process within the system, particularly for multi-step GRPO algorithms. It introduces a dedicated algorithm type to make multi-step training setup more intuitive and removes a redundant buffer configuration from the trainer. Additionally, it standardizes the default sampling strategy across several algorithms, ensuring consistency and clarity in the configuration. The changes aim to improve user experience by reducing complexity and potential for misconfiguration.

Highlights

Configuration Simplification: The buffer configuration has been removed from the trainer config, streamlining the overall configuration structure.
New Algorithm Type: A new multi_step_grpo algorithm has been introduced, eliminating the need to explicitly set step_wise_grpo as an advantage function for multi-step training.
Default Sampling Strategy Update: The default sample_strategy for various algorithms (PPO, GRPO, OPMD, AsymRE, DPO, TOPR, CISPO, sPPO, REC) has been updated from 'warmup' to 'default'.
Documentation and Example Updates: All relevant documentation and example YAML configuration files have been updated to reflect the new multi_step_grpo algorithm type and the removal of the advantage_fn parameter for this specific use case. Minor typo fixes were also applied in the Chinese documentation.
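
The highlights above can be pictured as a table of per-algorithm defaults. The following is a minimal sketch, not the Trinity-RFT implementation: `AlgorithmDefaults`, `ALGORITHM_DEFAULTS`, and `resolve_algorithm` are hypothetical names, and the `"grpo"` advantage function name for plain GRPO is an assumption. Only the `multi_step_grpo` / `step_wise_grpo` pairing and the `'default'` sample strategy come from the summary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AlgorithmDefaults:
    """Hypothetical per-algorithm defaults (illustrative field names)."""

    advantage_fn: str
    # Per the summary, the default moves from 'warmup' to 'default'.
    sample_strategy: str = "default"


# Hypothetical registry keyed by algorithm_type.
ALGORITHM_DEFAULTS: Dict[str, AlgorithmDefaults] = {
    # Assumption: plain GRPO uses an advantage function named "grpo".
    "grpo": AlgorithmDefaults(advantage_fn="grpo"),
    # From the PR: multi_step_grpo implies the step_wise_grpo advantage function.
    "multi_step_grpo": AlgorithmDefaults(advantage_fn="step_wise_grpo"),
}


def resolve_algorithm(
    algorithm_type: str,
    advantage_fn: Optional[str] = None,
    sample_strategy: Optional[str] = None,
) -> Dict[str, str]:
    """Fill in unset fields from the defaults of the chosen algorithm type."""
    defaults = ALGORITHM_DEFAULTS[algorithm_type]
    return {
        "algorithm_type": algorithm_type,
        "advantage_fn": advantage_fn or defaults.advantage_fn,
        "sample_strategy": sample_strategy or defaults.sample_strategy,
    }
```

Under these assumptions, `resolve_algorithm("multi_step_grpo")` yields the step-wise advantage function without the user naming it, while an explicit `advantage_fn` argument still overrides the default.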

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

pan-x-c · 2025-10-16T11:39:13Z

/unittest-module-common

gemini-code-assist

Code Review

This pull request simplifies the configuration for multi-step training scenarios by introducing a new multi_step_grpo algorithm type. This new type encapsulates the previous need to specify algorithm_type: grpo and advantage_fn: step_wise_grpo separately. The changes are consistently applied across various example configuration files and documentation in both English and Chinese. Additionally, the veRLConfig is simplified by removing a redundant buffer configuration.
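
For existing configurations, the change described above amounts to collapsing two keys into one. The helper below is a rough sketch of that rewrite on a plain dictionary. `migrate_algorithm_config` is a hypothetical function, not part of Trinity-RFT, and `repeat_times` is only a placeholder for whatever other keys a config may carry.

```python
from typing import Any, Dict


def migrate_algorithm_config(algorithm: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite the old grpo + step_wise_grpo pair as multi_step_grpo.

    Returns a new dict; the input is left untouched. Any other
    combination of keys is passed through unchanged.
    """
    migrated = dict(algorithm)
    uses_old_pair = (
        migrated.get("algorithm_type") == "grpo"
        and migrated.get("advantage_fn") == "step_wise_grpo"
    )
    if uses_old_pair:
        migrated["algorithm_type"] = "multi_step_grpo"
        # The advantage function is implied by the new algorithm type.
        del migrated["advantage_fn"]
    return migrated


# Placeholder config: "repeat_times" stands in for unrelated settings.
old = {"algorithm_type": "grpo", "advantage_fn": "step_wise_grpo", "repeat_times": 8}
print(migrate_algorithm_config(old))
# → {'algorithm_type': 'multi_step_grpo', 'repeat_times': 8}
```

A config that uses plain `grpo` without the step-wise advantage function is returned as-is, so the rewrite only touches the multi-step case.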

My review includes one point of feedback regarding maintainability. A widespread change to the sample_strategy default value across multiple algorithms was included but not mentioned in the PR description. It is recommended to move such changes to a separate PR to maintain a clean and understandable commit history.

trinity/algorithm/algorithm.py

github-actions · 2025-10-16T11:45:59Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
30	30	0	0	0	0	320ms

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	35ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	3ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	58ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	36ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	46ms
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	20ms
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	20ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	22ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms

Github Test Reporter by CTRF 💚

pan-x-c added 2 commits October 16, 2025 19:26

simplify config

87157a7

fix config

3d83076

gemini-code-assist bot reviewed Oct 16, 2025

trinity/algorithm/algorithm.py (resolved)

chenyushuo approved these changes Oct 16, 2025

pan-x-c merged commit 2fd62a4 into agentscope-ai:main Oct 16, 2025
2 checks passed

pan-x-c merged 2 commits into agentscope-ai:main from pan-x-c:feature/simplify_config

2 participants
