
Simplify Config #325

Merged
pan-x-c merged 2 commits into agentscope-ai:main from pan-x-c:feature/simplify_config
Oct 16, 2025

pan-x-c (Collaborator) commented Oct 16, 2025

Description

  1. Remove the buffer section from the trainer config.
  2. Add a multi_step_grpo algorithm type so that multi-step training no longer requires setting the step_wise_grpo advantage function explicitly.
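
The second change can be pictured as a config migration. The sketch below is illustrative only: `migrate_algorithm_config` is a hypothetical helper, not part of the repository, and it assumes the algorithm settings sit under an `algorithm` key of a plain dict (the `repeat_times` entry is only there to show that unrelated keys are left alone).

```python
from copy import deepcopy


def migrate_algorithm_config(config: dict) -> dict:
    """Return a copy of `config` that uses the `multi_step_grpo` algorithm type.

    Hypothetical helper: assumes algorithm settings live under an
    `algorithm` key, which the PR text does not confirm.
    """
    migrated = deepcopy(config)  # leave the caller's dict untouched
    algorithm = migrated.get("algorithm", {})
    # Old style: plain GRPO plus an explicit step-wise advantage function.
    if (
        algorithm.get("algorithm_type") == "grpo"
        and algorithm.get("advantage_fn") == "step_wise_grpo"
    ):
        # New style: a single algorithm type that implies the advantage function.
        algorithm["algorithm_type"] = "multi_step_grpo"
        del algorithm["advantage_fn"]
    return migrated


old_config = {
    "algorithm": {
        "algorithm_type": "grpo",
        "advantage_fn": "step_wise_grpo",
        "repeat_times": 8,  # illustrative extra setting
    }
}
new_config = migrate_algorithm_config(old_config)
print(new_config)
```

Configs that use plain grpo without the step-wise advantage function pass through this helper unchanged.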

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist (Contributor)

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on simplifying the configuration process within the system, particularly for multi-step GRPO algorithms. It introduces a dedicated algorithm type to make multi-step training setup more intuitive and removes a redundant buffer configuration from the trainer. Additionally, it standardizes the default sampling strategy across several algorithms, ensuring consistency and clarity in the configuration. The changes aim to improve user experience by reducing complexity and potential for misconfiguration.

Highlights

  • Configuration Simplification: The buffer configuration has been removed from the trainer config, streamlining the overall configuration structure.
  • New Algorithm Type: A new multi_step_grpo algorithm has been introduced, eliminating the need to explicitly set step_wise_grpo as an advantage function for multi-step training.
  • Default Sampling Strategy Update: The default sample_strategy for various algorithms (PPO, GRPO, OPMD, AsymRE, DPO, TOPR, CISPO, sPPO, REC) has been updated from 'warmup' to 'default'.
  • Documentation and Example Updates: All relevant documentation and example YAML configuration files have been updated to reflect the new multi_step_grpo algorithm type and the removal of the advantage_fn parameter for this specific use case. Minor typo fixes were also applied in the Chinese documentation.
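
One practical consequence of the sample_strategy default change is that a config relying on the old implicit 'warmup' value would now need to set it explicitly. The following is a minimal sketch of that lookup, not the project's implementation: `resolve_sample_strategy`, the `DEFAULT_SAMPLE_STRATEGY` table, and the lowercase algorithm keys are all assumptions made for illustration.

```python
# Algorithms named in the summary above; the lowercase spellings are assumed.
ALGORITHMS = ("ppo", "grpo", "opmd", "asymre", "dpo", "topr", "cispo", "sppo", "rec")

# After this PR, each of them defaults to 'default' rather than 'warmup'.
DEFAULT_SAMPLE_STRATEGY = {name: "default" for name in ALGORITHMS}


def resolve_sample_strategy(algorithm_type: str, user_config: dict) -> str:
    """Pick the sample strategy: an explicit user value wins over the default.

    Hypothetical helper; raises KeyError for algorithm types not in the table.
    """
    if "sample_strategy" in user_config:
        return user_config["sample_strategy"]
    return DEFAULT_SAMPLE_STRATEGY[algorithm_type]


# No explicit setting: the new default applies.
implicit = resolve_sample_strategy("grpo", {})

# To keep the previous behaviour, the value has to be written out.
explicit = resolve_sample_strategy("grpo", {"sample_strategy": "warmup"})

print(implicit, explicit)
```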

pan-x-c (Collaborator, Author) commented Oct 16, 2025

/unittest-module-common

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request simplifies the configuration for multi-step training scenarios by introducing a new multi_step_grpo algorithm type. This new type encapsulates the previous need to specify algorithm_type: grpo and advantage_fn: step_wise_grpo separately. The changes are consistently applied across various example configuration files and documentation in both English and Chinese. Additionally, the veRLConfig is simplified by removing a redundant buffer configuration.

My review includes one point of feedback regarding maintainability. A widespread change to the sample_strategy default value across multiple algorithms was included but not mentioned in the PR description. It is recommended to move such changes to a separate PR to maintain a clean and understandable commit history.

@github-actions

Summary

Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️
30 | 30 | 0 | 0 | 0 | 0 | 320ms

Tests

Test Name | Status | Flaky | Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid | ✅ | | 35ms
tests/common/config_test.py::TestConfig::test_config_flatten | ✅ | | 1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid | ✅ | | 1ms
tests/common/config_test.py::TestConfig::test_load_default_config | ✅ | | 3ms
tests/common/experience_test.py::TestEID::test_eid_properties | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_assertions | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_gather | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience | ✅ | | 1ms
tests/common/experience_test.py::TestExperience::test_to_dict | ✅ | | 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion | ✅ | | 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion | ✅ | | 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion | ✅ | | 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields | ✅ | | 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion | ✅ | | 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate | ✅ | | 58ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate | ✅ | | 36ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate | ✅ | | 46ms
tests/common/vllm_test.py::TestModelLen_0::test_model_len | ✅ | | 20ms
tests/common/vllm_test.py::TestModelLen_1::test_model_len | ✅ | | 20ms
tests/common/vllm_test.py::TestAPIServer::test_api | ✅ | | 24ms
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async | ✅ | | 24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask | ✅ | | 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools | ✅ | | 1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls | ✅ | | 22ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls | ✅ | | 20ms

Github Test Reporter by CTRF 💚

@pan-x-c pan-x-c merged commit 2fd62a4 into agentscope-ai:main Oct 16, 2025
2 checks passed