Enhance experience replay for priority queue buffer by yanxi-chen · Pull Request #306 · agentscope-ai/Trinity-RFT

yanxi-chen · 2025-09-29T11:47:37Z

Description

Enhance experience replay for priority queue buffer.

Upgrade the priority function API: it now additionally returns a bool indicating whether the retrieved experiences should be put back into the buffer. Previously this was always true, which could cause endless reuse of the same experiences.
For linear_decay, change the default decay value from 0.1 to 2.0; this could make the replay mechanism more reliable, with less dependence on setting the cooldown-time parameter appropriately.
Add a new priority function: linear decay + randomization + use count control (with a corresponding unittest).
For the priority queue, change from capacity = min(storage_config.capacity, 2 * train_batch_size) to capacity = storage_config.capacity (and update unittests accordingly).
Make the definition of "capacity" for the priority queue consistent with that of the common queue, namely the total number of items. Previously it was defined by the number of keys in priority_groups.
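
The put-back flag and the item-count capacity described above can be illustrated with a toy buffer. This is a minimal sketch, not the Trinity-RFT implementation: ToyPriorityBuffer, limited_reuse, the single-argument priority function signature, and the choice to reject new items when the buffer is full are all assumptions made for the example.

```python
import heapq
import itertools
from typing import Callable, List, Tuple

# Hypothetical priority function type: use_count -> (priority, put_back).
PriorityFn = Callable[[int], Tuple[float, bool]]


class ToyPriorityBuffer:
    """Illustrative buffer only; names and behavior are assumptions."""

    def __init__(self, capacity: int, priority_fn: PriorityFn) -> None:
        self.capacity = capacity  # counts total items, not distinct priorities
        self.priority_fn = priority_fn
        self._heap: List[Tuple[float, int, str, int]] = []
        self._tie = itertools.count()  # tie-breaker for equal priorities

    def __len__(self) -> int:
        return len(self._heap)

    def put(self, item: str, use_count: int = 0) -> bool:
        priority, keep = self.priority_fn(use_count)
        # Simplification: a full buffer rejects the item instead of evicting.
        if not keep or len(self) >= self.capacity:
            return False
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._tie), item, use_count))
        return True

    def get(self) -> str:
        _, _, item, use_count = heapq.heappop(self._heap)
        # Re-insert only if the priority function allows it.
        self.put(item, use_count + 1)
        return item


def limited_reuse(use_count: int) -> Tuple[float, bool]:
    # Assumed rule for this sketch: each item may be read at most twice.
    return -float(use_count), use_count < 2


buffer = ToyPriorityBuffer(capacity=4, priority_fn=limited_reuse)
for name in ["a", "b"]:
    buffer.put(name)

reads = []
while len(buffer) > 0:
    reads.append(buffer.get())
print(reads)  # ['a', 'b', 'a', 'b']
```

If the flag were always true, the while loop above would never terminate, which is the endless-reuse problem the description mentions.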

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist · 2025-09-29T11:47:52Z

Summary of Changes

Hello @yanxi-chen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the experience replay buffer's priority queue mechanism by introducing a more flexible priority function interface. This change allows for conditional insertion of experiences into the buffer, enabling finer-grained control over which data points are retained. A new strategy is also added to limit the reuse of experiences, and the default decay rate for an existing priority function is adjusted to reflect these improvements.

Highlights

Enhanced Priority Function Interface: The priority function interface has been updated to return a boolean value alongside the priority score, indicating whether an item should be placed into the queue. This allows for more dynamic control over buffer contents.
New Priority Control Strategy: A new priority function, linear_decay_use_count_control_priority, has been introduced. This function prevents experiences from being re-added to the buffer if their use_count exceeds a specified limit, helping to manage experience diversity.
Adjusted Default Decay Rate: The default decay rate for the linear_decay_priority function has been increased from 0.1 to 2.0, impacting how model_version and use_count influence an item's priority.
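
To see why the larger default decay matters, here is a small sketch. It assumes the priority has the form model_version - decay * use_count; the exact formula is not stated in this thread, and linear_decay_sketch is a hypothetical name used only for illustration.

```python
def linear_decay_sketch(model_version: int, use_count: int, decay: float) -> float:
    # Assumed form: newer model versions rank higher, each reuse subtracts `decay`.
    return model_version - decay * use_count


for decay in (0.1, 2.0):
    reused_new = linear_decay_sketch(model_version=5, use_count=1, decay=decay)
    fresh_old = linear_decay_sketch(model_version=4, use_count=0, decay=decay)
    # Does an already-used experience still outrank an unused, slightly older one?
    print(decay, reused_new > fresh_old)
# prints "0.1 True" then "2.0 False"
```

Under this assumed formula, a decay of 0.1 lets a once-used experience stay ahead of unused experiences from the previous model version, while a decay of 2.0 pushes it behind them after a single use.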

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request enhances the experience replay mechanism for the priority queue buffer. The priority functions are updated to return a boolean flag, put_into_queue, which provides more control over whether an item is added to the queue. A new priority function, linear_decay_use_count_control_priority, is introduced to demonstrate this by limiting the number of times an experience can be used. The changes are logical and well-implemented. My main feedback is to add docstrings to the new priority functions to improve documentation, as noted in the PR checklist.
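
As an illustration of the docstring request, the sketch below shows what a Google-style docstring on a use-count-controlled priority function might look like. The function name, parameters (use_count_limit, sigma, rng), the Gaussian noise, and the exact boundary of the limit check are assumptions for this example; the code actually merged in trinity/buffer/storage/queue.py may differ.

```python
import random
from typing import Optional, Tuple


def use_count_control_sketch(
    model_version: int,
    use_count: int,
    decay: float = 2.0,
    use_count_limit: int = 3,
    sigma: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Tuple[float, bool]:
    """Compute a replay priority and decide whether to re-queue the item.

    Args:
        model_version: Version of the model that produced the experience.
        use_count: Number of times the experience has been read so far.
        decay: Amount subtracted from the priority per use (assumed linear).
        use_count_limit: Experiences used more than this are not re-queued.
        sigma: Standard deviation of optional Gaussian noise; 0 disables it.
        rng: Optional random generator, useful for reproducible tests.

    Returns:
        A tuple (priority, put_into_queue).
    """
    priority = model_version - decay * use_count
    if sigma > 0.0:
        # Randomization: perturb the priority so ties are not always broken
        # the same way (the noise model here is an assumption).
        priority += (rng or random).gauss(0.0, sigma)
    put_into_queue = use_count <= use_count_limit
    return priority, put_into_queue
```

For example, use_count_control_sketch(5, 0) returns (5.0, True), while an experience with use_count=4 under the default limit of 3 gets put_into_queue set to False.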

trinity/buffer/storage/queue.py

chenyushuo · 2025-10-15T08:09:30Z

/unittest-module-buffer

github-actions · 2025-10-15T08:12:31Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
26	26	0	0	0	0	101ms

Tests

Test Name	Status	Duration
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	15ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer	✅	4ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft	✅	5ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo	✅	6ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	3ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	1ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	8ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control	✅	6ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	6ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	1ms
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_buffer_read_write	✅	5ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1	✅	4ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3	✅	3ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5	✅	4ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-10-15T08:45:23Z

/unittest-all

pan-x-c · 2025-10-15T11:28:38Z

/unittest-module-common

github-actions · 2025-10-15T11:35:19Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
30	30	0	0	0	0	318ms

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	35ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	3ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	58ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	45ms
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	21ms
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	20ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	22ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms

yanxi-chen added 2 commits September 29, 2025 19:44

Enhance experience replay for priority queue

d21d175

Fix pre-commit

c25e0b1

gemini-code-assist bot reviewed Sep 29, 2025

trinity/buffer/storage/queue.py (resolved)

yanxi-chen added 3 commits October 10, 2025 10:47

Support randomization in priority calculation

bfa2671

Update doc string

e66cf17

Update capacity definition for priority queue; add unittest for new priority function

8b7779b

yanxi-chen changed the title ~~[WIP] Enhance experience replay for priority queue buffer~~ Enhance experience replay for priority queue buffer Oct 11, 2025

Merge branch 'main' into dev/cyx/enhance_exp_replay

ba34124

Merge branch 'main' into dev/cyx/enhance_exp_replay

d9ca49a

pan-x-c approved these changes Oct 15, 2025

pan-x-c merged commit 3d12bd9 into agentscope-ai:main Oct 15, 2025
1 check passed

pan-x-c merged 7 commits into agentscope-ai:main from yanxi-chen:dev/cyx/enhance_exp_replay