Enhance experience replay for priority queue buffer #306

Merged
pan-x-c merged 7 commits into agentscope-ai:main from yanxi-chen:dev/cyx/enhance_exp_replay
Oct 15, 2025

@yanxi-chen
Collaborator

@yanxi-chen yanxi-chen commented Sep 29, 2025

Description

Enhance experience replay for priority queue buffer.

  • Upgrade the priority function API: it now additionally returns a bool indicating whether the retrieved experiences should be put back into the buffer (previously they were always put back, which could cause endless reuse of the same experiences)
  • For linear_decay, change the default decay value from 0.1 to 2.0; this should make the replay mechanism more reliable, with less dependence on tuning the cooldown-time parameter
  • Add a new priority function: linear decay + randomization + use-count control (with a corresponding unit test)
  • For the priority queue, change capacity = min(storage_config.capacity, 2 * train_batch_size) to capacity = storage_config.capacity (and update the unit tests accordingly)
  • Make the definition of "capacity" for the priority queue consistent with that of the common queue, i.e. the total number of items (previously it was the number of keys in priority_groups)
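
For readers skimming the list above, here is a minimal sketch of what the upgraded API shape and the new default decay could look like. It is not the code from this PR: the `Item` class, the function name `linear_decay`, and the formula `model_version - decay * use_count` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Item:
    """Hypothetical stand-in for a buffered batch of experiences."""
    model_version: int
    use_count: int = 0


def linear_decay(item: Item, decay: float = 2.0) -> Tuple[float, bool]:
    """Return (priority, put_into_queue).

    Assumed formula: a newer model version raises the priority,
    and every reuse lowers it by `decay`.
    """
    priority = float(item.model_version - decay * item.use_count)
    # Assumption: plain linear decay never blocks the put-back.
    return priority, True


reused = Item(model_version=5, use_count=1)  # newer, but already read once
fresh = Item(model_version=4, use_count=0)   # older, but never read

for decay in (0.1, 2.0):
    p_reused, _ = linear_decay(reused, decay)
    p_fresh, _ = linear_decay(fresh, decay)
    winner = "reused" if p_reused > p_fresh else "fresh"
    print(f"decay={decay}: reused={p_reused}, fresh={p_fresh}, read next: {winner}")
```

Under these assumptions, a decay of 0.1 still ranks the already-read item ahead of the unread one, while a decay of 2.0 lets the unread item go first, which is one way to read the "less dependence on the cooldown-time parameter" remark.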

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist
Contributor

Summary of Changes

Hello @yanxi-chen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the experience replay buffer's priority queue mechanism by introducing a more flexible priority function interface. This change allows for conditional insertion of experiences into the buffer, enabling finer-grained control over which data points are retained. A new strategy is also added to limit the reuse of experiences, and the default decay rate for an existing priority function is adjusted to reflect these improvements.

Highlights

  • Enhanced Priority Function Interface: The priority function interface has been updated to return a boolean value alongside the priority score, indicating whether an item should be placed into the queue. This allows for more dynamic control over buffer contents.
  • New Priority Control Strategy: A new priority function, linear_decay_use_count_control_priority, has been introduced. This function prevents experiences from being re-added to the buffer if their use_count exceeds a specified limit, helping to manage experience diversity.
  • Adjusted Default Decay Rate: The default decay rate for the linear_decay_priority function has been increased from 0.1 to 2.0, impacting how model_version and use_count influence an item's priority.
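
As a rough illustration of the use-count control described in the highlights, the sketch below combines linear decay, optional randomization, and a reuse limit. The function name, the parameter names (`use_count_limit`, `sigma`), the Gaussian noise, and the exact boundary (`<` versus `<=`) are assumptions and may differ from linear_decay_use_count_control_priority as merged.

```python
import random
from typing import Dict, Optional, Tuple


def linear_decay_use_count_control(
    info: Dict[str, int],
    decay: float = 2.0,
    use_count_limit: int = 3,   # hypothetical parameter name
    sigma: float = 0.0,         # hypothetical noise scale; 0 disables randomization
    rng: Optional[random.Random] = None,
) -> Tuple[float, bool]:
    """Return (priority, put_into_queue) for one buffered item."""
    # Assumed formula: newer versions rank higher, each reuse costs `decay`.
    priority = float(info["model_version"] - decay * info["use_count"])
    if sigma > 0.0:
        # Randomization: jitter the priority so ties are not always broken the same way.
        priority += (rng or random).gauss(0.0, sigma)
    # Assumption: once the limit is reached, the item is not put back.
    put_into_queue = info["use_count"] < use_count_limit
    return priority, put_into_queue


info = {"model_version": 4, "use_count": 0}
for _ in range(4):
    info["use_count"] += 1  # the buffer is assumed to bump this on every read
    priority, keep = linear_decay_use_count_control(info, use_count_limit=3)
    print(info["use_count"], priority, keep)
```

With `sigma=0.0` the function is deterministic; the returned flag turns False once `use_count` reaches the limit, so the item would drop out of the buffer instead of being replayed indefinitely.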

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances the experience replay mechanism for the priority queue buffer. The priority functions are updated to return a boolean flag, put_into_queue, which provides more control over whether an item is added to the queue. A new priority function, linear_decay_use_count_control_priority, is introduced to demonstrate this by limiting the number of times an experience can be used. The changes are logical and well-implemented. My main feedback is to add docstrings to the new priority functions to improve documentation, as noted in the PR checklist.
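
To make the role of put_into_queue concrete, here is a small buffer-side sketch of how such a flag could be consumed. `TinyPriorityBuffer`, `capped_linear_decay`, and the reject-when-full behavior are hypothetical and only illustrate the idea; capacity is counted as the total number of items, in line with the PR description.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple


def capped_linear_decay(item: Dict, decay: float = 2.0, use_count_limit: int = 2) -> Tuple[float, bool]:
    """Hypothetical priority function returning (priority, put_into_queue)."""
    priority = float(item["model_version"] - decay * item["use_count"])
    return priority, item["use_count"] < use_count_limit


class TinyPriorityBuffer:
    """Illustrative only; not the project's buffer class."""

    def __init__(self, capacity: int, priority_fn: Callable[[Dict], Tuple[float, bool]]):
        self.capacity = capacity
        self.priority_fn = priority_fn
        self._heap: List[Tuple[float, int, Dict]] = []  # (-priority, tie_breaker, item)
        self._tie = itertools.count()  # avoids comparing dicts on equal priority

    def __len__(self) -> int:
        return len(self._heap)  # capacity is measured in total items

    def put(self, item: Dict) -> bool:
        priority, put_into_queue = self.priority_fn(item)
        if not put_into_queue or len(self._heap) >= self.capacity:
            return False  # a real buffer might block or evict instead (assumption)
        heapq.heappush(self._heap, (-priority, next(self._tie), item))
        return True

    def get(self) -> Dict:
        _, _, item = heapq.heappop(self._heap)  # highest priority first
        item["use_count"] += 1
        self.put(item)  # re-inserted only if the priority function allows it
        return item


buf = TinyPriorityBuffer(capacity=4, priority_fn=capped_linear_decay)
for version in (1, 2):
    buf.put({"id": f"exp-{version}", "model_version": version, "use_count": 0})

reads = []
while len(buf) > 0:
    reads.append(buf.get()["id"])
print(reads)
```

Because the flag eventually turns False for every item, the read loop terminates on its own; with an always-True flag the same loop would never drain the buffer.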

@yanxi-chen yanxi-chen changed the title from "[WIP] Enhance experience replay for priority queue buffer" to "Enhance experience replay for priority queue buffer" Oct 11, 2025
@chenyushuo
Collaborator

/unittest-module-buffer

@github-actions

Summary

Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️
26 | 26 | 0 | 0 | 0 | 0 | 101ms

Tests

Test Name Status Flaky Duration
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 15ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 4ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 5ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 6ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 3ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 1ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 1ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 1ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 1ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 8ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 6ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 5ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 6ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 1ms
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_buffer_read_write 5ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0 1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1 4ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2 1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3 3ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4 1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5 4ms

Github Test Reporter by CTRF 💚

@pan-x-c
Collaborator

pan-x-c commented Oct 15, 2025

/unittest-all

@pan-x-c
Collaborator

pan-x-c commented Oct 15, 2025

/unittest-module-common

@github-actions

Summary

Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️
30 | 30 | 0 | 0 | 0 | 0 | 318ms

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 35ms
tests/common/config_test.py::TestConfig::test_config_flatten 1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 1ms
tests/common/config_test.py::TestConfig::test_load_default_config 3ms
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_gather 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 58ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 35ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 45ms
tests/common/vllm_test.py::TestModelLen_0::test_model_len 21ms
tests/common/vllm_test.py::TestModelLen_1::test_model_len 20ms
tests/common/vllm_test.py::TestAPIServer::test_api 24ms
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 22ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 20ms

@pan-x-c pan-x-c merged commit 3d12bd9 into agentscope-ai:main Oct 15, 2025
1 check passed