
Support vLLM v0.14.1 #488

Merged
hiyuchang merged 6 commits into agentscope-ai:main from pan-x-c:feature/vllm_14 on Jan 27, 2026

Conversation

@pan-x-c
Collaborator

pan-x-c commented Jan 23, 2026

Description

This PR adds support for vLLM v0.13.0, v0.14.0, and v0.14.1.

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist
Contributor

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on updating the system's integration with vLLM to support its latest versions, 0.13.0 and 0.14.0. It involves a significant refactoring of how the vLLM API server is initialized, introducing a more modular and version-aware approach. This ensures that the system can leverage the performance and features of newer vLLM releases while maintaining stability and correct functionality across different versions.

Highlights

  • vLLM Version Compatibility: Extended support for vLLM to include versions 0.13.0 and 0.14.0, ensuring the system remains compatible with the latest advancements in the vLLM library.
  • API Server Initialization Refactoring: The logic for launching the vLLM API server has been refactored into a new centralized get_api_server function. This function dynamically selects and applies the correct version-specific patch for the API server, simplifying maintenance and improving adaptability to future vLLM updates.
  • New vLLM v0.13.0+ API Patch: A new patch file, api_patch_v13.py, has been introduced to specifically handle the API server implementation details for vLLM versions 0.13.0 and above. This patch includes adaptations for running the server within a Ray actor environment, addressing signal handling and socket management.
  • Prompt Logprobs Patch Update: The existing patch for extracting prompt logprobs has been updated to extend its compatibility range, now supporting vLLM versions up to 0.14.0.
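
The version-aware dispatch described in the highlights can be pictured with a minimal sketch. It is not the PR's actual get_api_server: the launcher functions, the select_launcher name, and the registry layout are hypothetical, and only the 0.13.0 threshold comes from the summary above.

```python
import re
from typing import Callable, List, Tuple

ReleaseTuple = Tuple[int, int, int]


def parse_release(version: str) -> ReleaseTuple:
    """Extract (major, minor, patch) from a version string such as '0.14.1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def launch_legacy() -> str:
    """Hypothetical stand-in for the pre-0.13.0 server patch."""
    return "legacy"


def launch_v13() -> str:
    """Hypothetical stand-in for the 0.13.0+ server patch."""
    return "v13"


# Newest threshold first, so the first satisfied entry wins.
LAUNCHERS: List[Tuple[ReleaseTuple, Callable[[], str]]] = [
    ((0, 13, 0), launch_v13),
    ((0, 0, 0), launch_legacy),
]


def select_launcher(vllm_version: str) -> Callable[[], str]:
    """Return the launcher whose minimum version the installed release meets."""
    installed = parse_release(vllm_version)
    for minimum, launcher in LAUNCHERS:
        if installed >= minimum:
            return launcher
    raise RuntimeError(f"No launcher registered for vLLM {vllm_version}")
```

In practice the version string would typically come from vllm.__version__. Keeping the thresholds in one ordered list means a future release only needs one new entry rather than another branch at every call site.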

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for vLLM versions v0.13.0 and v0.14.0. The core change involves refactoring the API server initialization to dynamically load version-specific patches based on the detected vLLM version. A new api_patch_v13.py file has been added to handle the API changes in vLLM versions 0.13.0 and above, and existing version compatibility ranges in worker_patch.py have been updated accordingly. This approach enhances maintainability by centralizing version-dependent logic and ensures compatibility with newer vLLM releases. The changes are well-implemented and align with the stated objective of the pull request.
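
A compatibility range like the one mentioned for worker_patch.py is commonly enforced with a guard that runs before a patch is applied. The sketch below is an assumption about the general shape of such a check, not the code in this PR; ensure_supported and the lower bound used in the example are placeholders.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '0.14.1' (or '0.14.1.dev3') into the comparable tuple (0, 14, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def ensure_supported(installed: str, minimum: str, maximum: str) -> None:
    """Raise if `installed` falls outside the inclusive [minimum, maximum] range."""
    low, current, high = (release_tuple(v) for v in (minimum, installed, maximum))
    if not low <= current <= high:
        raise RuntimeError(
            f"vLLM {installed} is outside the supported range {minimum} to {maximum}"
        )


# The upper bound matches this PR's title; the lower bound is a placeholder.
ensure_supported("0.14.1", minimum="0.10.0", maximum="0.14.1")
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.9.0" sorts after "0.14.0", which would silently invert the check.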

@pan-x-c
Collaborator Author

pan-x-c commented Jan 26, 2026

/unittest-module-common

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 42 | 41 | 0 | 1 | 0 | 0 | 14m 5s |

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 36.7s
tests/common/config_test.py::TestConfig::test_chat_template_path 74ms
tests/common/config_test.py::TestConfig::test_config_flatten 30ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 432ms
tests/common/config_test.py::TestConfig::test_default_workflow 72ms
tests/common/config_test.py::TestConfig::test_load_default_config 3.8s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 73ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 74ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 1.5s
tests/common/experience_test.py::TestEID::test_eid_properties 1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 1ms
tests/common/experience_test.py::TestExperience::test_assertions 1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 1ms
tests/common/experience_test.py::TestExperience::test_gather 1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience 1ms
tests/common/experience_test.py::TestExperience::test_to_dict 1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 1m 17s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 1m 6s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 34.2s
tests/common/vllm_test.py::TestModelLen_0::test_model_len 59.3s
tests/common/vllm_test.py::TestModelLen_1::test_model_len 21.5s
tests/common/vllm_test.py::TestModelLen_2::test_model_len 51.3s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 51.8s
tests/common/vllm_test.py::TestAPIServer::test_api 53.6s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 21.6s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 24.2s
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 240ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 242ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 52.6s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 51.9s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 2m 36s
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 1m 13s

Github Test Reporter by CTRF 💚

@pan-x-c
Collaborator Author

pan-x-c commented Jan 26, 2026

/unittest-module-common

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 53 | 51 | 1 | 1 | 0 | 0 | 9m 34s |

Failed Tests

| Failed Tests ❌ | Fail Message |
| --- | --- |
| ❌ tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api | The test failed in the call phase |

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 9h 15m
tests/common/config_test.py::TestConfig::test_chat_template_path 1m 14s
tests/common/config_test.py::TestConfig::test_config_flatten 31.0s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 2m 35s
tests/common/config_test.py::TestConfig::test_default_workflow 1m 13s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 2m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 1m 16s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 1m 15s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 26m 39s
tests/common/experience_test.py::TestEID::test_eid_properties 511ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 494ms
tests/common/experience_test.py::TestExperience::test_assertions 333ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 383ms
tests/common/experience_test.py::TestExperience::test_gather 992ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 610ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 14.8s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 359ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.8s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 368ms
tests/common/experience_test.py::TestExperience::test_to_dict 330ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 711ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 809ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 951ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 532ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 665ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 794ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 835ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 745ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 263ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 251ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 237ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 257ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 277ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 272ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 238ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 230ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 43m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 11h 15m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 12h 58m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 7h 40m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 8h 47m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 57m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 46m
tests/common/vllm_test.py::TestAPIServer::test_api 8h 5m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 26m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 8m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 682ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 4m 10s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 51s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 48m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 8h 24m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 34h 4m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 21m 14s


@pan-x-c
Collaborator Author

pan-x-c commented Jan 26, 2026

/unittest-module-common

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 53 | 52 | 0 | 1 | 0 | 0 | 10m 16s |

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 9h 19m
tests/common/config_test.py::TestConfig::test_chat_template_path 1m 14s
tests/common/config_test.py::TestConfig::test_config_flatten 30.8s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 2m 33s
tests/common/config_test.py::TestConfig::test_default_workflow 1m 12s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 7m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 1m 16s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 1m 14s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 33m 38s
tests/common/experience_test.py::TestEID::test_eid_properties 498ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 484ms
tests/common/experience_test.py::TestExperience::test_assertions 328ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 395ms
tests/common/experience_test.py::TestExperience::test_gather 982ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 570ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 15.0s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 351ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.9s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 377ms
tests/common/experience_test.py::TestExperience::test_to_dict 337ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 707ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 542ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 771ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 487ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 607ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 972ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 1.2s
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 739ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 258ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 241ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 221ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 227ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 262ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 255ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 228ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 219ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 51m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 11h
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 13h 2m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 9h 18m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 7h 40m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 49m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 16m
tests/common/vllm_test.py::TestAPIServer::test_api 8h 8m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 24m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 7m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 725ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 3m 57s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 51s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 47m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 8h 37m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 34h 6m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 41m


@pan-x-c changed the title from "Support vLLM v0.14.0" to "Support vLLM v0.14.1" on Jan 26, 2026
@pan-x-c
Collaborator Author

pan-x-c commented Jan 26, 2026

/unittest-all

@pan-x-c
Collaborator Author

pan-x-c commented Jan 26, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for vLLM versions up to 0.14.1, which is a valuable update. The changes are well-structured, with version-specific logic correctly encapsulated in patch files. The refactoring to use a factory function get_api_server for creating the API server is a good improvement for maintainability.

I've added one suggestion to improve code clarity in the new api_patch_v13.py file.

Additionally, I've noticed a potential bug that I couldn't comment on directly due to tooling limitations. The file scripts/install.py was renamed to scripts/data/install.py, but unlike the similar change in scripts/data/start_servers.py, the relative path to env_mapping.json was not updated. This will likely cause a FileNotFoundError when running scripts/data/install.py. Please ensure this path is corrected.
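
The failure mode described above can be reproduced with a small sketch. It assumes, purely for illustration, that the script locates env_mapping.json relative to its own file and that the JSON file sits one directory above scripts/; the real layout and the helper name resolve_relative_to_script are not taken from the PR.

```python
from pathlib import Path
from typing import Union


def resolve_relative_to_script(script_file: Union[str, Path], relative: str) -> Path:
    """Resolve `relative` against the directory that contains `script_file`."""
    return (Path(script_file).resolve().parent / relative).resolve()


# Hypothetical layout: <root>/env_mapping.json and <root>/scripts/install.py.
# Before the move, "../env_mapping.json" points at <root>/env_mapping.json.
before = resolve_relative_to_script("/repo/scripts/install.py", "../env_mapping.json")

# After moving the script to scripts/data/, the same relative string now
# points at <root>/scripts/env_mapping.json, which would not exist.
after = resolve_relative_to_script("/repo/scripts/data/install.py", "../env_mapping.json")

# Adding one more parent hop restores the original target.
fixed = resolve_relative_to_script("/repo/scripts/data/install.py", "../../env_mapping.json")
```

Because the relative string is interpreted from the script's new directory, every file that moves one level deeper needs one extra ".." in any path it builds this way.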

Overall, great work on extending vLLM support.

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 246 | 239 | 0 | 7 | 0 | 0 | 1h 10m |

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class skipped ⏭️
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo 5.2s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage 3.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo 5.1s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage 3.5s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias 2.0s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std 1.7s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage 2.0s
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold 2.4s
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn 1.8s
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback 942ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss 948ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy 873ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob 851ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn 804ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn 781ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn 833ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn 814ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes 883ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn 866ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss 2.1s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss 2.0s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss 3.6s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss 1.5s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss 1.2s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking 1.3s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss 1.9s
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss 997ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 2h 51m
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation 1h 38m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 44m 57s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 1h 12m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 1h 24m
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 2m 28s
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 24m 20s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 9m 9s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 7m 41s
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 13m 41s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 17m 14s
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 12m
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 3m 44s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 1h 49m
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 35m 4s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 1h 11m
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 54m 43s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 54m 9s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 59m 4s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration 13m 52s
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 6.4s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy 33m 37s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy 26m 29s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy 29m 13s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy 28m 50s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy 1h 13m
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy 30m 44s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy 26m 12s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy 28m 55s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy 26m 15s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy 58m 12s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0 1h 34m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1 42m 18s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write 49m 44s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0 1m 16s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1 56.0s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2 1m 27s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3 1m 27s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4 1m 28s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5 1m 34s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6 1m 48s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple 46.5s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file 7m 47s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql 51m 8s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file 40.5s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql 48m 7s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file 40.5s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql 57m 35s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 11h 3m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 37m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 23m 23s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 5m 17s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 3h 59m
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 8h 55m
tests/common/config_test.py::TestConfig::test_chat_template_path 1m 15s
tests/common/config_test.py::TestConfig::test_config_flatten 31.4s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 2m 34s
tests/common/config_test.py::TestConfig::test_default_workflow 1m 12s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 3m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 1m 14s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 1m 13s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 2m 46s
tests/common/experience_test.py::TestEID::test_eid_properties 538ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 504ms
tests/common/experience_test.py::TestExperience::test_assertions 330ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 410ms
tests/common/experience_test.py::TestExperience::test_gather 798ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 575ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 15.1s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 367ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.0s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 380ms
tests/common/experience_test.py::TestExperience::test_to_dict 343ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 683ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 547ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 834ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 516ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 592ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 1.6s
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 797ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 713ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 255ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 247ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 233ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 229ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 282ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 276ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 238ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 221ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 38m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 11h 3m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 10h 42m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 8h 44m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 7h 50m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 32m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 41m
tests/common/vllm_test.py::TestAPIServer::test_api 8h 9m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 30m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 3m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 752ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 3m 47s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 46s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 50m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 8h 34m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 33h 57m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 26m
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 13h 30m
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 13h 46m
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 16h 28m
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 50h 12m
tests/explorer/explorer_test.py::ServeTest::test_serve 15h 23m
tests/explorer/proxy_test.py::RecorderTest::test_recorder 1m 26s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 1h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 1h 25m
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 3h 34m
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 5h 41m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 1h 24m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 1h 23m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 1h 29m
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 1h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 2h 31m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 4h 9m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 2h 27m
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 2h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 7h 2m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 2h 13m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 3h 51m
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 2h 46m
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 2.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 10m 2s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 965ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 16m 42s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 998ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 16m 43s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 12.7s
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 17.7s
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 13m 58s
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 4.8s
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 12.8s
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 8.6s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 1m 41s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 3m 21s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 6h 31m
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 6h 31m
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 1h 6m
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 12m 49s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 13.6s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 2m 17s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 2h 14m
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 7h 29m
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer 23h 23m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer 28h
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer 23h
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer 28h 31m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer 23h 20m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer 19h 28m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer 20h 3m
tests/service/data_juicer_test.py::TestDataJuicer::test_config 18m 47s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start 5h 57m
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators 5h 37m
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline 4h 6m
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer 38h 23m
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer 65h 27m
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer 19h 32m
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer 14h 30m
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer 14h 15m
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer 15h 45m
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer 18h 28m
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer 34h 1m
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer 9h 15m
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer 8h 57m
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools 8h 16m
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode 26h 39m
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode 25h 43m
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode 38h 19m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer 40h 3m
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer 87h 36m
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer 27h 56m
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer 29h 35m
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer ⏭️ 17m 11s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer ⏭️ 15m 58s
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer 40h 55m
tests/trainer/trainer_test.py::TestOverRollout::test_trainer 14h 11m
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer 12h 43m
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer ⏭️ 628ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class ⏭️ 279ms
tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner ⏭️ 289ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent 9.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent 1.2s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth 1.7s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string 274ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution 1.8s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed 262ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent 1.1s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer 4.3s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer 1m
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv 5.1s
tests/utils/log_test.py::LogTest::test_actor_log 42m 52s
tests/utils/log_test.py::LogTest::test_group_by_node 30m 40s
tests/utils/log_test.py::LogTest::test_no_actor_log 10m 26s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins 1m 17s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins 1m 14s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins 2h 20m
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins 2h 21m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins 1h 24m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins 1h 18m
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import 38m 54s
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping 8.1s
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping 2.9s
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping 54.1s
tests/utils/registry_test.py::TestRegistry::test_register_module 519ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping 646ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke ⏭️ 357ms

GitHub Test Reporter by CTRF 💚

@hiyuchang hiyuchang merged commit 9c7d744 into agentscope-ai:main Jan 27, 2026
2 checks passed