Refactor Launcher with Typer #502

Merged
pan-x-c merged 8 commits into agentscope-ai:main from pan-x-c:feature/refactor_launcher
Feb 9, 2026

@pan-x-c pan-x-c commented Feb 9, 2026

Description

  1. Replace argparse with typer for easier maintenance.
  2. Optimize imports to speed up the trinity CLI's response time (by up to 50%).
  3. Add more logging to the workflow runner.
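
For readers unfamiliar with typer, a minimal sketch of what a typer-based launcher can look like follows. It is not the code from this PR: the `run` and `studio` command names match the tests listed further down, but the `--config`, `--dlc`, and `--port` options and the command bodies are assumptions made for illustration. The `json` import inside `run` stands in for a heavy dependency that is deferred until the command actually executes.

```python
import typer

# Hypothetical stand-in for the trinity CLI; not the PR's actual launcher.
app = typer.Typer(help="Illustrative launcher built with typer.")


@app.command()
def run(
    config: str = typer.Option(..., "--config", help="Path to a config file."),
    dlc: bool = typer.Option(False, "--dlc", help="Illustrative boolean flag."),
) -> None:
    """Run an experiment described by a config file."""
    # Deferred import: a placeholder for an expensive dependency, loaded only
    # when this command runs, so `--help` and other commands stay fast.
    import json

    typer.echo(json.dumps({"command": "run", "config": config, "dlc": dlc}))


@app.command()
def studio(
    port: int = typer.Option(8501, "--port", help="Port to serve on."),
) -> None:
    """Start the configuration UI (placeholder body)."""
    typer.echo(f"studio would start on port {port}")
```

Calling `app()` from the module's entry point turns each decorated function into a sub-command, with option parsing and help text derived from the type hints and `typer.Option` declarations.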

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the Trinity CLI to use typer instead of argparse. This change modernizes the CLI definition, improves testability by using TyperCliRunner, and enhances the overall structure of the command-line interface. The update includes modifications to the pyproject.toml file to add typer as a dependency, updates to the test suite to reflect the new CLI structure, and refactoring of the main launcher script to use typer decorators.

Highlights

  • Typer Integration: Replaces argparse with typer for defining command-line interfaces, modernizing the CLI structure.
  • Dependency Update: Adds typer>=0.9.0 as a project dependency in pyproject.toml.
  • Test Refactoring: Updates tests in tests/cli/launcher_test.py to use TyperCliRunner instead of mocking argparse.ArgumentParser.parse_args, improving test accuracy and reducing reliance on mocks.
  • Viewer Update: Adds a run_viewer method to trinity/buffer/viewer.py to start the Streamlit viewer programmatically.
  • CLI Command Restructuring: Refactors trinity/cli/launcher.py to use typer decorators for defining CLI commands, enhancing code readability and maintainability.

Changelog

  • pyproject.toml
    • Added typer>=0.9.0 as a dependency.
  • tests/cli/launcher_test.py
    • Replaced argparse mocking with TyperCliRunner for testing CLI commands.
    • Adjusted test assertions to align with typer's output.
  • trinity/buffer/viewer.py
    • Added run_viewer method to programmatically start the Streamlit viewer.
  • trinity/cli/launcher.py
    • Replaced argparse with typer for defining CLI commands.
    • Refactored command definitions using typer decorators.
    • Removed direct calls to streamlit.cli in favor of ConfigManager.run.
  • trinity/common/experience.py
    • Minor: moved Tensor import to the top.
  • trinity/common/models/model.py
    • Minor: moved openai import inside the function to avoid circular dependency.
  • trinity/explorer/workflow_runner.py
    • Added debug logging.
  • trinity/manager/config_manager.py
    • Added static method run to start the Streamlit app.
  • trinity/manager/synchronizer.py
    • Minor: moved imports inside functions to avoid circular dependency.
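
The changelog mentions starting the Streamlit app programmatically, but the bodies of `ConfigManager.run` and `run_viewer` are not shown in this conversation. One possible approach, sketched here under that caveat, is to launch `python -m streamlit run` in a child process. The function names, the default port, and the choice of `subprocess` are all assumptions.

```python
import subprocess
import sys
from pathlib import Path


def build_streamlit_command(script: Path, port: int = 8501) -> list[str]:
    """Build the argv for `python -m streamlit run <script>`."""
    return [
        sys.executable, "-m", "streamlit", "run", str(script),
        "--server.port", str(port),
    ]


def run_app(script: Path, port: int = 8501) -> int:
    """Start Streamlit in a child process and return its exit code."""
    # Blocks until the Streamlit server process exits.
    completed = subprocess.run(build_streamlit_command(script, port), check=False)
    return completed.returncode
```

Keeping the command construction in its own function makes it testable without starting a server.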

Activity

  • The pull request introduces typer and removes argparse to modernize the CLI.
  • Tests have been updated to use TyperCliRunner for better integration testing.
  • The debug command has been refactored to use typer for argument parsing.
  • The studio command now uses ConfigManager.run to start the Streamlit app.
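
To illustrate the testing change described above, here is a hedged sketch of testing a typer command through `typer.testing.CliRunner` rather than mocking `argparse.ArgumentParser.parse_args`. The `debug` command body and its `--config` and `--module` options are hypothetical and are not taken from `tests/cli/launcher_test.py`.

```python
import typer
from typer.testing import CliRunner

app = typer.Typer()


@app.command()
def debug(
    config: str = typer.Option(..., "--config"),
    module: str = typer.Option("workflow", "--module"),
) -> None:
    """Hypothetical debug command; option names are illustrative."""
    typer.echo(f"debugging {module} with {config}")


@app.command()
def studio() -> None:
    """Second command, so typer keeps sub-command dispatch."""
    typer.echo("studio")


def test_debug_command() -> None:
    runner = CliRunner()
    # The real parser runs, so option handling is exercised end to end.
    result = runner.invoke(app, ["debug", "--config", "cfg.yaml"])
    assert result.exit_code == 0
    assert "debugging workflow with cfg.yaml" in result.output


def test_missing_option_is_rejected() -> None:
    runner = CliRunner()
    result = runner.invoke(app, ["debug"])
    # A missing required option is a usage error with a non-zero exit status.
    assert result.exit_code != 0
```

Because the runner invokes the application with a real argument list, a renamed or mistyped option fails the test, which a mocked `parse_args` return value would not catch.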

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the command-line interface from argparse to typer, which significantly improves the code's readability and maintainability. The changes are well-executed, including updating tests to use typer's testing utilities and introducing lazy imports to improve CLI startup performance. I've found a couple of minor areas for improvement regarding parameter naming for better clarity. Overall, this is a great enhancement.
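
The lazy-import pattern mentioned in the review can be sketched as follows. This is a generic illustration using a standard-library module, not code from the PR; `summarize` and `is_loaded` are hypothetical helpers.

```python
import sys


def summarize(values: list[float]) -> dict[str, float]:
    """Compute summary statistics, importing the helper module on first use."""
    # Deferred import: the module is loaded when this function first runs,
    # not when the enclosing module is imported.
    import statistics

    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def is_loaded(module_name: str) -> bool:
    """Report whether a module has already been imported in this process."""
    return module_name in sys.modules
```

Moving an import into the function that needs it shifts the cost from CLI startup to the first call. CPython's `-X importtime` flag prints per-module import times and is one way to find which imports are worth deferring.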

pan-x-c commented Feb 9, 2026

/unittest-module-common

github-actions bot commented Feb 9, 2026

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 55 | 54 | 0 | 1 | 0 | 0 | 11m 13s |

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 9h 21m
tests/common/config_test.py::TestConfig::test_chat_template_path 5m 2s
tests/common/config_test.py::TestConfig::test_config_flatten 32.7s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 6m 25s
tests/common/config_test.py::TestConfig::test_default_workflow 5m 1s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 59m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 5m 7s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 5m 20s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 30m 38s
tests/common/experience_test.py::TestEID::test_eid_properties 522ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 485ms
tests/common/experience_test.py::TestExperience::test_assertions 337ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 410ms
tests/common/experience_test.py::TestExperience::test_gather 982ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 571ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 15.3s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 352ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.9s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 362ms
tests/common/experience_test.py::TestExperience::test_to_dict 338ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 715ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 580ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 803ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 494ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 596ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 781ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 683ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 673ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 254ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 219ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 217ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 218ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 267ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 246ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 208ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 224ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 54m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 10h 46m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 13h 9m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 7h 34m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 9h 1m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 39m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 35m
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 7h 28m
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 7h 28m
tests/common/vllm_test.py::TestAPIServer::test_api 8h 19m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 15m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 3m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 742ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 3m 55s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 3m 54s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 52m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 8h 39m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 34h 39m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 4m

Github Test Reporter by CTRF 💚

pan-x-c commented Feb 9, 2026

/unittest-diff

github-actions bot commented Feb 9, 2026

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 196 | 194 | 0 | 2 | 0 | 0 | 49m 30s |

Skipped

Tests Status
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke skipped ⏭️

Tests

Test Name Status Flaky Duration
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline 3h 3m
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation 1h 43m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer 42m 51s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft 1h 11m
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo 1h 19m
tests/buffer/file_test.py::TestFileBuffer::test_file_reader 6m 29s
tests/buffer/file_test.py::TestFileBuffer::test_file_writer 28m 20s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter 8m 22s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter 8m 3s
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter 14m 22s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter 17m 29s
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter 12m 22s
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter 3m 51s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse 1h 49m
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity 37m 48s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control 1h 7m
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue 51m 7s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue 50m 50s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity 59m 17s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration 12m 50s
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage 6.7s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy 36m 40s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy 33m 19s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy 30m 1s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy 29m 42s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy 1h 15m
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy 38m 11s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy 29m 52s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy 29m 29s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy 33m 15s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy 59m 29s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0 1h 32m
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1 34m 44s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write 42m
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0 5m 16s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1 4m 50s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2 5m 21s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3 5m 24s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4 5m 37s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5 5m 30s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6 5m 44s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple 4m 40s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file 5m 7s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql 41m 21s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file 41.6s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql 42m 10s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file 41.6s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql 50m 11s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode 12h 34m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command 1h 40m
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc 26m 23s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command 15m 31s
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run 4h 9m
tests/common/config_test.py::TestConfig::test_all_examples_are_valid 8h 58m
tests/common/config_test.py::TestConfig::test_chat_template_path 5m 1s
tests/common/config_test.py::TestConfig::test_config_flatten 32.1s
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid 6m 19s
tests/common/config_test.py::TestConfig::test_default_workflow 4m 55s
tests/common/config_test.py::TestConfig::test_load_default_config 1h 33m
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly 4m 53s
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation 4m 59s
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster 10m 38s
tests/common/experience_test.py::TestEID::test_eid_properties 495ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type 497ms
tests/common/experience_test.py::TestExperience::test_assertions 380ms
tests/common/experience_test.py::TestExperience::test_dpo_experience 408ms
tests/common/experience_test.py::TestExperience::test_gather 801ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward 567ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion 15.3s
tests/common/experience_test.py::TestExperience::test_multi_turn_experience 372ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize 1.0s
tests/common/experience_test.py::TestExperience::test_single_turn_experience 363ms
tests/common/experience_test.py::TestExperience::test_to_dict 335ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion 687ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion 545ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion 883ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields 509ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion 598ms
tests/common/sudoku_test.py::test_9x9_generator_produces_valid_solution 776ms
tests/common/sudoku_test.py::test_9x9_generator_creates_holes 648ms
tests/common/sudoku_test.py::test_9x9_solution_is_fully_filled 615ms
tests/common/sudoku_test.py::test_judge_allows_incomplete_board 245ms
tests/common/sudoku_test.py::test_judge_detects_row_violation 242ms
tests/common/sudoku_test.py::test_judge_detects_column_violation 221ms
tests/common/sudoku_test.py::test_judge_detects_block_violation 221ms
tests/common/sudoku_test.py::test_4x4_generator_produces_valid_solution 271ms
tests/common/sudoku_test.py::test_4x4_solution_is_fully_filled 248ms
tests/common/sudoku_test.py::test_4x4_judge_detects_row_violation 256ms
tests/common/sudoku_test.py::test_4x4_judge_detects_block_violation 226ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate 15h 21m
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate 11h 6m
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate 10h 42m
tests/common/vllm_test.py::TestModelLen_0::test_model_len 8h 44m
tests/common/vllm_test.py::TestModelLen_1::test_model_len 7h 11m
tests/common/vllm_test.py::TestModelLen_2::test_model_len 7h 41m
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len 7h 30m
tests/common/vllm_test.py::TestMessageProcess::test_no_prompt_truncation 7h 33m
tests/common/vllm_test.py::TestMessageProcess::test_truncation_status 7h 23m
tests/common/vllm_test.py::TestAPIServer::test_api 7h 58m
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api 7h 26m
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async 8h 6m
tests/common/vllm_test.py::TestTinkerAsyncAPIServer::test_api_async ⏭️ 843ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask 4m 47s
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools 4m 29s
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls 8h 57m
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls 8h 49m
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate 34h 36m
tests/common/vllm_test.py::TestTinkerAPI::test_tinker_api 11h 25m
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer 29h
tests/explorer/explorer_test.py::TestExplorerEvalDetailedStats::test_explorer 19h 50m
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer 15h 3m
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer 49h 59m
tests/explorer/explorer_test.py::ServeTest::test_serve 15h 36m
tests/explorer/proxy_test.py::RecorderTest::test_recorder 1m 27s
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow 1h 18m
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations 1h 20m
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout 3h 34m
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results 5h 33m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 1h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 1h 15m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 1h 16m
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 1h 14m
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution 1h 26m
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow 1h 17m
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait 2h 22m
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods 4h
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop 2h 19m
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks 2h 12m
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid 6h 57m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all 2h 10m
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch 3h 45m
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection 2h 55m
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 1.4s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 10m 1s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 889ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 16m 43s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error 1.3s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps 16m 43s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow 11.2s
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow 16.7s
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow 2m 7s
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow 3.7s
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow 11.4s
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow 7.5s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 760ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 1m 40s
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 741ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 3m 21s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow 6h 21m
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow 6h 31m
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording 1h 6m
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 12m 22s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 14.5s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner 2m 20s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state 2h 14m
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai 7h 32m
tests/explorer/workflow_test.py::TestConcurrentWorkflowRunner::test_concurrent_workflow_runner 10h 43m
tests/manager/synchronizer_test.py::TestSynchronizerExit_0::test_synchronizer 44h 22m
tests/manager/synchronizer_test.py::TestSynchronizerExit_1::test_synchronizer 46h 13m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer 35h 51m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer 29h 27m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer 34h 1m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer 43h 42m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_4::test_synchronizer 39h 59m
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_5::test_synchronizer 43h 47m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer 19h 8m
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer 18h
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_2::test_synchronizer 17h 49m
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent 2.3s
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth 1.7s
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string 290ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution 1.7s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent 1.0s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent 1.1s
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed 283ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent 1.0s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer 3.8s
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer 1m 6s
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv 5.1s
tests/utils/log_test.py::LogTest::test_actor_log 35m 45s
tests/utils/log_test.py::LogTest::test_group_by_node 35m 13s
tests/utils/log_test.py::LogTest::test_no_actor_log 14m 19s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins 5m 18s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins 5m 3s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins 2h 28m
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins 2h 28m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins 1h 23m
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins 1h 23m
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import 40m 30s
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping 10.5s
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping 5.8s
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping 1m
tests/utils/registry_test.py::TestRegistry::test_register_module 561ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping 681ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke ⏭️ 444ms

@pan-x-c pan-x-c merged commit a4612cc into agentscope-ai:main Feb 9, 2026
1 check passed