

feat: add integration tests and cron job workflow #219

Merged
simonrosenberg merged 41 commits into main from add-integration-tests-and-cron-workflow on Sep 15, 2025


@simonrosenberg
Collaborator

No description provided.

simonrosenberg added the integration-test label (Runs the integration tests and comments the results) Sep 12, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

github-actions bot commented Sep 12, 2025

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|:-----|------:|-----:|------:|:--------|
| **openhands** | | | | |
| `__init__.py` | 1 | 0 | 100% | |
| **openhands/sdk** | | | | |
| `__init__.py` | 15 | 2 | 86% | 23–24 |
| `logger.py` | 73 | 21 | 71% | 33, 57, 64–67, 69–71, 124, 130–132, 135–136, 142–144, 151, 156–157 |
| **openhands/sdk/agent** | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `agent.py` | 176 | 35 | 80% | 65, 72, 79, 83, 100, 114, 121–122, 127–128, 199–200, 202–204, 206–208, 243, 257, 280, 312, 346–348, 352–354, 361–362, 366, 370–371, 401, 408 |
| `base.py` | 63 | 6 | 90% | 53, 75–77, 93, 113 |
| **openhands/sdk/context** | | | | |
| `__init__.py` | 4 | 0 | 100% | |
| `agent_context.py` | 57 | 2 | 96% | 146, 152 |
| `manager.py` | 3 | 3 | 0% | 1, 4–5 |
| `view.py` | 48 | 6 | 87% | 47–51, 53 |
| **openhands/sdk/context/condenser** | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `condenser.py` | 17 | 3 | 82% | 68–69, 73 |
| `no_op_condenser.py` | 6 | 0 | 100% | |
| **openhands/sdk/context/microagents** | | | | |
| `__init__.py` | 4 | 0 | 100% | |
| `exceptions.py` | 5 | 0 | 100% | |
| `microagent.py` | 143 | 25 | 82% | 130, 133–136, 218–221, 229, 251–252, 257–258, 260, 264, 271–273, 281–283, 337, 339–340 |
| `types.py` | 21 | 0 | 100% | |
| **openhands/sdk/context/prompts** | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `prompt.py` | 30 | 5 | 83% | 12, 15, 24, 44–45 |
| **openhands/sdk/conversation** | | | | |
| `__init__.py` | 7 | 0 | 100% | |
| `conversation.py` | 112 | 11 | 90% | 113, 121–123, 127–128, 184, 260–261, 269–270 |
| `event_store.py` | 101 | 8 | 92% | 50–51, 60, 67, 72–73, 129, 142 |
| `persistence_const.py` | 5 | 0 | 100% | |
| `secrets_manager.py` | 41 | 1 | 97% | 107 |
| `serialization_diff.py` | 0 | 0 | 100% | |
| `state.py` | 94 | 5 | 94% | 126, 149, 185–187 |
| `types.py` | 3 | 0 | 100% | |
| `visualizer.py` | 91 | 5 | 94% | 88, 145, 167, 210, 212 |
| **openhands/sdk/event** | | | | |
| `__init__.py` | 5 | 0 | 100% | |
| `base.py` | 74 | 20 | 72% | 51, 55, 75, 79–81, 83–88, 90–91, 93–95, 97–98, 100 |
| `condenser.py` | 25 | 5 | 80% | 29, 33–35, 49 |
| `llm_convertible.py` | 179 | 16 | 91% | 53, 63–64, 69–70, 246, 280–281, 286, 294, 335–336, 341, 374–375, 380 |
| `types.py` | 3 | 0 | 100% | |
| `user_action.py` | 12 | 1 | 91% | 21 |
| `utils.py` | 12 | 0 | 100% | |
| **openhands/sdk/io** | | | | |
| `__init__.py` | 4 | 0 | 100% | |
| `base.py` | 14 | 4 | 71% | 7, 11, 15, 19 |
| `local.py` | 56 | 16 | 71% | 43–44, 58, 66–78 |
| `memory.py` | 43 | 4 | 90% | 16, 20, 53–54 |
| **openhands/sdk/llm** | | | | |
| `__init__.py` | 6 | 0 | 100% | |
| `exceptions.py` | 36 | 0 | 100% | |
| `llm.py` | 386 | 116 | 69% | 224, 229, 242–244, 248–249, 281, 340, 346–347, 442, 455–456, 461–462, 464–465, 468–470, 475–477, 481–483, 504–507, 512–517, 521–522, 531–533, 536–537, 561, 567–568, 614, 663, 669, 672, 681–682, 691, 698, 701, 705–707, 711, 713–718, 720–737, 740–744, 746–747, 753–762, 766–777, 790, 804, 809 |
| `llm_registry.py` | 38 | 0 | 100% | |
| `message.py` | 110 | 4 | 96% | 97, 100, 223–224 |
| `metadata.py` | 15 | 0 | 100% | |
| **openhands/sdk/llm/mixins** | | | | |
| `fn_call_converter.py` | 343 | 101 | 70% | 74, 343, 345, 349, 367, 369, 375, 381, 383, 422, 424, 426, 428, 433–434, 518–520, 522, 524, 545–547, 553, 575, 601–602, 610–613, 615, 617, 639, 648, 656, 701–704, 708–711, 723, 727, 738, 748, 797–798, 800, 819–821, 823–826, 829, 833, 844–845, 859, 867, 870–871, 876, 905–908, 912–913, 918–919, 924, 973–974, 980, 994, 1006, 1008–1009, 1012–1014, 1016–1017, 1023–1025, 1027–1028, 1030, 1032, 1036, 1038, 1043, 1045–1046, 1049 |
| `non_native_fc.py` | 39 | 3 | 92% | 64, 75, 91 |
| **openhands/sdk/llm/utils** | | | | |
| `metrics.py` | 111 | 2 | 98% | 17, 117 |
| `model_features.py` | 40 | 0 | 100% | |
| `retry_mixin.py` | 50 | 11 | 78% | 47, 50, 64, 86, 90, 94–95, 105, 110–111, 116 |
| `telemetry.py` | 136 | 15 | 88% | 71, 94, 99–100, 112–113, 120, 134, 199, 216, 222, 229, 232, 234, 241 |
| **openhands/sdk/mcp** | | | | |
| `__init__.py` | 5 | 0 | 100% | |
| `client.py` | 63 | 11 | 82% | 41, 56–57, 78, 82, 94–95, 105–106, 112–113 |
| `definition.py` | 48 | 16 | 66% | 55, 75–80, 82–90 |
| `tool.py` | 40 | 13 | 67% | 36–39, 43, 46, 49–52, 101–102, 107 |
| `utils.py` | 30 | 4 | 86% | 23–24, 27, 30 |
| **openhands/sdk/preset** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `default.py` | 12 | 12 | 0% | 3, 6, 8, 10, 17, 23–24, 26–28, 30–31 |
| **openhands/sdk/tool** | | | | |
| `__init__.py` | 4 | 0 | 100% | |
| `schema.py` | 124 | 11 | 91% | 27–29, 31, 40, 230–233, 253, 268 |
| `security_prompt.py` | 3 | 0 | 100% | |
| `tool.py` | 95 | 10 | 89% | 58, 99, 169, 172–178 |
| **openhands/sdk/tool/builtins** | | | | |
| `__init__.py` | 4 | 0 | 100% | |
| `finish.py` | 26 | 1 | 96% | 33 |
| `think.py` | 32 | 13 | 59% | 24, 27–28, 31, 33–37, 39, 51, 57, 74 |
| **openhands/sdk/utils** | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `async_utils.py` | 12 | 0 | 100% | |
| `discriminated_union.py` | 63 | 5 | 92% | 173, 212, 217, 224, 227 |
| `json.py` | 28 | 28 | 0% | 1–3, 5, 7–8, 11, 14–21, 25, 28, 30–31, 34, 37–38, 40, 43, 45–48 |
| `protocol.py` | 3 | 0 | 100% | |
| `pydantic_diff.py` | 57 | 15 | 73% | 36, 44, 50–58, 60–62, 65 |
| `truncate.py` | 10 | 0 | 100% | |
| `visualize.py` | 17 | 6 | 64% | 14–16, 19–20, 22 |
| **openhands/tools** | | | | |
| `__init__.py` | 9 | 2 | 77% | 53–54 |
| **openhands/tools/execute_bash** | | | | |
| `__init__.py` | 4 | 0 | 100% | |
| `constants.py` | 9 | 0 | 100% | |
| `definition.py` | 92 | 45 | 51% | 37, 40, 43–44, 46, 49–51, 53–55, 57, 105, 108–110, 113, 115–117, 119, 123–124, 127–129, 131–132, 135–138, 142–144, 149, 153–155, 158–160, 164–165, 167 |
| `impl.py` | 40 | 3 | 92% | 55, 58, 62 |
| `metadata.py` | 50 | 3 | 94% | 95–96, 100 |
| **openhands/tools/execute_bash/terminal** | | | | |
| `__init__.py` | 6 | 0 | 100% | |
| `factory.py` | 49 | 11 | 77% | 24–25, 30, 32, 35, 37–38, 44–46, 97 |
| `interface.py` | 69 | 15 | 78% | 43, 52, 62, 71, 76, 85, 94, 99, 145, 157, 162, 171, 180, 191, 193 |
| `subprocess_terminal.py` | 236 | 59 | 75% | 68, 99–100, 126, 132, 139, 146–147, 157–158, 164–165, 179, 181, 185–187, 193, 209, 218–222, 257–259, 264, 276, 290, 314, 316, 325, 346, 362, 367, 373–375, 383–384, 388–389, 391–397, 401–402, 405–406, 408–409, 411–413 |
| `terminal_session.py` | 178 | 8 | 95% | 92, 96–98, 235, 281, 297, 317 |
| `tmux_terminal.py` | 80 | 21 | 73% | 36, 45, 108, 119, 133, 145–152, 160–161, 163–164, 166, 168–170 |
| **openhands/tools/execute_bash/utils** | | | | |
| `command.py` | 81 | 4 | 95% | 48, 64–66 |
| **openhands/tools/str_replace_editor** | | | | |
| `__init__.py` | 3 | 0 | 100% | |
| `definition.py` | 65 | 9 | 86% | 87, 99, 119, 122, 125, 132, 134, 136, 138 |
| `editor.py` | 228 | 11 | 95% | 131, 264, 340, 350, 401–402, 641, 648–649, 663, 668 |
| `exceptions.py` | 22 | 0 | 100% | |
| `impl.py` | 26 | 2 | 92% | 31–32 |
| **openhands/tools/str_replace_editor/utils** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `config.py` | 2 | 0 | 100% | |
| `constants.py` | 5 | 0 | 100% | |
| `diff.py` | 64 | 1 | 98% | 115 |
| `encoding.py` | 54 | 1 | 98% | 81 |
| `file_cache.py` | 95 | 9 | 90% | 44–46, 49–50, 54, 59, 151, 154 |
| `history.py` | 66 | 1 | 98% | 79 |
| `shell.py` | 23 | 0 | 100% | |
| **openhands/tools/task_tracker** | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `definition.py` | 132 | 94 | 28% | 48, 51–53, 55–56, 59–60, 62, 78, 83, 85, 87–88, 91, 94–96, 98–99, 102–108, 110–112, 115, 117–120, 122, 125, 128–129, 131–132, 134–135, 137, 150–151, 154–155, 159, 161, 163–165, 171, 173–174, 179–180, 184, 193–194, 196–198, 202–203, 205–208, 210, 214–215, 217–219, 221–225, 229, 233–234, 236–237, 239, 241–245, 407, 410 |
| **openhands/tools/utils** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| TOTAL | 5234 | 900 | 82% | |
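The Cover column can be checked against Stmts and Miss. Judging from the rows above, the percentages appear to be truncated rather than rounded (`view.py`: 42/48 = 87.5% is shown as 87%; TOTAL: 4334/5234 ≈ 82.8% is shown as 82%), and each Missing cell expands to exactly Miss line numbers. The sketch below reproduces both checks; truncation, the en-dash range syntax, and treating empty files as 100% are assumptions read off this table, and `cover_percent` / `count_missing` are hypothetical helpers, not part of the coverage tool.

```python
def cover_percent(stmts: int, miss: int) -> int:
    """Coverage as an integer percent, truncated toward zero.

    Truncation (not rounding) is an assumption that matches the rows above,
    e.g. view.py 48/6 -> 87 rather than 88. Empty files count as fully covered.
    """
    if stmts == 0:
        return 100
    return (stmts - miss) * 100 // stmts


def count_missing(spec: str) -> int:
    """Count the line numbers listed in a Missing cell such as "53, 75–77, 93"."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "–" in part:  # ranges use an en dash and are inclusive on both ends
            start, end = (int(x) for x in part.split("–"))
            total += end - start + 1
        else:
            total += 1
    return total


if __name__ == "__main__":
    print(cover_percent(5234, 900))             # 82
    print(count_missing("53, 75–77, 93, 113"))  # 6
```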

simonrosenberg self-assigned this Sep 12, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@simonrosenberg
Collaborator Author

@OpenHands please fix the failing actions on PR #219 at branch add-integration-tests-and-cron-workflow

@openhands-ai

openhands-ai bot commented Sep 12, 2025

I'm on it! simonrosenberg can track my progress at all-hands.dev

simonrosenberg removed the integration-test label Sep 12, 2025
simonrosenberg added the integration-test label Sep 12, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

Triggered by: Pull Request (integration-test label on PR #219)
Commit: efb737f
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:
No report file found

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
No report file found

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download

simonrosenberg removed and re-added the integration-test label Sep 12, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

Triggered by: Pull Request (integration-test label on PR #219)
Commit: 65bfb08
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:
No report file found

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
No report file found

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download

simonrosenberg removed and re-added the integration-test label Sep 12, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

Triggered by: Pull Request (integration-test label on PR #219)
Commit: 6ba6805
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:

Integration Tests Report - 6ba6805_sonnet_run

Success rate: 100.00% (1/1)

Total cost: USD 0.00

Test Results

| instance_id | success | reason | cost | error_message |
|:---|:---|:---|:---|---:|
| t01_fix_simple_typo_class_based | True | Successfully fixed all typos | 0 | nan |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:

Integration Tests Report - 6ba6805_deepseek_run

Success rate: 0.00% (0/1)

Total cost: USD 0.00

Test Results

| instance_id | success | reason | cost | error_message |
|:---|:---|:---|:---|---:|
| t01_fix_simple_typo_class_based | False | Test execution failed: litellm.BadRequestError: Litellm_proxyException - {'error': '/chat/completions: Invalid model name passed in model=deepseek/deepseek-reasoner. Call /v1/models to view available models for your key.'} | 0 | nan |

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download
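Each per-model report above reduces a list of test instances to a success rate with a passed/total count and a summed cost. The report generator itself is not shown in this thread; the sketch below only reproduces the two summary lines from result rows, and the `InstanceResult` type and `summarize` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """One row of a per-model report (hypothetical type)."""
    instance_id: str
    success: bool
    reason: str
    cost: float = 0.0


def summarize(results: list[InstanceResult]) -> str:
    """Render the success-rate and total-cost lines of a report."""
    passed = sum(1 for r in results if r.success)
    total = len(results)
    # Guard against an empty run instead of dividing by zero.
    rate = 100.0 * passed / total if total else 0.0
    total_cost = sum(r.cost for r in results)
    return f"Success rate: {rate:.2f}% ({passed}/{total})\n\nTotal cost: USD {total_cost:.2f}"


if __name__ == "__main__":
    ok = InstanceResult("t01_fix_simple_typo_class_based", True, "Successfully fixed all typos")
    failed = InstanceResult("t01_fix_simple_typo_class_based", False, "Test execution failed")
    print(summarize([ok]))      # first line: Success rate: 100.00% (1/1)
    print(summarize([failed]))  # first line: Success rate: 0.00% (0/1)
```

Computing the rate from counts rather than storing it keeps the percentage and the `(passed/total)` pair consistent by construction.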

simonrosenberg removed and re-added the integration-test label Sep 12, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

Triggered by: Pull Request (integration-test label on PR #219)
Commit: ce66b45
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:

Integration Tests Report - ce66b45_sonnet_run

Success rate: 100.00% (1/1)

Total cost: USD 0.00

Test Results

| instance_id | success | reason | cost | error_message |
|:---|:---|:---|:---|---:|
| t01_fix_simple_typo_class_based | True | Successfully fixed all typos | 0 | nan |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:

Integration Tests Report - ce66b45_deepseek_run

Success rate: 0.00% (0/1)

Total cost: USD 0.00

Test Results

| instance_id | success | reason | cost | error_message |
|:---|:---|:---|:---|---:|
| t01_fix_simple_typo_class_based | False | Test execution failed: litellm.BadRequestError: Litellm_proxyException - {'error': '/chat/completions: Invalid model name passed in model=deepseek-chat. Call /v1/models to view available models for your key.'} | 0 | nan |

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download
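Both DeepSeek failures above are the LiteLLM proxy rejecting a model name (`deepseek/deepseek-reasoner`, then `deepseek-chat`) that the key cannot use, and the error message itself points at `/v1/models`. A pre-flight check could catch this before a run starts. This is a sketch assuming the proxy serves the OpenAI-compatible `GET /v1/models` route with Bearer-token auth and a `{"data": [{"id": ...}]}` response body; the `LITELLM_BASE_URL` / `LITELLM_API_KEY` variable names and all helper names are hypothetical.

```python
import json
import os
import urllib.request


def model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_proxy_models(base_url: str, api_key: str, timeout: float = 10.0) -> list[str]:
    """GET {base_url}/v1/models with a Bearer key and return the model ids."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return model_ids(json.load(response))


def unavailable(requested: list[str], available: list[str]) -> list[str]:
    """Return the requested model names that the key cannot see."""
    known = set(available)
    return [name for name in requested if name not in known]


if __name__ == "__main__":
    # Hypothetical variable names; the network call only runs when both are set.
    base_url = os.environ.get("LITELLM_BASE_URL")
    api_key = os.environ.get("LITELLM_API_KEY")
    if base_url and api_key:
        models = list_proxy_models(base_url, api_key)
        print(unavailable(["deepseek/deepseek-reasoner", "deepseek-chat"], models))
```

Keeping the parsing (`model_ids`) separate from the HTTP call lets the comparison logic be tested without network access.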

simonrosenberg removed and re-added the integration-test label Sep 15, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

2 similar comments

@github-actions
Contributor

Integration Tests Report

Trigger: Pull Request (integration-test label on PR #219)
Commit: d882213
Timestamp: 2025-09-15 13:36 UTC

Test Results Summary

| Model | Success Rate | Cost | Test Results | Artifact Link |
|:---|:---|:---|:---|:---|
| Claude Sonnet 4 | 100.00% | $0.0e+00 | See details below | Download |
| GPT-5 Mini | 100.00% | $0.0e+00 | See details below | Download |
| DeepSeek Chat | 100.00% | $0.0e+00 | See details below | Download |

Detailed Results

Claude Sonnet 4

# Integration Tests Report - d882213_sonnet_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |


GPT-5 Mini

# Integration Tests Report - d882213_gpt5_mini_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |


DeepSeek Chat

# Integration Tests Report - d882213_deepseek_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |



Overall Status: 3 models tested
Total Cost: $0.0e+00
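All cost cells in this report render as `$0.0e+00`, i.e. scientific notation with one decimal place. A `format_cost` helper is mentioned in a later commit message in this thread, but its body is not shown; the function below is a hypothetical version that yields the same string, placed next to fixed-point formatting for comparison.

```python
def format_cost(cost: float) -> str:
    """Hypothetical cost formatter: USD in one-decimal scientific notation."""
    return f"${cost:.1e}"


def format_cost_fixed(cost: float) -> str:
    """Fixed-point alternative with two decimals, for comparison."""
    return f"${cost:.2f}"


if __name__ == "__main__":
    for cost in (0.0, 0.000123, 1.5):
        print(format_cost(cost), format_cost_fixed(cost))
    # $0.0e+00 $0.00
    # $1.2e-04 $0.00
    # $1.5e+00 $1.50
```

Scientific notation keeps sub-cent per-test costs distinguishable from zero, whereas `.2f` collapses them to `$0.00`; the trade-off is that ordinary dollar amounts become harder to read at a glance.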

simonrosenberg removed and re-added the integration-test label Sep 15, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

2 similar comments

- Removed `sys.path.insert()` from `run_infer.py`
- Both scripts now import modules directly, without any path manipulation
- Kept the import structure clean, with `format_cost` imported from a separate module
- All imports resolve correctly when the `PYTHONPATH` environment variable is set

Co-authored-by: openhands <openhands@all-hands.dev>
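The commit above replaces `sys.path.insert()` with reliance on `PYTHONPATH`. This is standard CPython behavior: directories listed in `PYTHONPATH` are added to `sys.path` when the interpreter starts, so a script can import modules from those directories without editing `sys.path` itself. A self-contained demonstration follows; the throwaway module name and contents are hypothetical, not files from this PR.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_via_pythonpath(module_name: str, source: str) -> str:
    """Import a throwaway module in a fresh interpreter found only via PYTHONPATH.

    The module file lives in a temporary directory that the child process never
    adds to sys.path in code; it is importable only because that directory is
    listed in the child's PYTHONPATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(source)
        env = dict(os.environ)
        existing = env.get("PYTHONPATH")
        # Prepend the temp directory, keeping any entries that were already set.
        env["PYTHONPATH"] = tmp + (os.pathsep + existing if existing else "")
        result = subprocess.run(
            [sys.executable, "-c", f"import {module_name}; print({module_name}.VALUE)"],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip()


if __name__ == "__main__":
    print(import_via_pythonpath("demo_pythonpath_module", "VALUE = 42\n"))  # 42
```

Setting the path in the environment (for example in a workflow step) rather than in each script keeps the scripts free of location-dependent bootstrapping code.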
simonrosenberg removed and re-added the integration-test label Sep 15, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

2 similar comments

@openhands-ai

openhands-ai bot commented Sep 15, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run Integration Tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #219 at branch `add-integration-tests-and-cron-workflow`

Feel free to include any additional details that might help me get this PR into a better state.


simonrosenberg removed and re-added the integration-test label Sep 15, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

2 similar comments

@github-actions
Contributor

Integration Tests Report

Trigger: Pull Request (integration-test label on PR #219)
Commit: 700fff4
Timestamp: 2025-09-15 14:19 UTC

Test Results Summary

| Model | Success Rate | Cost | Test Results | Artifact Link |
|:---|:---|:---|:---|:---|
| GPT-5 Mini | 100.00% | $0.0e+00 | See details below | Download |
| DeepSeek Chat | 100.00% | $0.0e+00 | See details below | Download |
| Claude Sonnet 4 | 100.00% | $0.0e+00 | See details below | Download |

Detailed Results

GPT-5 Mini

# Integration Tests Report - 700fff4_gpt5_mini_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |


DeepSeek Chat

# Integration Tests Report - 700fff4_deepseek_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |


Claude Sonnet 4

# Integration Tests Report - 700fff4_sonnet_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |



Overall Status: 3 models tested
Total Cost: $0.0e+00

simonrosenberg merged commit 7dc37fe into main Sep 15, 2025
11 checks passed
simonrosenberg deleted the add-integration-tests-and-cron-workflow branch September 15, 2025 14:28
simonrosenberg linked an issue Sep 16, 2025 that may be closed by this pull request

Labels

integration-test (Runs the integration tests and comments the results)


Development

Successfully merging this pull request may close these issues.

Port over integration tests

3 participants