feat: add integration tests and cron job workflow by simonrosenberg · Pull Request #219 · OpenHands/software-agent-sdk

simonrosenberg · 2025-09-12T13:48:18Z

No description provided.

github-actions · 2025-09-12T13:50:16Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-12T13:53:07Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands
__init__.py	1	0	100%
openhands/sdk
__init__.py	15	2	86%	23–24
logger.py	73	21	71%	33, 57, 64–67, 69–71, 124, 130–132, 135–136, 142–144, 151, 156–157
openhands/sdk/agent
__init__.py	3	0	100%
agent.py	176	35	80%	65, 72, 79, 83, 100, 114, 121–122, 127–128, 199–200, 202–204, 206–208, 243, 257, 280, 312, 346–348, 352–354, 361–362, 366, 370–371, 401, 408
base.py	63	6	90%	53, 75–77, 93, 113
openhands/sdk/context
__init__.py	4	0	100%
agent_context.py	57	2	96%	146, 152
manager.py	3	3	0%	1, 4–5
view.py	48	6	87%	47–51, 53
openhands/sdk/context/condenser
__init__.py	3	0	100%
condenser.py	17	3	82%	68–69, 73
no_op_condenser.py	6	0	100%
openhands/sdk/context/microagents
__init__.py	4	0	100%
exceptions.py	5	0	100%
microagent.py	143	25	82%	130, 133–136, 218–221, 229, 251–252, 257–258, 260, 264, 271–273, 281–283, 337, 339–340
types.py	21	0	100%
openhands/sdk/context/prompts
__init__.py	2	0	100%
prompt.py	30	5	83%	12, 15, 24, 44–45
openhands/sdk/conversation
__init__.py	7	0	100%
conversation.py	112	11	90%	113, 121–123, 127–128, 184, 260–261, 269–270
event_store.py	101	8	92%	50–51, 60, 67, 72–73, 129, 142
persistence_const.py	5	0	100%
secrets_manager.py	41	1	97%	107
serialization_diff.py	0	0	100%
state.py	94	5	94%	126, 149, 185–187
types.py	3	0	100%
visualizer.py	91	5	94%	88, 145, 167, 210, 212
openhands/sdk/event
__init__.py	5	0	100%
base.py	74	20	72%	51, 55, 75, 79–81, 83–88, 90–91, 93–95, 97–98, 100
condenser.py	25	5	80%	29, 33–35, 49
llm_convertible.py	179	16	91%	53, 63–64, 69–70, 246, 280–281, 286, 294, 335–336, 341, 374–375, 380
types.py	3	0	100%
user_action.py	12	1	91%	21
utils.py	12	0	100%
openhands/sdk/io
__init__.py	4	0	100%
base.py	14	4	71%	7, 11, 15, 19
local.py	56	16	71%	43–44, 58, 66–78
memory.py	43	4	90%	16, 20, 53–54
openhands/sdk/llm
__init__.py	6	0	100%
exceptions.py	36	0	100%
llm.py	386	116	69%	224, 229, 242–244, 248–249, 281, 340, 346–347, 442, 455–456, 461–462, 464–465, 468–470, 475–477, 481–483, 504–507, 512–517, 521–522, 531–533, 536–537, 561, 567–568, 614, 663, 669, 672, 681–682, 691, 698, 701, 705–707, 711, 713–718, 720–737, 740–744, 746–747, 753–762, 766–777, 790, 804, 809
llm_registry.py	38	0	100%
message.py	110	4	96%	97, 100, 223–224
metadata.py	15	0	100%
openhands/sdk/llm/mixins
fn_call_converter.py	343	101	70%	74, 343, 345, 349, 367, 369, 375, 381, 383, 422, 424, 426, 428, 433–434, 518–520, 522, 524, 545–547, 553, 575, 601–602, 610–613, 615, 617, 639, 648, 656, 701–704, 708–711, 723, 727, 738, 748, 797–798, 800, 819–821, 823–826, 829, 833, 844–845, 859, 867, 870–871, 876, 905–908, 912–913, 918–919, 924, 973–974, 980, 994, 1006, 1008–1009, 1012–1014, 1016–1017, 1023–1025, 1027–1028, 1030, 1032, 1036, 1038, 1043, 1045–1046, 1049
non_native_fc.py	39	3	92%	64, 75, 91
openhands/sdk/llm/utils
metrics.py	111	2	98%	17, 117
model_features.py	40	0	100%
retry_mixin.py	50	11	78%	47, 50, 64, 86, 90, 94–95, 105, 110–111, 116
telemetry.py	136	15	88%	71, 94, 99–100, 112–113, 120, 134, 199, 216, 222, 229, 232, 234, 241
openhands/sdk/mcp
__init__.py	5	0	100%
client.py	63	11	82%	41, 56–57, 78, 82, 94–95, 105–106, 112–113
definition.py	48	16	66%	55, 75–80, 82–90
tool.py	40	13	67%	36–39, 43, 46, 49–52, 101–102, 107
utils.py	30	4	86%	23–24, 27, 30
openhands/sdk/preset
__init__.py	0	0	100%
default.py	12	12	0%	3, 6, 8, 10, 17, 23–24, 26–28, 30–31
openhands/sdk/tool
__init__.py	4	0	100%
schema.py	124	11	91%	27–29, 31, 40, 230–233, 253, 268
security_prompt.py	3	0	100%
tool.py	95	10	89%	58, 99, 169, 172–178
openhands/sdk/tool/builtins
__init__.py	4	0	100%
finish.py	26	1	96%	33
think.py	32	13	59%	24, 27–28, 31, 33–37, 39, 51, 57, 74
openhands/sdk/utils
__init__.py	3	0	100%
async_utils.py	12	0	100%
discriminated_union.py	63	5	92%	173, 212, 217, 224, 227
json.py	28	28	0%	1–3, 5, 7–8, 11, 14–21, 25, 28, 30–31, 34, 37–38, 40, 43, 45–48
protocol.py	3	0	100%
pydantic_diff.py	57	15	73%	36, 44, 50–58, 60–62, 65
truncate.py	10	0	100%
visualize.py	17	6	64%	14–16, 19–20, 22
openhands/tools
__init__.py	9	2	77%	53–54
openhands/tools/execute_bash
__init__.py	4	0	100%
constants.py	9	0	100%
definition.py	92	45	51%	37, 40, 43–44, 46, 49–51, 53–55, 57, 105, 108–110, 113, 115–117, 119, 123–124, 127–129, 131–132, 135–138, 142–144, 149, 153–155, 158–160, 164–165, 167
impl.py	40	3	92%	55, 58, 62
metadata.py	50	3	94%	95–96, 100
openhands/tools/execute_bash/terminal
__init__.py	6	0	100%
factory.py	49	11	77%	24–25, 30, 32, 35, 37–38, 44–46, 97
interface.py	69	15	78%	43, 52, 62, 71, 76, 85, 94, 99, 145, 157, 162, 171, 180, 191, 193
subprocess_terminal.py	236	59	75%	68, 99–100, 126, 132, 139, 146–147, 157–158, 164–165, 179, 181, 185–187, 193, 209, 218–222, 257–259, 264, 276, 290, 314, 316, 325, 346, 362, 367, 373–375, 383–384, 388–389, 391–397, 401–402, 405–406, 408–409, 411–413
terminal_session.py	178	8	95%	92, 96–98, 235, 281, 297, 317
tmux_terminal.py	80	21	73%	36, 45, 108, 119, 133, 145–152, 160–161, 163–164, 166, 168–170
openhands/tools/execute_bash/utils
command.py	81	4	95%	48, 64–66
openhands/tools/str_replace_editor
__init__.py	3	0	100%
definition.py	65	9	86%	87, 99, 119, 122, 125, 132, 134, 136, 138
editor.py	228	11	95%	131, 264, 340, 350, 401–402, 641, 648–649, 663, 668
exceptions.py	22	0	100%
impl.py	26	2	92%	31–32
openhands/tools/str_replace_editor/utils
__init__.py	0	0	100%
config.py	2	0	100%
constants.py	5	0	100%
diff.py	64	1	98%	115
encoding.py	54	1	98%	81
file_cache.py	95	9	90%	44–46, 49–50, 54, 59, 151, 154
history.py	66	1	98%	79
shell.py	23	0	100%
openhands/tools/task_tracker
__init__.py	2	0	100%
definition.py	132	94	28%	48, 51–53, 55–56, 59–60, 62, 78, 83, 85, 87–88, 91, 94–96, 98–99, 102–108, 110–112, 115, 117–120, 122, 125, 128–129, 131–132, 134–135, 137, 150–151, 154–155, 159, 161, 163–165, 171, 173–174, 179–180, 184, 193–194, 196–198, 202–203, 205–208, 210, 214–215, 217–219, 221–225, 229, 233–234, 236–237, 239, 241–245, 407, 410
openhands/tools/utils
__init__.py	0	0	100%
TOTAL	5234	900	82%

github-actions · 2025-09-12T14:12:45Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

simonrosenberg · 2025-09-12T14:14:11Z

@OpenHands please fix the failing actions on PR #219 at branch add-integration-tests-and-cron-workflow

openhands-ai · 2025-09-12T14:14:21Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

github-actions · 2025-09-12T14:24:33Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-12T14:24:58Z

Trigger by: Pull Request (integration-test label on PR #219)
Commit: `efb737f`
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:
No report file found

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
No report file found

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download

github-actions · 2025-09-12T20:16:34Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-12T20:16:55Z

Trigger by: Pull Request (integration-test label on PR #219)
Commit: `65bfb08`
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:
No report file found

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
No report file found

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download

github-actions · 2025-09-12T20:33:14Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-12T20:34:30Z

Trigger by: Pull Request (integration-test label on PR #219)
Commit: 6ba6805
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:

Integration Tests Report - 6ba6805_sonnet_run

Success rate: 100.00% (1/1)

Total cost: USD 0.00

Test Results

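| instance_id                     | success   | reason                       | cost |   error_message |
|:--------------------------------|:----------|:-----------------------------|:-----|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | 0    |             nan |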

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:

Integration Tests Report - 6ba6805_deepseek_run

Success rate: 0.00% (0/1)

Total cost: USD 0.00

Test Results

| instance_id | success | reason | cost | error_message |
|:------------|:--------|:-------|:-----|--------------:|
| t01_fix_simple_typo_class_based | False | Test execution failed: litellm.BadRequestError: Litellm_proxyException - {'error': '/chat/completions: Invalid model name passed in model=deepseek/deepseek-reasoner. Call `/v1/models` to view available models for your key.'} | 0 | nan |

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download

github-actions · 2025-09-12T20:48:35Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-12T20:49:55Z

Trigger by: Pull Request (integration-test label on PR #219)
Commit: ce66b45
Integration Tests Report (Claude Sonnet 4)
Claude Sonnet 4 LLM Test Results:

Integration Tests Report - ce66b45_sonnet_run

Success rate: 100.00% (1/1)

Total cost: USD 0.00

Test Results

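| instance_id                     | success   | reason                       | cost |   error_message |
|:--------------------------------|:----------|:-----------------------------|:-----|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | 0    |             nan |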

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:

Integration Tests Report - ce66b45_deepseek_run

Success rate: 0.00% (0/1)

Total cost: USD 0.00

Test Results

| instance_id | success | reason | cost | error_message |
|:------------|:--------|:-------|:-----|--------------:|
| t01_fix_simple_typo_class_based | False | Test execution failed: litellm.BadRequestError: Litellm_proxyException - {'error': '/chat/completions: Invalid model name passed in model=deepseek-chat. Call `/v1/models` to view available models for your key.'} | 0 | nan |

Download testing outputs (includes both Claude Sonnet 4 and DeepSeek results): Download
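
Both DeepSeek failures above come from the proxy rejecting a model name it does not recognize, and the error text points at `/v1/models`. The following is a minimal sketch of that lookup, assuming the endpoint returns an OpenAI-style list of the form `{"data": [{"id": ...}]}`. The helper names and the sample ids are hypothetical and are not taken from this PR.

```python
import difflib


def available_model_ids(models_payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style model list payload (assumed shape)."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def check_model_name(requested: str, models_payload: dict) -> tuple[bool, list[str]]:
    """Return (is_available, close_matches) for a requested model name."""
    ids = available_model_ids(models_payload)
    if requested in ids:
        return True, []
    # Suggest similarly spelled ids so a wrong prefix or suffix is easy to spot.
    suggestions = difflib.get_close_matches(requested, ids, n=3, cutoff=0.4)
    return False, suggestions


# Hypothetical payload; the ids a real key can see will differ.
SAMPLE_PAYLOAD = {
    "object": "list",
    "data": [{"id": "team-a/chat-large"}, {"id": "team-a/chat-small"}],
}

if __name__ == "__main__":
    print(check_model_name("team-a/chat-large", SAMPLE_PAYLOAD))
    print(check_model_name("chat-large", SAMPLE_PAYLOAD))
```

Running a check like this before the test run would surface a bad model name as a configuration error rather than as a failed test instance.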

github-actions · 2025-09-15T13:34:34Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-15T13:35:55Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-15T13:36:48Z

Integration Tests Report

Trigger: Pull Request (integration-test label on PR #219)
Commit: d882213
Timestamp: 2025-09-15 13:36 UTC

Test Results Summary

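| Model           | Success Rate | Cost     | Test Results      | Artifact Link |
|:----------------|:-------------|:---------|:------------------|:--------------|
| Claude Sonnet 4 | 100.00%      | $0.0e+00 | See details below | Download      |
| GPT-5 Mini      | 100.00%      | $0.0e+00 | See details below | Download      |
| DeepSeek Chat   | 100.00%      | $0.0e+00 | See details below | Download      |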

Detailed Results

Claude Sonnet 4

# Integration Tests Report - d882213_sonnet_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |

GPT-5 Mini

# Integration Tests Report - d882213_gpt5_mini_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |

DeepSeek Chat

# Integration Tests Report - d882213_deepseek_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |

Overall Status: 3 models tested
Total Cost: $0.0e+00
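
The per-model headline in these reports (`Success rate: 100.00% (1/1)`) can be derived from the result rows. The following is a sketch under the assumption that each row carries a boolean `success` and a numeric `cost`. `InstanceResult` and `summarize` are illustrative names, not the code from this PR.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """One row of the results table (hypothetical structure)."""
    instance_id: str
    success: bool
    reason: str
    cost: float


def summarize(results: list[InstanceResult]) -> tuple[str, float]:
    """Return the success-rate headline and the summed cost for a run."""
    total = len(results)
    passed = sum(1 for r in results if r.success)
    # Guard against an empty run so the report never divides by zero.
    rate = 100.0 * passed / total if total else 0.0
    total_cost = sum(r.cost for r in results)
    return f"Success rate: {rate:.2f}% ({passed}/{total})", total_cost


if __name__ == "__main__":
    rows = [
        InstanceResult(
            "t01_fix_simple_typo_class_based", True, "Successfully fixed all typos", 0.0
        )
    ]
    headline, cost = summarize(rows)
    print(headline)  # Success rate: 100.00% (1/1)
```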

github-actions · 2025-09-15T13:46:58Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-15T13:49:30Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

- Removed sys.path.insert() from run_infer.py
- Both scripts now use clean global imports without path manipulation
- Maintained clean import structure with format_cost from separate module
- All imports work correctly with PYTHONPATH environment variable

Co-authored-by: openhands <openhands@all-hands.dev>
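
The commit message above mentions a `format_cost` helper kept in its own module, but its body is not shown in this thread. As a hedged sketch only, one formatting that would reproduce the `$0.0e+00` strings seen in the reports is scientific notation with one decimal place. The real helper may differ.

```python
def format_cost(cost: float) -> str:
    """Render a USD cost in compact scientific notation (assumed format)."""
    return f"${cost:.1e}"


if __name__ == "__main__":
    print(format_cost(0.0))     # $0.0e+00
    print(format_cost(0.0123))  # $1.2e-02
```

With a helper like this living in a module that is reachable through `PYTHONPATH`, both scripts can import it with a plain `import` statement and no `sys.path.insert()` call.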

github-actions · 2025-09-15T14:10:48Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-15T14:10:49Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

openhands-ai · 2025-09-15T14:15:32Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Run Integration Tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #219 at branch `add-integration-tests-and-cron-workflow`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

github-actions · 2025-09-15T14:18:17Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-09-15T14:19:52Z

Integration Tests Report

Trigger: Pull Request (integration-test label on PR #219)
Commit: 700fff4
Timestamp: 2025-09-15 14:19 UTC

Test Results Summary

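| Model           | Success Rate | Cost     | Test Results      | Artifact Link |
|:----------------|:-------------|:---------|:------------------|:--------------|
| GPT-5 Mini      | 100.00%      | $0.0e+00 | See details below | Download      |
| DeepSeek Chat   | 100.00%      | $0.0e+00 | See details below | Download      |
| Claude Sonnet 4 | 100.00%      | $0.0e+00 | See details below | Download      |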

Detailed Results

GPT-5 Mini

# Integration Tests Report - 700fff4_gpt5_mini_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |

DeepSeek Chat

# Integration Tests Report - 700fff4_deepseek_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |

Claude Sonnet 4

# Integration Tests Report - 700fff4_sonnet_run

Success rate: 100.00% (1/1)

Total cost: $0.0e+00

## Test Results

| instance_id                     | success   | reason                       | cost     |   error_message |
|:--------------------------------|:----------|:-----------------------------|:---------|----------------:|
| t01_fix_simple_typo_class_based | True      | Successfully fixed all typos | $0.0e+00 |             nan |

Overall Status: 3 models tested
Total Cost: $0.0e+00

Add integration tests and cron job workflow

1b421b3

simonrosenberg added the integration-test label (Runs the integration tests and comments the results) Sep 12, 2025

add tabulate

cdfe42d

simonrosenberg self-assigned this Sep 12, 2025

simonrosenberg removed the integration-test label Sep 12, 2025

add synchronize cron job execution

4de8cc0

simonrosenberg added the integration-test label Sep 12, 2025

simonrosenberg and others added 2 commits September 12, 2025 21:32

remove config

21036c0

Merge branch 'main' into add-integration-tests-and-cron-workflow

eff189d

simonrosenberg added and removed the integration-test label Sep 12, 2025

fix

87afb35

simonrosenberg added and removed the integration-test label Sep 12, 2025

Fix DeepSeek model name to use correct litellm_proxy/deepseek-chat

70635c8

simonrosenberg added and removed the integration-test label Sep 12, 2025

simonrosenberg added and removed the integration-test label Sep 15, 2025

simonrosenberg requested a review from xingyaoww September 15, 2025 13:39

small update

4695e4a

simonrosenberg added and removed the integration-test label Sep 15, 2025

add path

c6ba8e8

simonrosenberg added and removed the integration-test label Sep 15, 2025

simonrosenberg merged commit 7dc37fe into main Sep 15, 2025
11 checks passed

simonrosenberg deleted the add-integration-tests-and-cron-workflow branch September 15, 2025 14:28

simonrosenberg linked an issue Sep 16, 2025 that may be closed by this pull request

Port over integration tests #150

Closed

2 tasks
