Port over integration tests #150

@xingyaoww

Description

The "integration tests" we discussed in the context of this issue should primarily be tests that use a real LLM and run the system end to end — maybe we should name them regression tests?

They should consist of two types:

  • Tests for "agent behavior": using a real LLM, give the agent the same instruction each run, and use a test case to check whether the agent can make changes to the workspace that result in the desired final state. We have integration tests for this at https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation/integration_tests that run daily and comment the result on a GitHub issue (workflow). We need to migrate them to this repository and have them run daily with the same behavior.

  • Tests for "system behavior": using a real LLM to check whether system components are working. An example would be checking whether `.reasoning_content` is successfully returned by the API, for the changes introduced in this PR: feat: Support reasoning content in Agent SDK #139
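The "agent behavior" tests above could be sketched roughly as below. This is a minimal illustration only: `run_agent` is a hypothetical stand-in for the real LLM-backed agent (here it just writes the requested file so the checker is demonstrable), and the workspace layout is an assumption, not the Agent SDK's actual API.

```python
# Hypothetical sketch of an "agent behavior" regression test.
# `run_agent` and the workspace layout are assumptions, not the real SDK API.
import pathlib
import tempfile


def run_agent(instruction: str, workspace: pathlib.Path) -> None:
    # Stand-in for the real LLM-backed agent; here it just writes the file
    # the instruction asks for, so the final-state check below can run.
    (workspace / "hello.txt").write_text("hello world\n")


def test_agent_creates_file() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        workspace = pathlib.Path(tmp)
        run_agent("Create hello.txt containing 'hello world'", workspace)
        # Assert on the final workspace state, not the agent's transcript:
        # the same instruction must always produce the desired end state.
        target = workspace / "hello.txt"
        assert target.exists()
        assert "hello world" in target.read_text()
```

The key design point is that the assertion inspects the resulting workspace rather than the agent's intermediate output, which is what makes these tests usable as daily regression checks against a nondeterministic LLM.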
