Description
The "integration tests" we discussed in the context of this issue should primarily be tests that use a real LLM and run the system end to end - maybe we should name them regression tests?
They should consist of two types:
- Tests for "Agent behavior": using a real LLM, give the same instruction, and use a test case to check whether the agent is able to perform changes to the workspace that result in a desirable final state. We have integration tests for this at https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation/integration_tests that run daily and comment the result to a GitHub issue (workflow). We need to migrate them to this repository and have them run daily with the same behavior.
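An agent-behavior regression test of this kind could be sketched as below. Note that `run_agent`, the instruction text, and the workspace layout are all hypothetical placeholders (the stub just fabricates the expected file so the sketch is self-contained); a real test would drive the actual agent with a live LLM and only keep the final-state assertions:

```python
import tempfile
from pathlib import Path


def run_agent(instruction: str, workspace: Path) -> None:
    """Hypothetical stand-in for running the agent with a real LLM.

    Stubbed here so the sketch is runnable: it simply creates the file
    the instruction asks for. The real version would invoke the agent.
    """
    (workspace / "hello.txt").write_text("hello world\n")


def test_agent_reaches_desired_final_state() -> None:
    # The test fixes one instruction, lets the agent act, and then
    # inspects only the final workspace state - not intermediate steps.
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        run_agent("Create hello.txt containing 'hello world'", workspace)
        assert (workspace / "hello.txt").read_text().strip() == "hello world"


test_agent_reaches_desired_final_state()
print("ok")
```

Checking only the final state keeps the test robust to changes in how the agent gets there, which is what makes it usable as a daily regression signal.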
- Tests for "System behavior": using a real LLM to check whether the system components are working. An example would be checking if `.reasoning_content` is successfully returned by the API for the changes introduced in this PR: feat: Support reasoning content in Agent SDK #139
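A system-behavior check like the `.reasoning_content` one could look roughly like the following. The `Message` class and `llm_completion` function here are simplified stand-ins, not the real SDK API (the stub returns a canned response so the sketch runs without an LLM); the real test would make a live call to a reasoning-capable model:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Simplified stand-in for an SDK completion message."""
    content: str
    reasoning_content: Optional[str] = None


def llm_completion() -> Message:
    # Stand-in for a real-LLM call; a reasoning model is expected to
    # populate reasoning_content alongside the final answer.
    return Message(content="42", reasoning_content="Step 1: ...")


def test_reasoning_content_is_returned() -> None:
    msg = llm_completion()
    # The system test asserts the field survives the round trip through
    # the SDK, not that its text has any particular value.
    assert msg.reasoning_content is not None
    assert msg.content


test_reasoning_content_is_returned()
print("ok")
```

Because these tests target plumbing rather than agent intelligence, the assertions stay deliberately loose: presence and shape of fields, not exact content.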