Integration tests (openhands fix issue 5076)#8

Closed
enyst wants to merge 37 commits into main from int/openhands-fix-issue-5076

@enyst
Owner

@enyst enyst commented Nov 23, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions


Link of any specific issues this addresses

@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Trigger by: Pull Request (integration-test label on PR #8)
Commit: a7ff9aa
Integration Tests Evaluation Report


You can download the full evaluation outputs here.

@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Trigger by: Pull Request (integration-test label on PR #8)
Commit: f1ac848
Integration Tests Evaluation Report


You can download the full evaluation outputs here.

@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Trigger by: Pull Request (integration-test label on PR #8)
Commit: 9ba684d
Integration Tests Evaluation Report
Success rate: 66.67% (4/6)

| instance_id | success | reason |
|---|---|---|
| t02_add_bash_hello | True | |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 0. Messages: [] |
| t04_git_staging | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 0. Messages: [] |
| t03_jupyter_write_file | True | |
| t01_fix_simple_typo | True | |

You can download the full evaluation outputs here.
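
The success-rate line in the report above is plain arithmetic over the per-instance results. A minimal sketch of how such a summary could be computed follows; the `InstanceResult` class and both helper functions are hypothetical names used for illustration and are not taken from the actual integration-runner code.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """One row of the report (hypothetical structure, for illustration only)."""
    instance_id: str
    success: bool
    reason: str = ""


def success_rate(results):
    """Return (passed, total, percentage) for a list of InstanceResult."""
    total = len(results)
    passed = sum(1 for r in results if r.success)
    pct = 100.0 * passed / total if total else 0.0
    return passed, total, pct


def format_summary(results):
    """Format the summary line in the same shape as the report."""
    passed, total, pct = success_rate(results)
    return f"Success rate: {pct:.2f}% ({passed}/{total})"


# Failure reason copied from the report for commit 9ba684d.
NOT_FOUND = "The answer is not found in any message. Total messages: 0. Messages: []"

results = [
    InstanceResult("t02_add_bash_hello", True),
    InstanceResult("t05_simple_browsing", False, NOT_FOUND),
    InstanceResult("t04_git_staging", True),
    InstanceResult("t06_github_pr_browsing", False, NOT_FOUND),
    InstanceResult("t03_jupyter_write_file", True),
    InstanceResult("t01_fix_simple_typo", True),
]

print(format_summary(results))  # Success rate: 66.67% (4/6)
```

The guard for an empty result list avoids a division by zero if a run produces no output at all.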

Repository owner deleted a comment from github-actions bot Nov 25, 2024
Repository owner deleted a comment from github-actions bot Nov 25, 2024
@enyst
Owner Author

enyst commented Nov 25, 2024

@openhands-agent Make the integration-runner workflow also run on a nightly schedule.

@github-actions

OpenHands started fixing the pr! You can monitor the progress here.

Repository owner deleted a comment from github-actions bot Nov 25, 2024
Repository owner deleted a comment from github-actions bot Nov 25, 2024
Repository owner deleted a comment from github-actions bot Nov 25, 2024
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Trigger by: Pull Request (integration-test label on PR #8)
Commit: de57c25
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
|---|---|---|
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
|---|---|---|
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | True | |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

Repository owner deleted a comment from github-actions bot Nov 25, 2024
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

Repository owner deleted a comment from github-actions bot Nov 25, 2024
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Trigger by: Pull Request (integration-test label on PR #8)
Commit: 95425d3
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 100.00% (6/6)

| instance_id | success | reason |
|---|---|---|
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | True | |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
|---|---|---|
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t04_git_staging | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 4. |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download

@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

Trigger by: Pull Request (integration-test label on PR #8)
Commit: 95425d3
Integration Tests Report (Haiku)
Haiku LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
|---|---|---|
| t03_jupyter_write_file | True | |
| t02_add_bash_hello | True | |
| t01_fix_simple_typo | True | |
| t05_simple_browsing | False | The answer is not found in any message. Total messages: 4. |
| t06_github_pr_browsing | True | |
| t04_git_staging | True | |

Integration Tests Report (DeepSeek)
DeepSeek LLM Test Results:
Success rate: 83.33% (5/6)

| instance_id | success | reason |
|---|---|---|
| t03_jupyter_write_file | True | |
| t05_simple_browsing | True | |
| t02_add_bash_hello | True | |
| t04_git_staging | True | |
| t06_github_pr_browsing | False | The answer is not found in any message. Total messages: 2. |
| t01_fix_simple_typo | True | |

Download evaluation outputs (includes both Haiku and DeepSeek results): Download
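
The two reports for commit 95425d3 do not agree on every instance, so it can help to compare runs mechanically. The sketch below is one possible way to list (model, instance) pairs whose outcome changed between runs of the same commit; `find_changed` and `make_run` are hypothetical helpers, and the data is transcribed from the two reports above rather than read from the real evaluation outputs.

```python
TESTS = [
    "t01_fix_simple_typo",
    "t02_add_bash_hello",
    "t03_jupyter_write_file",
    "t04_git_staging",
    "t05_simple_browsing",
    "t06_github_pr_browsing",
]
MODELS = ("Haiku", "DeepSeek")


def make_run(failures):
    """Build {(model, instance_id): success}; only pairs in `failures` are False."""
    return {(m, t): (m, t) not in failures for m in MODELS for t in TESTS}


def find_changed(runs):
    """Return sorted (model, instance_id) pairs whose outcome differs across runs."""
    outcomes = {}
    for run in runs:
        for key, ok in run.items():
            outcomes.setdefault(key, set()).add(ok)
    return sorted(key for key, seen in outcomes.items() if len(seen) > 1)


# First report for 95425d3: only DeepSeek t06 failed.
first_run = make_run({("DeepSeek", "t06_github_pr_browsing")})
# Second report for 95425d3: Haiku t05 and DeepSeek t06 failed.
second_run = make_run({
    ("Haiku", "t05_simple_browsing"),
    ("DeepSeek", "t06_github_pr_browsing"),
})

print(find_changed([first_run, second_run]))
# [('Haiku', 't05_simple_browsing')]
```

Pairs that fail in every run, such as DeepSeek on t06_github_pr_browsing here, are not listed, because their outcome did not change.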

@enyst enyst force-pushed the int/openhands-fix-issue-5076 branch from b37602a to 0c22181 on November 25, 2024 22:19
enyst added a commit that referenced this pull request Nov 25, 2024
* Fix issue OpenHands#5076: Integration test github action

* Update integration-runner.yml

* Update integration-runner.yml

* update variables

* use haiku

* use base url

* fix report name

* Fix pr #8: Integration tests (openhands fix issue 5076)

* Revert "Fix pr #8: Integration tests (openhands fix issue 5076)"

This reverts commit dcd4681.

* Fix pr #8: Integration tests (openhands fix issue 5076)

* use haiku explicitly, in results too

* remove duplicate

* Update .github/workflows/integration-runner.yml

* Revert "Update .github/workflows/integration-runner.yml"

This reverts commit 7e7200e.

* funny space

* Fix pr #8: Integration tests (openhands fix issue 5076)

* artifact fix

* clean up remote runtimes

* clean up runtimes more aggressively - a bit unexpected though

* Fix pr #8: Integration tests (openhands fix issue 5076)

* fix type issue that was preventing checking results

* try with waiting time

* add eval notes

* increase timeouts

* try with CI local builds

* fix eval output

* set debug

* fix tests!

* fix outputs

* keep details in logs, not github comment

* tweak schedule

* lint-y

---------

Co-authored-by: openhands <openhands@all-hands.dev>