
feat: adding swe-bench docker to improve evaluation #246

Merged: 21 commits merged into kaavee/refactor_workspace from shubhra/run_eval_fixes on Jul 4, 2024

Conversation

shubhras01 (Contributor):

  • adds the swe-bench-docker repo code to improve and run evaluation on Docker
    tasks:
  • add an evaluation function as part of the run_eval script
  • build Docker images and push them to a public Docker repo
  • use the same Docker image to run composio-swe (a sketch of this flow follows the list)
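
A minimal sketch of the flow described above, assuming a hypothetical image name and predictions path (the PR's actual run_eval code and registry may differ):

    import os
    import subprocess

    # Hypothetical image name; the PR's actual public Docker repository may differ.
    IMAGE = "composio/swe-bench-eval:latest"

    def build_and_push_image() -> None:
        # Build the evaluation image once and push it to a public registry,
        # so every evaluation run (and composio-swe itself) reuses the same environment.
        subprocess.run(["docker", "build", "-t", IMAGE, "."], check=True)
        subprocess.run(["docker", "push", IMAGE], check=True)

    def run_evaluation(predictions_path: str) -> None:
        # Run the swe-bench evaluation inside the shared image.
        # Docker bind mounts need absolute host paths.
        host_path = os.path.abspath(predictions_path)
        subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{host_path}:/predictions/predictions.json",
             IMAGE],
            check=True,
        )

    if __name__ == "__main__":
        build_and_push_image()
        run_evaluation("./predictions.json")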

shubhras01 requested a review from kaavee315 on July 2, 2024 at 14:22
ellipsis-dev bot commented on Jul 2, 2024:

Your free trial has expired. To keep using Ellipsis, sign up at https://app.ellipsis.dev for $20/seat/month or reach us at help@ellipsis.dev

kaavee315 changed the base branch from master to kaavee/refactor_workspace on July 4, 2024 at 15:06
codiumai-pr-agent-pro bot commented on Jul 4, 2024:

CI Failure Feedback 🧐

(Checks updated until commit 6ba478d)

Action: test (ubuntu-latest, 3.11)

Failed stage: Unittests [❌]

Failed test name: composio/tools/local/shelltool/tests/test_workspace.py

Failure summary:

The action failed because there were import errors in the test file
composio/tools/local/shelltool/tests/test_workspace.py.

  • The specific error was ImportError: cannot import name 'ExecutionEnvironment' from
    'composio.tools.env.factory'.
  • This indicates that the ExecutionEnvironment class is missing from, or incorrectly named in,
    the composio.tools.env.factory module (a diagnostic sketch follows the log excerpt below).

  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    495:  * [new branch]        featembed-tool                           -> origin/featembed-tool
    496:  * [new branch]        fix/readme                               -> origin/fix/readme
    497:  * [new branch]        fix/readme-logo                          -> origin/fix/readme-logo
    498:  * [new branch]        fix/swe-agent                            -> origin/fix/swe-agent
    499:  * [new branch]        ft-add-better-help-text                  -> origin/ft-add-better-help-text
    500:  * [new branch]        ft-apps-id                               -> origin/ft-apps-id
    501:  * [new branch]        ft-bring-back-core-sdk                   -> origin/ft-bring-back-core-sdk
    502:  * [new branch]        ft-did-you-mean                          -> origin/ft-did-you-mean
    503:  * [new branch]        ft-error-tracking                        -> origin/ft-error-tracking
    ...
    
    877:  ✔ Actions updated
    878:  ⚠️ Triggers does not require update
    879:  unittests: commands[1]> pytest -vvv -rfE --doctest-modules composio/ tests/ --cov=composio --cov=examples --cov-report=html --cov-report=xml --cov-report=term --cov-report=term-missing --cov-config=.coveragerc
    880:  ============================= test session starts ==============================
    881:  platform linux -- Python 3.11.9, pytest-7.4.2, pluggy-1.5.0 -- /home/runner/work/composio/composio/python/.tox/unittests/bin/python
    882:  cachedir: .tox/unittests/.pytest_cache
    883:  rootdir: /home/runner/work/composio/composio/python
    884:  plugins: codecov-0.5.1, anyio-4.4.0, cov-5.0.0
    885:  collecting ... collected 44 items / 2 errors
    886:  ==================================== ERRORS ====================================
    887:  ___ ERROR collecting composio/tools/local/shelltool/tests/test_workspace.py ____
    ...
    
    902:  <frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    903:  ???
    904:  <frozen importlib._bootstrap>:690: in _load_unlocked
    905:  ???
    906:  .tox/unittests/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    907:  exec(co, module.__dict__)
    908:  composio/tools/local/shelltool/tests/test_workspace.py:6: in <module>
    909:  from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
    910:  E   ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
    911:  ___ ERROR collecting composio/tools/local/shelltool/tests/test_workspace.py ____
    912:  ImportError while importing test module '/home/runner/work/composio/composio/python/composio/tools/local/shelltool/tests/test_workspace.py'.
    ...
    
    925:  <frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    926:  ???
    927:  <frozen importlib._bootstrap>:690: in _load_unlocked
    928:  ???
    929:  .tox/unittests/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    930:  exec(co, module.__dict__)
    931:  composio/tools/local/shelltool/tests/test_workspace.py:6: in <module>
    932:  from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
    933:  E   ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
    ...
    
    1077:  composio/utils/shared.py                                                           117    104    11%   43-83, 99-108, 139-143, 153-158, 174-221, 247-292, 324-337
    1078:  composio/utils/url.py                                                               10      6    40%   19, 24-35
    1079:  examples/crewai_ci_chart.py                                                         15     15     0%   1-38
    1080:  --------------------------------------------------------------------------------------------------------------
    1081:  TOTAL                                                                             7829   1708    78%
    1082:  Coverage HTML written to dir htmlcov
    1083:  Coverage XML written to file coverage.xml
    1084:  =========================== short test summary info ============================
    1085:  ERROR composio/tools/local/shelltool/tests/test_workspace.py - ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
    1086:  ERROR composio/tools/local/shelltool/tests/test_workspace.py
    1087:  !!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
    1088:  ========================= 1 warning, 2 errors in 4.26s =========================
    1089:  unittests: exit 2 (5.11 seconds) /home/runner/work/composio/composio/python> pytest -vvv -rfE --doctest-modules composio/ tests/ --cov=composio --cov=examples --cov-report=html --cov-report=xml --cov-report=term --cov-report=term-missing --cov-config=.coveragerc pid=5655
    1090:  .pkg: _exit> python /opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
    1091:  unittests: FAIL code 2 (27.00=setup[18.18]+cmd[3.72,5.11] seconds)
    1092:  evaluation failed :( (27.14 seconds)
    1093:  ##[error]Process completed with exit code 2.
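
    A quick way to confirm this collection error locally is to check what composio.tools.env.factory actually exports. A minimal diagnostic sketch, assuming only that the composio package is installed (it makes no assumption about what the class was renamed to in the workspace refactor):

        import importlib

        # Import the module the failing test pulls names from.
        factory = importlib.import_module("composio.tools.env.factory")

        # List its public names. If 'ExecutionEnvironment' is not in this list, the test's
        #   from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
        # line raises ImportError during pytest collection, exactly as in the logs above.
        print(sorted(name for name in dir(factory) if not name.startswith("_")))

    The fix is then either to restore or re-export the missing name from composio.tools.env.factory, or to update the test import to the class's new name on the kaavee/refactor_workspace branch.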
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides the following feedback:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check fails. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.

kaavee315 merged commit 14e1c85 into kaavee/refactor_workspace on Jul 4, 2024
1 of 5 checks passed
kaavee315 deleted the shubhra/run_eval_fixes branch on July 4, 2024 at 15:10