
feat: adding swe-bench docker to improve evaluation #246

Merged: 21 commits merged into kaavee/refactor_workspace from shubhra/run_eval_fixes on Jul 4, 2024

Conversation

shubhras01 (Contributor):

  • adds the swe-bench-docker repo code to improve and run evaluation on Docker
    tasks:
  • add an evaluation function as part of the run_eval script
  • build Docker images and push them to a public Docker repo
  • use the same Docker image to run composio-swe (a sketch of this flow follows the list)
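
A minimal sketch of the flow described above, assuming a hypothetical image name and predictions path (the PR's actual run_eval code and registry may differ):

    import os
    import subprocess

    # Hypothetical image name; the PR's actual public Docker repository may differ.
    IMAGE = "composio/swe-bench-eval:latest"

    def build_and_push_image() -> None:
        # Build the evaluation image once and push it to a public registry,
        # so every evaluation run (and composio-swe itself) reuses the same environment.
        subprocess.run(["docker", "build", "-t", IMAGE, "."], check=True)
        subprocess.run(["docker", "push", IMAGE], check=True)

    def run_evaluation(predictions_path: str) -> None:
        # Run the swe-bench evaluation inside the shared image.
        # Docker bind mounts need absolute host paths.
        host_path = os.path.abspath(predictions_path)
        subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{host_path}:/predictions/predictions.json",
             IMAGE],
            check=True,
        )

    if __name__ == "__main__":
        build_and_push_image()
        run_evaluation("./predictions.json")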

shubhras01 requested a review from kaavee315 on July 2, 2024 at 14:22
ellipsis-dev bot commented on Jul 2, 2024:

Your free trial has expired. To keep using Ellipsis, sign up at https://app.ellipsis.dev for $20/seat/month or reach us at help@ellipsis.dev

kaavee315 changed the base branch from master to kaavee/refactor_workspace on July 4, 2024 at 15:06
codiumai-pr-agent-pro bot commented on Jul 4, 2024:

CI Failure Feedback 🧐

(Checks updated until commit 6ba478d)

Action: test (ubuntu-latest, 3.11)

Failed stage: Unittests [❌]

Failed test name: composio/tools/local/shelltool/tests/test_workspace.py

Failure summary:

The action failed because there were import errors in the test file
composio/tools/local/shelltool/tests/test_workspace.py.

  • The specific error was ImportError: cannot import name 'ExecutionEnvironment' from
    'composio.tools.env.factory'.
  • This indicates that the ExecutionEnvironment class is missing from, or incorrectly named in,
    the composio.tools.env.factory module (a diagnostic sketch follows the log excerpt below).

  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    495:  * [new branch]        featembed-tool                           -> origin/featembed-tool
    496:  * [new branch]        fix/readme                               -> origin/fix/readme
    497:  * [new branch]        fix/readme-logo                          -> origin/fix/readme-logo
    498:  * [new branch]        fix/swe-agent                            -> origin/fix/swe-agent
    499:  * [new branch]        ft-add-better-help-text                  -> origin/ft-add-better-help-text
    500:  * [new branch]        ft-apps-id                               -> origin/ft-apps-id
    501:  * [new branch]        ft-bring-back-core-sdk                   -> origin/ft-bring-back-core-sdk
    502:  * [new branch]        ft-did-you-mean                          -> origin/ft-did-you-mean
    503:  * [new branch]        ft-error-tracking                        -> origin/ft-error-tracking
    ...
    
    877:  ✔ Actions updated
    878:  ⚠️ Triggers does not require update
    879:  unittests: commands[1]> pytest -vvv -rfE --doctest-modules composio/ tests/ --cov=composio --cov=examples --cov-report=html --cov-report=xml --cov-report=term --cov-report=term-missing --cov-config=.coveragerc
    880:  ============================= test session starts ==============================
    881:  platform linux -- Python 3.11.9, pytest-7.4.2, pluggy-1.5.0 -- /home/runner/work/composio/composio/python/.tox/unittests/bin/python
    882:  cachedir: .tox/unittests/.pytest_cache
    883:  rootdir: /home/runner/work/composio/composio/python
    884:  plugins: codecov-0.5.1, anyio-4.4.0, cov-5.0.0
    885:  collecting ... collected 44 items / 2 errors
    886:  ==================================== ERRORS ====================================
    887:  ___ ERROR collecting composio/tools/local/shelltool/tests/test_workspace.py ____
    ...
    
    902:  <frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    903:  ???
    904:  <frozen importlib._bootstrap>:690: in _load_unlocked
    905:  ???
    906:  .tox/unittests/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    907:  exec(co, module.__dict__)
    908:  composio/tools/local/shelltool/tests/test_workspace.py:6: in <module>
    909:  from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
    910:  E   ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
    911:  ___ ERROR collecting composio/tools/local/shelltool/tests/test_workspace.py ____
    912:  ImportError while importing test module '/home/runner/work/composio/composio/python/composio/tools/local/shelltool/tests/test_workspace.py'.
    ...
    
    925:  <frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    926:  ???
    927:  <frozen importlib._bootstrap>:690: in _load_unlocked
    928:  ???
    929:  .tox/unittests/lib/python3.11/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    930:  exec(co, module.__dict__)
    931:  composio/tools/local/shelltool/tests/test_workspace.py:6: in <module>
    932:  from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
    933:  E   ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
    ...
    
    1077:  composio/utils/shared.py                                                           117    104    11%   43-83, 99-108, 139-143, 153-158, 174-221, 247-292, 324-337
    1078:  composio/utils/url.py                                                               10      6    40%   19, 24-35
    1079:  examples/crewai_ci_chart.py                                                         15     15     0%   1-38
    1080:  --------------------------------------------------------------------------------------------------------------
    1081:  TOTAL                                                                             7829   1708    78%
    1082:  Coverage HTML written to dir htmlcov
    1083:  Coverage XML written to file coverage.xml
    1084:  =========================== short test summary info ============================
    1085:  ERROR composio/tools/local/shelltool/tests/test_workspace.py - ImportError: cannot import name 'ExecutionEnvironment' from 'composio.tools.env.factory' (/home/runner/work/composio/composio/python/composio/tools/env/factory.py)
    1086:  ERROR composio/tools/local/shelltool/tests/test_workspace.py
    1087:  !!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
    1088:  ========================= 1 warning, 2 errors in 4.26s =========================
    1089:  unittests: exit 2 (5.11 seconds) /home/runner/work/composio/composio/python> pytest -vvv -rfE --doctest-modules composio/ tests/ --cov=composio --cov=examples --cov-report=html --cov-report=xml --cov-report=term --cov-report=term-missing --cov-config=.coveragerc pid=5655
    1090:  .pkg: _exit> python /opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
    1091:  unittests: FAIL code 2 (27.00=setup[18.18]+cmd[3.72,5.11] seconds)
    1092:  evaluation failed :( (27.14 seconds)
    1093:  ##[error]Process completed with exit code 2.
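
    A quick way to confirm this collection error locally is to check what composio.tools.env.factory actually exports. A minimal diagnostic sketch, assuming only that the composio package is installed (it makes no assumption about what the class was renamed to in the workspace refactor):

        import importlib

        # Import the module the failing test pulls names from.
        factory = importlib.import_module("composio.tools.env.factory")

        # List its public names. If 'ExecutionEnvironment' is not in this list, the test's
        #   from composio.tools.env.factory import ExecutionEnvironment, WorkspaceFactory
        # line raises ImportError during pytest collection, exactly as in the logs above.
        print(sorted(name for name in dir(factory) if not name.startswith("_")))

    The fix is then either to restore or re-export the missing name from composio.tools.env.factory, or to update the test import to the class's new name on the kaavee/refactor_workspace branch.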
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides the following feedback:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check fails. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.

kaavee315 merged commit 14e1c85 into kaavee/refactor_workspace on Jul 4, 2024
1 of 5 checks passed
kaavee315 deleted the shubhra/run_eval_fixes branch on July 4, 2024 at 15:10