This repository has been archived by the owner on Jun 9, 2024. It is now read-only.

Use agent protocol for Benchmarks #209

Closed
wants to merge 2 commits into from


Contributor

@jakubno jakubno commented Jul 28, 2023

Background

This PR showcases what the benchmarks could look like with the Agent Protocol implemented. Smol-developer already implements the protocol, so you can run the tests without any additional setup: you only provide the path to the correct folder (there's an open PR that needs to be merged for this to work properly).
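To make the idea concrete, here is a minimal sketch of how a benchmark could drive any agent through a task/step style interface. It is not the code in this PR: `AgentClient`, `create_task`, `execute_step`, the `is_last` flag, and `EchoAgent` are all hypothetical names chosen for illustration, and the in-memory agent only stands in for a real agent so the sketch runs offline.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Step:
    output: str
    is_last: bool  # assumed flag: the agent signals that the task is finished


class AgentClient(Protocol):
    """Hypothetical client interface the benchmark would code against."""

    def create_task(self, task_input: str) -> str: ...

    def execute_step(self, task_id: str) -> Step: ...


class EchoAgent:
    """In-memory stand-in for an agent; finishes after a fixed number of steps."""

    def __init__(self, steps_per_task: int = 2) -> None:
        self._steps_per_task = steps_per_task
        self._remaining: Dict[str, int] = {}
        self._inputs: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def create_task(self, task_input: str) -> str:
        task_id = f"task-{next(self._ids)}"
        self._remaining[task_id] = self._steps_per_task
        self._inputs[task_id] = task_input
        return task_id

    def execute_step(self, task_id: str) -> Step:
        self._remaining[task_id] -= 1
        done = self._remaining[task_id] == 0
        return Step(output=f"worked on: {self._inputs[task_id]}", is_last=done)


def run_challenge(agent: AgentClient, task_input: str, max_steps: int = 10) -> List[Step]:
    """Create a task, then step the agent until it reports the last step
    or the step cutoff is reached."""
    task_id = agent.create_task(task_input)
    steps: List[Step] = []
    for _ in range(max_steps):
        step = agent.execute_step(task_id)
        steps.append(step)
        if step.is_last:
            break
    return steps
```

With an interface like this, the benchmark never needs agent-specific glue: `run_challenge(EchoAgent(), "write hello.txt")` works the same way for any object that offers the two methods.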

This is a first iteration. I propose that we start here and then iterate based on our needs and the problems we run into. The protocol is very simple right now, and I would like to start a conversation about what should be improved. I think benchmarks are a good place to start: they cover various use cases and test how well the protocol can convey results.

Let's discuss whether we want to keep both options for now, or push the integration of the agent protocol and use only the "api" version of the challenges.

There are already a few things I encountered during implementation and testing that are a bit problematic right now:

  • Passing files (should we pass them directly, or just pass a reference to them?)
  • How to distinguish results from progress info. Should it, or can it, be general?
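To ground both questions, here is one possible shape for the data, purely as a discussion aid. Nothing here is part of the protocol or of this PR: `FileRef`, `MessageKind`, `AgentMessage`, and `final_results` are hypothetical names. The sketch lets a file travel either inline (base64 content) or as a reference (a URI), and tags every message as progress or result so the benchmark can filter on the tag.

```python
import base64
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


@dataclass
class FileRef:
    """Hypothetical file wrapper: exactly one of `content_b64` or `uri` is set."""

    name: str
    content_b64: Optional[str] = None  # option 1: pass the file directly
    uri: Optional[str] = None          # option 2: pass only a reference

    def __post_init__(self) -> None:
        # Reject "neither" and "both" so the receiver never has to guess.
        if (self.content_b64 is None) == (self.uri is None):
            raise ValueError("set exactly one of content_b64 or uri")

    @classmethod
    def inline(cls, name: str, data: bytes) -> "FileRef":
        return cls(name=name, content_b64=base64.b64encode(data).decode("ascii"))

    @classmethod
    def reference(cls, name: str, uri: str) -> "FileRef":
        return cls(name=name, uri=uri)

    def is_inline(self) -> bool:
        return self.content_b64 is not None


class MessageKind(str, Enum):
    PROGRESS = "progress"  # intermediate status, safe to ignore when scoring
    RESULT = "result"      # output the benchmark should evaluate


@dataclass
class AgentMessage:
    kind: MessageKind
    text: str
    file: Optional[FileRef] = None


def final_results(messages: List[AgentMessage]) -> List[AgentMessage]:
    """Keep only the messages the benchmark should score."""
    return [m for m in messages if m.kind is MessageKind.RESULT]
```

Inline content keeps the agent and benchmark decoupled from a shared filesystem but grows the payload with file size, while a reference stays small and requires both sides to resolve the same location. A single explicit `kind` field is one way to keep the result/progress distinction general across agents.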

Other insights are more than welcome!

Changes

Remove agent_interface and replace it with the general agent protocol.

PR Quality Checklist

  • [ x ] I have run the following commands against my code to ensure it passes our linters:
    black . --exclude test.py
    isort .
    mypy .
    autoflake --remove-all-unused-imports --recursive --ignore-init-module-imports --ignore-pass-after-docstring --in-place agbenchmark

@waynehamadi
Contributor

@jakubno if I understand correctly, you're coming with something different soon, so I am going to close this one soon

@waynehamadi waynehamadi closed this Aug 5, 2023

2 participants