Use agent protocol for Benchmarks #209

jakubno · 2023-07-28T07:12:46Z

Background

A showcase of how benchmarks could look with the Agent Protocol implemented. Smol-developer implements the protocol, so you can run the tests without any additional setup: you just provide the path to the correct folder (there is an open PR that needs to be merged for this to work properly).
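To make the idea concrete, here is a minimal sketch of how a benchmark could drive any agent that speaks the protocol. The endpoint paths (`/agent/tasks`, `/agent/tasks/{task_id}/steps`), the `task_id` and `is_last` fields, and the `post` transport are assumptions made for illustration, not taken from this PR; `fake_agent` is a hypothetical in-memory stand-in so the loop can be run without a server.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: (path, json_body) -> decoded JSON response.
# In a real benchmark this would wrap an HTTP client pointed at the agent.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_challenge(post: Post, task_input: str, max_steps: int = 10) -> List[Dict[str, Any]]:
    """Create a task, then request steps until the agent reports the last one."""
    # Assumed endpoint and response field names.
    task = post("/agent/tasks", {"input": task_input})
    steps: List[Dict[str, Any]] = []
    for _ in range(max_steps):
        step = post(f"/agent/tasks/{task['task_id']}/steps", {})
        steps.append(step)
        if step.get("is_last"):
            break
    return steps


def fake_agent() -> Post:
    """In-memory stand-in for an agent server, used only to exercise the loop."""
    remaining = {"count": 2}

    def post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        if path == "/agent/tasks":
            return {"task_id": "t1", "input": body["input"]}
        # Any other path is treated as a step request.
        remaining["count"] -= 1
        return {"output": "working", "is_last": remaining["count"] == 0}

    return post
```

With this shape the benchmark only depends on the transport, so swapping one agent for another would mean changing the URL behind `post` and nothing else.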

This is a first iteration. I propose we start here and then iterate based on our needs and problems. The protocol is very simple right now, and I would like to start a conversation about what should be improved. I think benchmarks are a good place to start: they cover various use cases and exercise how well the protocol conveys results.

Let's discuss whether we want to keep both options for now, or push the Agent Protocol integration and use only the "api" version of the challenges.

There are already a few things I ran into during implementation and testing that are a bit problematic right now:

Passing files (should we pass them directly or just pass a reference to them?)
How to distinguish results from progress info. Should it, or can it, be general?
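As a starting point for discussion, the two questions above could be sketched like this. Everything here is hypothetical and not part of the protocol or of this PR: `Artifact` models a file that is either passed inline or passed by reference, and `StepMessage` tags each message as progress or result so a benchmark can tell them apart in a general way.

```python
import base64
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class Artifact:
    """A file exchanged with the agent, inline or by reference (hypothetical shape)."""

    name: str
    content_b64: Optional[str] = None  # set when the file is passed directly
    uri: Optional[str] = None  # set when only a reference is passed

    def __post_init__(self) -> None:
        # Exactly one of the two forms must be used.
        if (self.content_b64 is None) == (self.uri is None):
            raise ValueError("set exactly one of content_b64 or uri")

    @classmethod
    def inline(cls, name: str, data: bytes) -> "Artifact":
        return cls(name=name, content_b64=base64.b64encode(data).decode("ascii"))

    @classmethod
    def reference(cls, name: str, uri: str) -> "Artifact":
        return cls(name=name, uri=uri)


@dataclass(frozen=True)
class StepMessage:
    """A message from the agent, tagged so results and progress are distinguishable."""

    kind: Literal["progress", "result"]
    text: str


def final_results(messages: list) -> list:
    """Keep only the messages a benchmark should score."""
    return [m for m in messages if m.kind == "result"]
```

Supporting both file forms keeps small outputs simple to check while leaving room for large files, at the cost of every consumer having to handle two cases.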

Other insights are more than welcome!

Changes

Remove agent_interface and replace it with the general Agent Protocol.

PR Quality Checklist

[ x ] I have run the following commands against my code to ensure it passes our linters:

black . --exclude test.py
isort .
mypy .
autoflake --remove-all-unused-imports --recursive --ignore-init-module-imports --ignore-pass-after-docstring --in-place agbenchmark

waynehamadi · 2023-08-05T16:50:31Z

@jakubno if I understand correctly, you're coming up with something different soon, so I am going to close this one soon.

jakubno added 2 commits July 29, 2023 18:36

Use agent protocol for benchmarks

7e523bd

Add missing typehint

0e8b67c

jakubno force-pushed the master branch from 4835465 to 0e8b67c Compare July 29, 2023 16:36

waynehamadi closed this Aug 5, 2023
