Extract model costs into log and CSVs, so the pricing information is always available #216

Merged
merged 2 commits into from
Jun 26, 2024

Conversation

ruiAzevedo19
Contributor

Part of #210

@ruiAzevedo19 ruiAzevedo19 requested a review from bauersimon June 25, 2024 15:43
@ruiAzevedo19 ruiAzevedo19 force-pushed the 210-model-costs branch 2 times, most recently from 12ac8e9 to de43bdb Compare June 25, 2024 16:01
@ruiAzevedo19 ruiAzevedo19 self-assigned this Jun 25, 2024
@ruiAzevedo19 ruiAzevedo19 added the enhancement New feature or request label Jun 25, 2024
@ruiAzevedo19 ruiAzevedo19 added this to the v0.6.0 milestone Jun 25, 2024
provider/openrouter/openrouter.go (outdated, resolved)
provider/openrouter/openrouter.go (outdated, resolved)
provider/openrouter/openrouter.go (outdated, resolved)
provider/openrouter/openrouter.go (resolved)
provider/openrouter/openrouter.go (outdated, resolved)
model/model.go (outdated, resolved)
provider/openrouter/openrouter.go (outdated, resolved)
evaluate/report/csv.go (outdated, resolved)
cmd/eval-dev-quality/cmd/evaluate.go (outdated, resolved)
cmd/eval-dev-quality/cmd/evaluate_test.go (outdated, resolved)
@bauersimon
Member

Please try this out with the cheapest model from OpenRouter and post the CSV here so we can see what it looks like.

@ruiAzevedo19
Contributor Author

@bauersimon These are the results:

  • Command: eval-dev-quality evaluate --runs 1 --repository golang/plain --model openrouter/meta-llama/llama-3-8b-instruct
  • Model info: https://openrouter.ai/models/meta-llama/llama-3-8b-instruct
  • Current price:
    • Prompt: $0.07/M input tokens = $0.00000007/input token
    • Completion: $0.07/M output tokens = $0.00000007/output token
    • Total cost should be: $0.00000007 + $0.00000007 = $0.00000014
evaluation.csv
model,cost,language,repository,task,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
openrouter/meta-llama/llama-3-8b-instruct,0.00000014,golang,golang/plain,write-tests,1,0,0,87,1186,90,1,0,0
evaluation.log
(...)
2024/06/26 10:35:47 Evaluation score for "openrouter/meta-llama/llama-3-8b-instruct" ("response-no-code"): cost=0.00, score=1, coverage=0, files-executed=0, generate-tests-for-file-character-count=87, processing-time=1186, response-character-count=90, response-no-error=1, response-no-excess=0, response-with-code=0

golang-summed.csv
model,cost,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
openrouter/meta-llama/llama-3-8b-instruct,0.00000014,1,0,0,87,1186,90,1,0,0

models-summed.csv
model,cost,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
openrouter/meta-llama/llama-3-8b-instruct,0.00000014,1,0,0,87,1186,90,1,0,0

@bauersimon
Member

Awesome. The cost in the log is kinda useless since it shows up as 0.00, but it should be higher for more expensive models anyway. We just need to remember to scale the costs up for our evaluations then.

@bauersimon bauersimon merged commit 0af4eab into main Jun 26, 2024
4 checks passed
@bauersimon bauersimon deleted the 210-model-costs branch June 26, 2024 09:47
@bauersimon bauersimon mentioned this pull request Jul 31, 2024
2 participants