Add support for other models in AutoEval #44
Comments
@NivekT I think if we add this together with #31, it might be faster, and we can build on top of LlamaIndex's evals. What do you think?
So we want to accept an array of models and evaluate against all of them, right?
@rachittshah We can consider adding LlamaIndex's evals if they integrate well with the pattern we have here. Feel free to propose something and we can have a look. @divij9 That can be part of it, but each of the eval functions linked above currently only supports OpenAI or Anthropic. I will update the main issue to break the request into pieces that are easier for first-time contributors to work on.
I have updated the ask to be bite-sized. Feel free to comment if anything is unclear!
I think I understand what our goal is. Can you please assign this to me?
@Divij97 Sure! Let us know if you plan to work on all 4 subtasks or a specific one. Feel free to pick whichever you think you can contribute to. Thanks!
I'd love to help add support for new models using https://github.com/BerriAI/litellm. Let me know if I can help out on this too.
@ishaan-jaff Awesome! Could you create an issue for it and I can assign that to you? I think the best approach would be to create a
Hi @steventkrawczyk, I have made the required changes but I am not able to push. I signed the CLA, but the push still returns a 403 for me.
@Divij97 You will need to push to a fork and raise a PR from the fork to the original repo.
Never mind, it was a keychain issue on my Mac. Can you please take a look at this PR: #59? It's not complete, but I wanted to check whether I am headed in the right direction.
Wanted to give an update on my progress. I am done with all the changes but would like some help testing them for Anthropic. How do I generate an Anthropic key to test?
🚀 The feature
This is a good task for a new contributor
We have a few utility functions to perform AutoEval:
https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval.py
https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval_scoring.py
https://github.com/hegelai/prompttools/blob/main/prompttools/utils/expected.py
Currently, each of them tends to support only one model. Someone can refactor the code for each of them to support multiple models. I would recommend making sure they all support the best-known models, such as GPT-4 and Claude 2.
We can even consider LLaMA, but that is less urgent.
Tasks
1. Make sure `autoeval_binary_scoring` can take in `model` as an argument. Let's make sure `gpt-4` and `claude-2` are both accepted and invoke the right completion function.
2. Same for `autoeval_scoring`. OpenAI needs to be added here.
3. Same for `compute_similarity_against_model`.
4. Allow evaluating against multiple models (`gpt-4` and `claude-2`) at the same time.

Motivation, pitch
Allowing people to auto-evaluate with the different top models would be ideal.
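For the multi-model part of the ask, a minimal sketch of evaluating one response against several judge models at once might look like the following. `autoeval_multi` and `autoeval_fn` are hypothetical names; `autoeval_fn` stands in for any of the single-model utilities above once they accept a `model` argument.

```python
# Hypothetical sketch: run one autoeval function against several judge models
# and collect a score per model.

def autoeval_multi(autoeval_fn, prompt, response, models=("gpt-4", "claude-2")):
    """Return a dict mapping each model name to its evaluation score."""
    return {m: autoeval_fn(prompt, response, model=m) for m in models}
```

This keeps the per-model logic in one place and makes side-by-side comparison of judges straightforward.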
Alternatives
No response
Additional context
No response