Add evaluation notebook #31
Thanks @hemajv for adding the notebook. It presents good metrics for evaluation. Do we want all of these evaluation metrics added to the pipeline and UI, or do we want to select which ones to ship further?
We have already added some of these to the UI, but they may not actually be relevant or perform well. What we could look into from here is a quantitative evaluation of how often these scores are valid.
That will help us determine which of the LangChain eval criteria are doing well, and also whether this approach works most of the time. We can run the same evaluation on the custom GPT-3 prompt evaluation to see how well that performs.
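A minimal sketch of that quantitative check, assuming we hand-label a set of cases and store each evaluator verdict next to the human label as a `(criterion, evaluator_score, human_label)` tuple with 0/1 values. The record format and the `criterion_agreement` helper are hypothetical, not part of the notebook; the custom GPT-3 prompt evaluation could be scored the same way by treating it as one more criterion key.

```python
from collections import defaultdict


def criterion_agreement(records):
    """Return, per criterion, the fraction of cases where the evaluator agrees with the human label.

    Hypothetical input format: an iterable of (criterion, evaluator_score, human_label)
    tuples, where evaluator_score and human_label are 0 or 1.
    """
    totals = defaultdict(int)   # number of labeled cases seen per criterion
    matches = defaultdict(int)  # number of cases where evaluator == human label
    for criterion, evaluator_score, human_label in records:
        totals[criterion] += 1
        if evaluator_score == human_label:
            matches[criterion] += 1
    return {criterion: matches[criterion] / totals[criterion] for criterion in totals}


# Example with made-up labels, only to show the shape of the output.
records = [
    ("conciseness", 1, 1),
    ("conciseness", 0, 1),
    ("relevance", 1, 1),
    ("relevance", 1, 1),
]
print(criterion_agreement(records))  # {'conciseness': 0.5, 'relevance': 1.0}
```

Criteria with low agreement on a reasonably sized labeled set would be the candidates to drop from the pipeline and UI.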
Thanks for the notebook @hemajv 🎉 This is a great start.
I added a comment on how we can follow this up with a quantitative evaluation to get a sense of which criteria we should include and whether this approach actually works most of the time.
Yes, definitely agree on this 👍 It seems like some of the LangChain criteria do a pretty decent job of evaluating the outputs accurately. As you mentioned, we need to try this out on a set of test cases to get a better understanding of how well it's doing.
I started creating a framework to quantitatively evaluate this as an extension to this notebook. I will add that in a separate PR.
@oindrillac I addressed the comments and pushed new commits; ready for your review.
Thanks for the changes @hemajv LGTM 👍
This addresses #24