-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds basic ragas eval #193
base: main
Are you sure you want to change the base?
Conversation
requirements.txt
Outdated
@@ -10,3 +10,5 @@ pandas | |||
pandas-stubs | |||
lm-eval>=0.4.4 | |||
httpx | |||
|
|||
ragas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing newline at EOF
def __init__(self): | ||
pass | ||
|
||
def run( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for this a user is expected to bring a list of Sample objects, which hold the input, prediction, and ground truth? Are we going to provide a way to build this list of Samples from given files or lists of each category, or is this moreso just for use with self-built scripts that import the Sample object and build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it such that dataset
now is either a pathlib.Path
object or a list
of samples, and we read what we need to accordingly
Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
We want ragas to read from both a list as well as a list of samples Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
When a dataset is provided and is missing the `response` field, we will need to generate these responses. This commit ensures that when this case happens, we will error out when a student model is not configured. Otherwise, we will always generate these responses if the student model exists, regardless if `response` is in the dataframe or not. Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
@RobotSail let me know once you are done with testing the code. Other than that LGTM. |
|
||
max_tokens: int = 768 | ||
|
||
# Random seed for reproducibility. This is not supported everywhere and therefore is unreliable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had discussed this earlier, I think you're going to want to remove this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alimaredia I'll update this comment because I believe you are confusing it's meaning with our earlier conversation. This comment relates to the seed not being supported by every model serving framework.
This PR introduces Rubric-based evaluation through Ragas using the default with-reference rubric that they provide.
The current evaluation supports the two following modes:
dataset
contains records which holduser_input
(question),reference
(golden answer), andresponse
(model answer)dataset
with theuser_input
andreference
, and additionally aModelConfiguration
of a model which will be used to generate theresponse
for each question. This can be any model running on an OpenAI-compatible endpointSigned-off-by: Oleg S 97077423+RobotSail@users.noreply.github.com