Allows MMLU to have the system_prompt provided to it #197
base: main
Conversation
Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
I am currently running an e2e test on granite-8b-starter and am still seeing failures (I manually brought in the patches).
Confirmed my image had the patches for both libraries, then proceeded to run the MMLU eval.
Changes look good; it might be good to pass an example system_prompt from a unit test:
Line 65 in d8d6097
mmlu = MMLUEvaluator(model_path=MODEL_EXAMPLE, tasks=tasks)
It wouldn't test much, but it would at least make sure the arg passing works.
The other place to potentially add an example would be in the test scripts. That's less valuable since they aren't run anywhere, but it could be a helpful example.
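A sketch of what such a unit test could look like. To keep it self-contained, `MMLUEvaluatorStub` is a stand-in for the real `MMLUEvaluator`; the `system_prompt` keyword and attribute are assumptions about the interface this PR introduces, and the model path and task name are placeholder values.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MMLUEvaluatorStub:
    """Stand-in for MMLUEvaluator, only to illustrate the test shape.
    The system_prompt field mirrors the keyword assumed to be added by this PR."""
    model_path: str
    tasks: List[str] = field(default_factory=list)
    system_prompt: Optional[str] = None

def test_mmlu_accepts_system_prompt():
    # Only checks that the argument is passed through, as noted above.
    system_prompt = "You are a helpful assistant."
    mmlu = MMLUEvaluatorStub(
        model_path="models/example-model",      # stands in for MODEL_EXAMPLE
        tasks=["mmlu_abstract_algebra"],        # stands in for the tasks fixture
        system_prompt=system_prompt,
    )
    assert mmlu.system_prompt == system_prompt

test_mmlu_accepts_system_prompt()
```

In the real suite this would construct `MMLUEvaluator` directly with the existing `MODEL_EXAMPLE` and `tasks` fixtures.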
I hit this same error.
I am happy to bring in the patches and run an e2e test to make sure everything is in good shape on granite-8b-starter.
Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
> Changes look good, might be good to pass an example system_prompt from a unit test:
> Line 65 in d8d6097
> mmlu = MMLUEvaluator(model_path=MODEL_EXAMPLE, tasks=tasks)
I've updated the PR to include these changes, please take a look and let me know if there's anything I missed.
Regarding this comment: Tyler and I had a debug session where we found that adding the new prompt template causes MMLU to exceed the model's context window (4096 tokens) when --few-shot=5 is used. This happens because the lm_eval harness appears to truncate the prompt by default, sending only the last 4096 tokens that fit into the context, which drops the system prompt and throws an error on certain models.
The resolution here is to use a lower --few-shot value, such as 1-3.
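The arithmetic behind this can be sketched as follows. All token counts below except the 4096-token context window are hypothetical assumptions chosen for illustration, not measured values from the actual run:

```python
# Why 5-shot MMLU plus a system prompt can overflow a 4096-token context.
CONTEXT_WINDOW = 4096          # model context length from the discussion
SYSTEM_PROMPT_TOKENS = 250     # assumed size of the injected system prompt
TOKENS_PER_SHOT = 800          # assumed size of one few-shot MMLU example
QUESTION_TOKENS = 150          # assumed size of the question being scored

def prompt_tokens(num_shots: int) -> int:
    """Total prompt size for a given --few-shot setting."""
    return SYSTEM_PROMPT_TOKENS + num_shots * TOKENS_PER_SHOT + QUESTION_TOKENS

for shots in (1, 3, 5):
    total = prompt_tokens(shots)
    print(f"--few-shot={shots}: {total} tokens, fits={total <= CONTEXT_WINDOW}")
```

Under these assumptions, 5 shots overflow the window while 1-3 shots fit, which is why the harness's back-truncation ends up cutting off the system prompt at the front.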
The MMLU evaluator currently performs MMLU evaluation by calling out to the lm_eval harness. The lm-eval harness handles the evaluation by building its own prompt client-side and passing it to the
/v1/completions
endpoint of whatever OpenAI-compatible API it is pointed at. Certain models, however, require their chat template to be applied during inference in order to get the best results. This PR adjusts the MMLU evaluator to allow a system prompt to be provided to the model, and enables chat-template mode when one is present.
Signed-off-by: Oleg S 97077423+RobotSail@users.noreply.github.com