@SLR722 SLR722 commented Mar 19, 2025

What does this PR do?

This PR adds a new open eval benchmark, IFEval, based on the paper https://arxiv.org/abs/2311.07911, to measure a model's instruction-following capability.
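For readers unfamiliar with the benchmark: IFEval prompts carry instructions that can be checked programmatically, such as a minimum word count or an all-lowercase response. The snippet below is only a minimal sketch of that idea. The names `at_least_n_words`, `all_lowercase`, and `instruction_accuracy` are hypothetical and are not taken from this PR or from the paper's reference code.

```python
from typing import Callable, List


def at_least_n_words(response: str, n: int) -> bool:
    """Hypothetical check: the response contains at least n whitespace-separated words."""
    return len(response.split()) >= n


def all_lowercase(response: str) -> bool:
    """Hypothetical check: the response contains no uppercase characters."""
    return response == response.lower()


def instruction_accuracy(response: str, checks: List[Callable[[str], bool]]) -> float:
    """Fraction of instruction checks that the response satisfies."""
    if not checks:
        return 0.0
    return sum(1 for check in checks if check(response)) / len(checks)


# Illustrative prompt with two instructions: "use at least 5 words" and "use lowercase only".
checks = [lambda r: at_least_n_words(r, 5), all_lowercase]
score = instruction_accuracy("this reply is short", checks)
```

Here the response has only four words, so it satisfies one of the two checks.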

Test Plan

Spin up a llama stack server with the open-benchmark template.

Run `llama-stack-client --endpoint xxx eval run-benchmark "meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/" --num-examples 20` on the client side and check the eval aggregate results.

@facebook-github-bot facebook-github-bot added the CLA Signed label Mar 19, 2025
@SLR722 SLR722 marked this pull request as ready for review March 19, 2025 21:51
@SLR722 SLR722 marked this pull request as draft March 19, 2025 21:53
@SLR722 SLR722 marked this pull request as ready for review March 19, 2025 23:30
@SLR722 SLR722 merged commit f369871 into main Mar 19, 2025
14 checks passed
@SLR722 SLR722 deleted the if_eval branch March 19, 2025 23:40
SLR722 added a commit to llamastack/llama-stack-client-python that referenced this pull request Mar 19, 2025
## What does this PR do?

Add weighted_average aggregation function support for
llamastack/llama-stack#1708
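
As a rough illustration of what a weighted-average aggregation computes, here is a minimal sketch. It assumes each scored row exposes a `score` and a `weight` field; those field names and the function body are assumptions for illustration and are not the client's actual implementation.

```python
from typing import Iterable, Mapping


def weighted_average(rows: Iterable[Mapping[str, float]]) -> float:
    """Return sum(score * weight) / sum(weight); 0.0 if the total weight is zero."""
    weighted_sum = 0.0
    total_weight = 0.0
    for row in rows:
        # "score" and "weight" are assumed field names, not the library's schema.
        weighted_sum += row["score"] * row["weight"]
        total_weight += row["weight"]
    if total_weight == 0:
        return 0.0
    return weighted_sum / total_weight


example_rows = [
    {"score": 1.0, "weight": 3.0},
    {"score": 0.5, "weight": 1.0},
]
result = weighted_average(example_rows)  # (1.0*3.0 + 0.5*1.0) / 4.0
```

Weighting lets a row that covers several instructions count for more than a row that covers one, instead of treating every row equally.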
franciscojavierarceo pushed a commit to franciscojavierarceo/llama-stack that referenced this pull request Mar 22, 2025
Labels

CLA Signed This label is managed by the Meta Open Source bot.

3 participants