@yanxi0830 yanxi0830 commented Nov 18, 2024

What does this PR do?

  • Add the ability to run agent generation as part of a full eval (generation + scoring).
  • Pre-register the SimpleQA benchmark's LLM-as-judge scoring function in code.
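
As a rough illustration of what "generation + scoring" means here, the sketch below runs a candidate over dataset rows and then scores each generated answer. It is not the code from this PR: `run_full_eval`, `toy_agent`, `toy_judge`, and the column names `input_query`, `expected_answer`, and `generated_answer` are all hypothetical stand-ins, and the exact-match judge only imitates the shape of an LLM-as-judge call.

```python
from typing import Callable, Dict, List

# Hypothetical types: a candidate maps a question to an answer, and a scorer
# maps (question, expected, generated) to a score dictionary.
Candidate = Callable[[str], str]
Scorer = Callable[[str, str, str], Dict[str, bool]]


def run_full_eval(rows: List[dict], candidate: Candidate, scorer: Scorer) -> List[dict]:
    """Generate an answer for every row, then score it against the expected answer."""
    results = []
    for row in rows:
        # Generation step: in the PR this would be an agent turn.
        generated = candidate(row["input_query"])
        # Scoring step: in the PR this would be the LLM-as-judge function.
        score = scorer(row["input_query"], row["expected_answer"], generated)
        results.append({**row, "generated_answer": generated, "score": score})
    return results


def toy_agent(question: str) -> str:
    """Stand-in for an agent; answers from a fixed lookup table."""
    known = {"What is the capital of France?": "Paris"}
    return known.get(question, "I don't know")


def toy_judge(question: str, expected: str, generated: str) -> Dict[str, bool]:
    """Stand-in for an LLM judge; uses case-insensitive exact match."""
    return {"correct": expected.strip().lower() == generated.strip().lower()}


ROWS = [
    {"input_query": "What is the capital of France?", "expected_answer": "Paris"},
    {"input_query": "Who wrote Dune?", "expected_answer": "Frank Herbert"},
]

RESULTS = run_full_eval(ROWS, toy_agent, toy_judge)
ACCURACY = sum(r["score"]["correct"] for r in RESULTS) / len(RESULTS)
```

Keeping the candidate and the scorer as plain callables means the same loop works whether the candidate is a bare model or an agent with tools.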

Test Plan

image

image

SimpleQA w/ Search

image

  • eval_task_config_simpleqa_search.json
{
    "type": "benchmark",
    "eval_candidate": {
        "type": "agent",
        "config": {
            "model": "Llama3.1-405B-Instruct",
            "instructions": "Please use the search tool to answer the question.",
            "sampling_params": {
                "strategy": "greedy",
                "temperature": 1.0,
                "top_p": 0.9
            },
            "tools": [
                {
                    "type": "brave_search",
                    "engine": "brave",
                    "api_key": "API_KEY"
                }
            ],
            "tool_choice": "auto",
            "tool_prompt_format": "json",
            "input_shields": [],
            "output_shields": [],
            "enable_session_persistence": false
        }
    }
}
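
For readers who want to check a config like this before submitting it, here is a minimal sketch that parses a trimmed copy of the JSON above with the standard `json` module and summarizes the candidate. The helper `summarize_candidate` is hypothetical and not part of the PR; it assumes only the keys visible in the config.

```python
import json

# Trimmed copy of eval_task_config_simpleqa_search.json; "API_KEY" is the
# same placeholder used in the config above, not a real key.
CONFIG_TEXT = """
{
    "type": "benchmark",
    "eval_candidate": {
        "type": "agent",
        "config": {
            "model": "Llama3.1-405B-Instruct",
            "instructions": "Please use the search tool to answer the question.",
            "tools": [
                {"type": "brave_search", "engine": "brave", "api_key": "API_KEY"}
            ],
            "tool_choice": "auto"
        }
    }
}
"""


def summarize_candidate(task_config: dict) -> dict:
    """Hypothetical helper: pull out the candidate type, model, and tool types."""
    candidate = task_config["eval_candidate"]
    agent_config = candidate["config"]
    return {
        "candidate_type": candidate["type"],
        "model": agent_config["model"],
        # Tools are optional, so fall back to an empty list.
        "tool_types": [tool["type"] for tool in agent_config.get("tools", [])],
    }


SUMMARY = summarize_candidate(json.loads(CONFIG_TEXT))
```

With the config above, `SUMMARY` reports an `agent` candidate whose only tool type is `brave_search`.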

SimpleQA w/o Search

image

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Ran pre-commit to handle lint / formatting issues.
  • Read the contributor guideline,
    Pull Request section?
  • Updated relevant documentation.
  • Wrote necessary unit or integration tests.

@facebook-github-bot facebook-github-bot added the CLA Signed label Nov 18, 2024
@yanxi0830 yanxi0830 marked this pull request as ready for review November 18, 2024 05:02
@@ -0,0 +1,91 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
@yanxi0830 yanxi0830 merged commit 0784284 into main Nov 18, 2024
2 checks passed
@yanxi0830 yanxi0830 deleted the agent_in_eval branch November 18, 2024 19:43