Implement LLM check for evaluation #49

@DachengLi1

Description

Currently, our evaluation follows the answer-parsing logic from the qwen-math GitHub repository. However, many of the reported failures are false negatives: the response is mostly correct, but the parser extracts or compares the answer incorrectly.

We should use an LLM to check the response.
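
A minimal sketch of what this could look like, assuming the rule-based check stays as the first pass and the LLM is only consulted when that check fails. Everything here is hypothetical and not taken from the codebase: `rule_check` stands in for the existing qwen-math-style parser, `llm` is any callable that takes a prompt string and returns the model's text, and the prompt wording is only a placeholder.

```python
from typing import Callable

# Hypothetical judge prompt; the wording is an assumption, not an agreed format.
JUDGE_PROMPT = (
    "You are grading a math answer.\n"
    "Problem: {problem}\n"
    "Reference answer: {reference}\n"
    "Model response: {response}\n"
    "Does the final answer in the model response match the reference answer? "
    "Reply with exactly one word: CORRECT or INCORRECT."
)


def llm_check(problem: str, reference: str, response: str,
              llm: Callable[[str], str]) -> bool:
    """Ask an LLM judge whether the response matches the reference answer."""
    prompt = JUDGE_PROMPT.format(
        problem=problem, reference=reference, response=response
    )
    verdict = llm(prompt).strip().upper()
    # "INCORRECT" does not start with "CORRECT", so this is unambiguous.
    return verdict.startswith("CORRECT")


def is_correct(problem: str, reference: str, response: str,
               rule_check: Callable[[str, str], bool],
               llm: Callable[[str], str]) -> bool:
    """Rule-based check first; fall back to the LLM judge only on failure."""
    if rule_check(response, reference):
        return True
    return llm_check(problem, reference, response, llm)
```

Calling the LLM only after the rule-based check fails keeps the extra cost limited to the cases that are currently at risk of being false negatives. The trade-off is that the judge can introduce false positives of its own, so the verdicts may be worth spot-checking.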

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), help wanted (Extra attention is needed)
