Neph0s commented Jul 4, 2025

Problem

The grade_sample method in browsecomp_eval.py has a regex bug: it returns the full regex match
instead of the captured group, so the grade never compares equal to "yes".

Current code:

match = re.search(r"correct: (yes|no)", grading_response)
return match.group(0) if match else "no"  # Returns "correct: yes" or "correct: no"

...
grade_result = self.grade_sample(problem, answer, response_text)

# Metrics based on grading response
is_correct = grade_result == "yes"
is_incorrect = grade_result == "no"

Issue:
- match.group(0) returns the entire match ("correct: yes" or "correct: no")
- The code later compares grade_result == "yes", which is therefore never true
- As a result, is_correct is always False, and is_incorrect is True only when the regex finds no match and the "no" fallback is returned
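
To make the difference concrete, here is a minimal standalone sketch of the two group calls. The grading_response string is a made-up example, not output from the actual grader.

```python
import re

# Hypothetical grader output, used only for illustration.
grading_response = "The answer is correct: yes"

match = re.search(r"correct: (yes|no)", grading_response)

full_match = match.group(0)  # the entire matched text
captured = match.group(1)    # only the first capture group

print(repr(full_match))      # 'correct: yes'
print(repr(captured))        # 'yes'
print(full_match == "yes")   # False: this is the comparison that fails
print(captured == "yes")     # True
```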

Solution

Change match.group(0) to match.group(1) to extract the captured group:

match = re.search(r"correct: (yes|no)", grading_response)
return match.group(1) if match else "no"  # Returns "yes" or "no"

Impact

- Fixes broken grading logic that never marked a response as correct

Testing

The fix ensures that:
- Input "The answer is correct: yes" returns "yes" (was "correct: yes")
- Input "The answer is correct: no" returns "no" (was "correct: no")
- Comparisons grade_result == "yes" now work correctly
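
The checks above can be sketched as a small self-contained script. extract_grade is a hypothetical stand-in for the parsing step inside grade_sample (the real method also calls the grader model, which is omitted here), and the input strings are illustrative.

```python
import re


def extract_grade(grading_response: str) -> str:
    """Hypothetical helper mirroring the fixed parsing step in grade_sample."""
    match = re.search(r"correct: (yes|no)", grading_response)
    # group(1) is the captured "yes" or "no"; fall back to "no" when nothing matches.
    return match.group(1) if match else "no"


assert extract_grade("The answer is correct: yes") == "yes"
assert extract_grade("The answer is correct: no") == "no"
# No verdict in the response: the fallback applies.
assert extract_grade("The grader gave no verdict") == "no"

# The downstream comparisons now behave as intended.
grade_result = extract_grade("The answer is correct: yes")
is_correct = grade_result == "yes"
is_incorrect = grade_result == "no"
assert is_correct and not is_incorrect
```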
