Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: work around exceptions from Math-Verify #158

Closed
wants to merge 2 commits into from

Conversation

Copy link
Member

@lewtun lewtun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the error handling @ctjlewis ! Looks good to me with some nits, but I'll let @hynky1999 comment as well since he's the creator of math-verify

# Reward 1 if the content is the same as the ground truth, 0 otherwise
reward = float(verify(answer_parsed, gold_parsed))
except Exception as e:
print("Failed to verify answer: ", content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
print("Failed to verify answer: ", content)
logger.debug(f"Failed to verify answer {content} due to {e}")

reward = float(verify(answer_parsed, gold_parsed))
except Exception as e:
print("Failed to verify answer: ", content)
print(e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(e)

else:
# If the gold solution is not parseable, we reward 1 to skip this example
reward = 1.0
print("Failed to parse gold solution: ", sol)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("Failed to parse gold solution: ", sol)
logger.debug("Failed to parse gold solution: ", sol)

@hynky1999
Copy link
Collaborator

I think it's no longer needed, because I added the wrapping try/except in math-verify to both parse and verify

Copy link
Member

@qgallouedec qgallouedec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd disagree with this one. Rewarding 1.0 when the answer is not parsable will encourage to produce non-parsable solution.

My bad the try/except is around the verifier.

try:
# Reward 1 if the content is the same as the ground truth, 0 otherwise
reward = float(verify(answer_parsed, gold_parsed))
except Exception as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the exception type raised here? I'd go for a more specific catching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not frankly, since any error with math-verify is enough to cause a skip. If we wanted to catch specific exceptions and print something different each time, that's one thing, but if math-verify ever throws, the training run dies.

@ctjlewis
Copy link
Contributor Author

ctjlewis commented Feb 6, 2025

I think it's no longer needed, because I added the wrapping try/except in math-verify to both parse and verify

OK, so math-verify will never throw an exception?

I think we kinda should do it redundantly here anyway. A training run will call this thousands and thousands of times. If it ever throws, we want the run to be resistant to that.

The absolute ideal way would be to catch errors from within math-verify if any are thrown, and ignore them, but anything from dependencies should bubble up here so the training run doesn't just skip over 10,000 examples bc every call fails due to a system dependency not existing or something.

@hynky1999
Copy link
Collaborator

I think we kinda should do it redundantly here anyway. A training run will call this thousands and thousands of times. If it ever throws, we want the run to be resistant to that.

Yeah, I don't have a problem with that, feel free to put it there just to be sure, but the parse/verify method has exactly the same wrapper there now

@qgallouedec
Copy link
Member

eah, I don't have a problem with that, feel free to put it there just to be sure, but the parse/verify method has exactly the same wrapper there now

Indeed. Now if parsing fails it returns [], and if verifying "fails" it returns False. So I guess we don't need this precaution anymore right @ctjlewis?

@qgallouedec
Copy link
Member

@ctjlewis
Copy link
Contributor Author

ctjlewis commented Feb 6, 2025

OK, yeah, let's close this out, since that behavior will cover all the ground we need... Either we get a value back and can tell the verification process failed, or it threw for another reason (dependencies, system libs, whatever) and we explicitly do NOT want to ignore it (and do want training run to fail). Closing.

@qgallouedec Yeah I would close all 3 of those as fixed by #196 given this.

@ctjlewis ctjlewis closed this Feb 6, 2025
@ctjlewis
Copy link
Contributor Author

ctjlewis commented Feb 6, 2025

Sorry @hynky1999, I didn't see those logic changes, now that we can tell from return type whether it was verification error vs dependency exception this PR is redundant.

@qgallouedec qgallouedec mentioned this pull request Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants