fix: work around exceptions from Math-Verify #158
Thanks a lot for the error handling, @ctjlewis! Looks good to me with some nits, but I'll let @hynky1999 comment as well, since he's the creator of math-verify.
    # Reward 1 if the content is the same as the ground truth, 0 otherwise
    reward = float(verify(answer_parsed, gold_parsed))
except Exception as e:
    print("Failed to verify answer: ", content)
nit:
- print("Failed to verify answer: ", content)
+ logger.debug(f"Failed to verify answer {content} due to {e}")
    reward = float(verify(answer_parsed, gold_parsed))
except Exception as e:
    print("Failed to verify answer: ", content)
    print(e)
- print(e)
else:
    # If the gold solution is not parseable, we reward 1 to skip this example
    reward = 1.0
    print("Failed to parse gold solution: ", sol)
- print("Failed to parse gold solution: ", sol)
+ logger.debug("Failed to parse gold solution: %s", sol)
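One detail worth keeping in mind when swapping `print` for `logger.debug`: the standard `logging` methods treat extra positional arguments as `%`-style format arguments rather than concatenating them the way `print` does, so the message needs a placeholder. A minimal sketch, assuming a logger named `reward_sketch` and a made-up unparseable solution string (both hypothetical, chosen only for illustration):

```python
import logging

# Hypothetical logger name; the PR's module would use its own logger.
logger = logging.getLogger("reward_sketch")
logger.setLevel(logging.DEBUG)


class ListHandler(logging.Handler):
    """Collects formatted messages so the result can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # getMessage() applies the %-style arguments to the format string.
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)

sol = r"\frac{1}{2"  # made-up example of an unparseable gold solution
logger.debug("Failed to parse gold solution: %s", sol)
```

With the placeholder in place, formatting is also deferred until a handler actually emits the record, which matters for a function called thousands of times per run.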
I think it's no longer needed, because I added the wrapping try/except in math-verify to both parse and verify.
I'd disagree with this one. Rewarding 1.0 when the answer is not parsable will encourage the model to produce non-parsable solutions.
My bad, the try/except is around the verifier.
try:
    # Reward 1 if the content is the same as the ground truth, 0 otherwise
    reward = float(verify(answer_parsed, gold_parsed))
except Exception as e:
What's the exception type raised here? I'd go for a more specific catch.
Frankly, we should not, since any error from math-verify is enough to warrant a skip. Catching specific exceptions would make sense if we wanted to print something different for each one, but as it stands, if math-verify ever throws, the training run dies.
OK, so math-verify will never throw an exception? I think we should still do it redundantly here anyway. A training run will call this thousands and thousands of times, and if it ever throws, we want the run to be resilient to that. The ideal would be to catch and ignore errors thrown from within math-verify itself, while anything from dependencies bubbles up here, so the training run doesn't silently skip over 10,000 examples because every call fails due to a missing system dependency or something.
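For reference, the redundant guard being argued for here could look roughly like the following. This is only a sketch: `verify` below is a hypothetical stand-in that raises on bad input, not the math-verify function, and `safe_reward` is an illustrative name that does not appear in the PR.

```python
import logging

logger = logging.getLogger(__name__)


def verify(answer_parsed, gold_parsed):
    """Hypothetical stand-in for the verifier; raises on unusable input."""
    if answer_parsed is None:
        raise ValueError("cannot compare an unparsed answer")
    return answer_parsed == gold_parsed


def safe_reward(answer_parsed, gold_parsed):
    """Return 1.0 on a match, 0.0 on a mismatch or on any verifier failure."""
    try:
        return float(verify(answer_parsed, gold_parsed))
    except Exception as e:
        # Broad catch: keeps the run alive, but also hides dependency errors.
        logger.debug("Failed to verify answer %r due to %s", answer_parsed, e)
        return 0.0
```

The trade-off is the one raised in this thread: the broad `except Exception` makes a single bad sample harmless, but a systematic failure would show up only as a stream of zero rewards.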
Yeah, I don't have a problem with that, feel free to put it there just to be sure, but the parse/verify method has exactly the same wrapper there now.
Indeed. Now if parsing fails it returns
These should be fixed by #196, which bumps the version of math-verify.
OK, yeah, let's close this out, since that behavior will cover all the ground we need. Either we get a value back and can tell the verification process failed, or it threw for another reason (dependencies, system libs, whatever) and we explicitly do NOT want to ignore it (and do want the training run to fail). Closing. @qgallouedec Yeah, I would close all 3 of those as fixed by #196 given this.
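The split described above can be sketched as follows. Everything here is an assumption made for illustration: `VerificationError`, `_compare`, and the `"<missing-dependency>"` trigger are invented names, and the real math-verify wrapper may be structured differently. The point is only that the verifier handles its own failures and returns a value, while the caller deliberately has no try/except.

```python
class VerificationError(Exception):
    """Hypothetical error type for failures inside the comparison itself."""


def _compare(answer, gold):
    """Stand-in comparison that can fail in two distinct ways."""
    if gold == "<missing-dependency>":
        # Simulates an environment problem unrelated to the sample.
        raise ImportError("simulated missing system library")
    if answer is None or gold is None:
        raise VerificationError("nothing to compare")
    return answer == gold


def verify(answer, gold):
    """Stand-in verifier: handles its own failures and returns False."""
    try:
        return _compare(answer, gold)
    except VerificationError:
        return False


def reward(answer, gold):
    # Deliberately unguarded: anything verify() did not handle stops the run.
    return float(verify(answer, gold))
```

Under these assumptions, a sample that cannot be verified yields a reward of 0.0, whereas a broken environment raises immediately instead of silently zeroing every example.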
Sorry @hynky1999, I didn't see those logic changes. Now that we can tell from the return type whether it was a verification error vs. a dependency exception, this PR is redundant.
see huggingface/Math-Verify#6