fix: work around exceptions from Math-Verify #158

ctjlewis · 2025-02-02T08:14:05Z

lewtun

Thanks a lot for the error handling, @ctjlewis! Looks good to me with some nits, but I'll let @hynky1999 comment as well since he's the creator of math-verify.

lewtun · 2025-02-06T12:30:57Z

src/open_r1/grpo.py

+                # Reward 1 if the content is the same as the ground truth, 0 otherwise
+                reward = float(verify(answer_parsed, gold_parsed))
+            except Exception as e:
+                print("Failed to verify answer: ", content)


nit:

Suggested change

- print("Failed to verify answer: ", content)
+ logger.debug(f"Failed to verify answer {content} due to {e}")

lewtun · 2025-02-06T12:31:58Z

src/open_r1/grpo.py

+                reward = float(verify(answer_parsed, gold_parsed))
+            except Exception as e:
+                print("Failed to verify answer: ", content)
+                print(e)


Suggested change

- print(e)

lewtun · 2025-02-06T12:32:16Z

src/open_r1/grpo.py

        else:
-            # If the gold solution is not parseable, we reward 1 to skip this example
-            reward = 1.0
            print("Failed to parse gold solution: ", sol)


Suggested change

- print("Failed to parse gold solution: ", sol)
+ logger.debug("Failed to parse gold solution: %s", sol)
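A side note on the suggested `logger.debug` calls: Python's standard `logging` module interpolates extra positional arguments into the message with %-style formatting, so the message needs a matching placeholder. A minimal sketch, assuming a module-level `logger`; the logger name and the sample value are illustrative and not taken from `grpo.py`:

```python
import io
import logging

# Capture log output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

logger = logging.getLogger("reward_example")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False

sol = r"\frac{1}{2"  # illustrative, deliberately malformed gold solution

# The %s placeholder is filled in lazily, only if DEBUG is enabled.
logger.debug("Failed to parse gold solution: %s", sol)

logged = stream.getvalue()
```

Unlike `print`, the debug message is only emitted when the logger's level allows it, which keeps long training runs quiet by default.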

hynky1999 · 2025-02-06T13:48:02Z

I think it's no longer needed, because I added the wrapping try/except in math-verify to both parse and verify

qgallouedec

~~I'd disagree with this one. Rewarding 1.0 when the answer is not parsable will encourage the model to produce non-parsable solutions.~~

My bad, the try/except is around the verifier.

qgallouedec · 2025-02-06T14:16:13Z

src/open_r1/grpo.py

+            try:
+                # Reward 1 if the content is the same as the ground truth, 0 otherwise
+                reward = float(verify(answer_parsed, gold_parsed))
+            except Exception as e:


What's the exception type raised here? I'd go for a more specific catch.

Frankly, we should not, since any error from math-verify is enough to warrant a skip. Catching specific exceptions to print something different each time is one thing, but if math-verify ever throws, the training run dies.

ctjlewis · 2025-02-06T16:18:10Z

> I think it's no longer needed, because I added the wrapping try/except in math-verify to both parse and verify

OK, so math-verify will never throw an exception?

I think we kinda should do it redundantly here anyway. A training run will call this thousands and thousands of times. If it ever throws, we want the run to be resistant to that.

The absolute ideal way would be to catch errors from within math-verify if any are thrown, and ignore them, but anything from dependencies should bubble up here so the training run doesn't just skip over 10,000 examples bc every call fails due to a system dependency not existing or something.
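The distinction drawn here (swallow failures that originate in the verifier, let everything else stop the run) can be sketched as follows. This is only an illustration under stated assumptions: `VerificationError`, `safe_reward`, and the three stand-in verifiers are hypothetical names, and math-verify is not claimed to expose such an exception type.

```python
class VerificationError(Exception):
    """Hypothetical exception standing in for a verifier-internal failure."""


def safe_reward(verify_fn, answer, gold):
    """Return 1.0 or 0.0 from verify_fn, or None to signal 'skip this example'."""
    try:
        return float(verify_fn(answer, gold))
    except VerificationError:
        # The comparison itself could not be carried out: skip the example.
        return None
    # Anything else (ImportError, OSError, ...) is not caught here,
    # so it propagates and stops the training run.


# Hypothetical stand-ins for the three situations discussed above.
def ok_verify(answer, gold):
    return answer == gold


def flaky_verify(answer, gold):
    raise VerificationError("cannot compare these expressions")


def broken_env_verify(answer, gold):
    raise ImportError("missing system dependency")
```

With this shape, a systemic problem such as a missing dependency fails loudly on the first call instead of silently skipping every example. `None` is just a placeholder for "skip"; how a skip is represented in the actual trainer is a separate question.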

hynky1999 · 2025-02-06T17:24:56Z

> I think we kinda should do it redundantly here anyway. A training run will call this thousands and thousands of times. If it ever throws, we want the run to be resistant to that.

Yeah, I don't have a problem with that, feel free to put it there just to be sure, but the parse/verify method has exactly the same wrapper there now

qgallouedec · 2025-02-06T17:40:46Z

> Yeah, I don't have a problem with that, feel free to put it there just to be sure, but the parse/verify method has exactly the same wrapper there now

Indeed. Now if parsing fails it returns [], and if verifying "fails" it returns False. So I guess we don't need this precaution anymore, right @ctjlewis?
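To make the return-value contract concrete, here is a rough sketch of a reward computation that relies on it instead of a local try/except. The functions `parse_stub` and `verify_stub` are hypothetical stand-ins that only mimic the behavior described above (empty list on parse failure, `False` on verification failure); they are not the math-verify implementation, and `reward_for` is an illustrative name.

```python
def parse_stub(text):
    """Stand-in parser: returns [] instead of raising when parsing fails."""
    try:
        return [float(text)]
    except ValueError:
        return []


def verify_stub(answer_parsed, gold_parsed):
    """Stand-in verifier: returns False instead of raising when it cannot verify."""
    if not answer_parsed or not gold_parsed:
        return False
    return answer_parsed[0] == gold_parsed[0]


def reward_for(content, sol):
    """Reward 1.0 for a match, 0.0 otherwise, None if the gold solution is unusable."""
    gold_parsed = parse_stub(sol)
    if not gold_parsed:
        # Gold solution not parseable: no meaningful reward can be computed.
        return None
    answer_parsed = parse_stub(content)
    return float(verify_stub(answer_parsed, gold_parsed))
```

Because failures surface as ordinary return values, any exception that still escapes points to a different kind of problem, such as a broken environment, and is left uncaught.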

qgallouedec · 2025-02-06T17:42:44Z

see huggingface/Math-Verify#6

fixes GRPO training error related to sympy/math expressions #168

fixes Error: "Can only compare inequalities with Expr" when using zero2.yaml to grpo.py #186

fixes GRPO issue #152

These should be fixed by #196 which bumps the version of math-verify

ctjlewis · 2025-02-06T19:34:52Z

OK, yeah, let's close this out, since that behavior will cover all the ground we need... Either we get a value back and can tell the verification process failed, or it threw for another reason (dependencies, system libs, whatever) and we explicitly do NOT want to ignore it (and do want the training run to fail). Closing.

@qgallouedec Yeah I would close all 3 of those as fixed by #196 given this.

ctjlewis · 2025-02-06T19:37:19Z

Sorry @hynky1999, I didn't see those logic changes. Now that we can tell from the return type whether it was a verification error or a dependency exception, this PR is redundant.

fix: work around exceptions from Math-Verify

58473a9

see huggingface/Math-Verify#6

This was referenced Feb 2, 2025

Should skips really get reward 1? #159

Closed

Refactoring reward functions. Adding step by step reasoning reward. Adding test coverage for reward functions #144

Merged

Some-random mentioned this pull request Feb 3, 2025

Invalidate trace cache @ step 0 and module 740: cache has only 0 modules #119

Closed

lewtun requested review from hynky1999 and qgallouedec February 6, 2025 12:29

lewtun reviewed Feb 6, 2025


qgallouedec requested changes Feb 6, 2025


qgallouedec reviewed Feb 6, 2025


Merge branch 'main' into patch-2

b96b874

ctjlewis closed this Feb 6, 2025

qgallouedec mentioned this pull request Feb 7, 2025

Update grpo.py #134

Closed
