Merged
12 changes: 10 additions & 2 deletions examples/grpo_math/README.md
@@ -1,6 +1,14 @@
# Example: PPO on MATH dataset
# Example: GRPO on MATH dataset

This example shows the usage of [RM-Gallery](https://github.com/modelscope/RM-Gallery/) by running GRPO on a MATH dataset. You need to install RM-Gallery first.
The dataset is organized as:

```jsonl

{"question": "what is 2+2?", "gt_answer": 4}
{"question": "what is 2+3?", "gt_answer": 5}
```

This example shows the usage of PPO on the MATH dataset, adapted from [simpleRL](https://github.com/hkust-nlp/simpleRL-reason/tree/v0).

For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_reasoning_basic.md).
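
The JSONL layout shown in the README change above (one JSON object per line, with a `question` and a `gt_answer` field) can be read with a few lines of standard-library Python. This is only a minimal sketch under that assumption: `iter_math_samples` is a hypothetical helper written for illustration and is not part of Trinity or RM-Gallery, and it skips blank lines because the README sample contains one.

```python
import json
from typing import Iterable, Iterator


def iter_math_samples(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty JSONL line.

    Hypothetical helper: assumes each record carries the two fields
    shown in the README sample, "question" and "gt_answer".
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = {"question", "gt_answer"} - record.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        yield record


# The two sample records from the README, preceded by a blank line.
sample_lines = [
    "",
    '{"question": "what is 2+2?", "gt_answer": 4}',
    '{"question": "what is 2+3?", "gt_answer": 5}',
]
samples = list(iter_math_samples(sample_lines))
```

To read from disk instead, pass an open file object (for example `open(path, encoding="utf-8")`), since iterating over a text file yields its lines.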

2 changes: 1 addition & 1 deletion trinity/common/rewards/reward_fn.py
@@ -69,7 +69,7 @@ def _build_sample_from_experience(
]

sample = DataSample(
unique_id=experience.unique_id,
unique_id=experience.eid.uid,
input=to_rm_gallery_messages(messages),
output=output,
metadata=experience.info,