Open
Labels
enhancement: New feature or request
Description
WebUI for Reward Rule Definition
This project provides a modular, user-friendly WebUI for defining reward rules, inspired by LlamaFactory. Built with Gradio, it enables researchers and engineers to flexibly configure, test, and export reward evaluation logic for RLHF and related tasks.
Key Features
- Modular Gradio WebUI:
  The interface is organized into five main tabs, each implemented as an independent module for easy maintenance and extension.
- Flexible Rule Definition:
  Users can define various types of reward rules, including:
  - Thought Process
  - Answer Tag
  - Tool Tag
  - Custom Tag

  Each rule supports detailed requirement configuration, such as count limits, content length, and format constraints (JSON/XML), with customizable reward calculation modes and coefficients.
- Dynamic Requirement Management:
  - Only one of each requirement type can be added per rule.
  - Requirement types already added are automatically removed from the dropdown.
  - Full support for adding, editing, and deleting requirements, with real-time UI updates.
- Grader System:
  - Integrated grader registry for flexible extension.
  - Built-in graders (e.g., QwenMathGrader) and easy integration of custom graders.
  - Grader descriptions and real-time test interface.
- Robust State & Error Handling:
  - Unique tag name validation.
  - Dynamic management of rule types and requirements.
  - Comprehensive error and state management for a smooth user experience.
- Export & Integration:
  - One-click export of reward configuration as JSON or Python code for downstream use.
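
To make the features above more concrete, here is a minimal sketch of how the underlying state could be modeled, independent of the Gradio layer. It is not the project's actual implementation: the names `Requirement`, `Rule`, `RewardConfig`, `REQUIREMENT_TYPES`, `register_grader`, and the `exact_match` grader are all hypothetical, as are the `mode` and `coefficient` defaults. It illustrates the one-requirement-per-type constraint, the shrinking dropdown choices, unique tag name validation, a decorator-based grader registry, and JSON export.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical requirement kinds, based on the examples in the description
# (count limits, content length, format constraints).
REQUIREMENT_TYPES = ["count", "length", "format"]


@dataclass
class Requirement:
    kind: str                 # one of REQUIREMENT_TYPES
    params: dict              # e.g. {"max": 1} or {"format": "json"}
    mode: str = "linear"      # hypothetical reward calculation mode
    coefficient: float = 1.0  # hypothetical reward coefficient


@dataclass
class Rule:
    tag: str  # e.g. "think", "answer", "tool", or a custom tag name
    requirements: list = field(default_factory=list)

    def available_types(self):
        """Requirement kinds not yet used (what the dropdown would offer)."""
        used = {r.kind for r in self.requirements}
        return [t for t in REQUIREMENT_TYPES if t not in used]

    def add_requirement(self, req):
        # Rejects unknown kinds and kinds that were already added.
        if req.kind not in self.available_types():
            raise ValueError(f"requirement {req.kind!r} is unknown or already added")
        self.requirements.append(req)

    def remove_requirement(self, kind):
        self.requirements = [r for r in self.requirements if r.kind != kind]


class RewardConfig:
    def __init__(self):
        self.rules = {}

    def add_rule(self, rule):
        # Unique tag name validation.
        if rule.tag in self.rules:
            raise ValueError(f"tag name {rule.tag!r} is already in use")
        self.rules[rule.tag] = rule

    def to_json(self):
        return json.dumps([asdict(r) for r in self.rules.values()], indent=2)


# Hypothetical grader registry: maps a name to a scoring function.
GRADERS = {}


def register_grader(name):
    def decorator(fn):
        GRADERS[name] = fn
        return fn
    return decorator


@register_grader("exact_match")
def exact_match(prediction, reference):
    """Toy grader: 1.0 when the stripped strings are equal, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


config = RewardConfig()
answer = Rule(tag="answer")
answer.add_requirement(Requirement(kind="count", params={"max": 1}))
config.add_rule(answer)

print(answer.available_types())  # ['length', 'format']
print(config.to_json())
```

Keeping this state in plain Python objects would let each tab module read and update it without depending on widget internals, and the JSON export then becomes a direct serialization of the same objects.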
Typical Use Cases
- RLHF reward rule design and rapid iteration
- Custom evaluation logic for LLM outputs
- Educational or research demonstration of reward shaping
James0618 and HuYunhai-Alex