WebUI #2

@James0618

WebUI for Reward Rule Definition

This project provides a modular, user-friendly WebUI for defining reward rules, inspired by LlamaFactory. Built with Gradio, it enables researchers and engineers to flexibly configure, test, and export reward evaluation logic for RLHF and related tasks.

Key Features

  • Modular Gradio WebUI:
    The interface is organized into five main tabs, each implemented as an independent module for easy maintenance and extension.

  • Flexible Rule Definition:
    Users can define various types of reward rules, including:

    • Thought Process
    • Answer Tag
    • Tool Tag
    • Custom Tag

    Each rule supports detailed requirement configuration, such as count limits, content length, and format constraints (JSON/XML), with customizable reward calculation modes and coefficients.
  • Dynamic Requirement Management:

    • Only one of each requirement type can be added per rule.
    • Requirement types that have already been added to a rule are automatically removed from its dropdown.
    • Full support for adding, editing, and deleting requirements, with real-time UI updates.
  • Grader System:

    • An integrated grader registry, so new graders can be added as extensions.
    • Built-in graders (e.g., QwenMathGrader) and easy integration of custom graders.
    • A description for each grader, plus a real-time test interface.
  • Robust State & Error Handling:

    • Validation that tag names are unique.
    • Dynamic management of rule types and requirements.
    • Comprehensive error and state management for a smooth user experience.
  • Export & Integration:

    • One-click export of the reward configuration as JSON or Python code for downstream use.
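
The rule, requirement, and export behavior described above can be sketched as a small Python data model. This is a minimal illustration under stated assumptions, not the project's code: the class names (`Requirement`, `RewardRule`, `RewardConfig`), the requirement-type identifiers (`"count"`, `"length"`, `"format"`), and the JSON layout are all hypothetical. It shows the one-requirement-per-type constraint, the shrinking list of types offered in the dropdown, unique tag-name validation, and a JSON export.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical requirement-type identifiers; the WebUI's real names may differ.
REQUIREMENT_TYPES = {"count", "length", "format"}


@dataclass
class Requirement:
    kind: str                 # one of REQUIREMENT_TYPES
    params: dict              # e.g. {"min": 1, "max": 1} or {"format": "json"}
    coefficient: float = 1.0  # weight applied to this requirement's score


@dataclass
class RewardRule:
    rule_type: str            # e.g. "answer_tag", "tool_tag", "custom_tag" (hypothetical)
    tag_name: str
    requirements: list = field(default_factory=list)

    def available_requirement_types(self) -> list:
        """Types still offered in the dropdown: those not yet added to this rule."""
        used = {r.kind for r in self.requirements}
        return sorted(REQUIREMENT_TYPES - used)

    def add_requirement(self, req: Requirement) -> None:
        """Add a requirement, allowing at most one of each type per rule."""
        if req.kind not in REQUIREMENT_TYPES:
            raise ValueError(f"unknown requirement type: {req.kind!r}")
        if any(r.kind == req.kind for r in self.requirements):
            raise ValueError(f"rule {self.tag_name!r} already has a {req.kind!r} requirement")
        self.requirements.append(req)


class RewardConfig:
    def __init__(self) -> None:
        self.rules = []

    def add_rule(self, rule: RewardRule) -> None:
        # Tag names must be unique across the whole configuration.
        if any(r.tag_name == rule.tag_name for r in self.rules):
            raise ValueError(f"duplicate tag name: {rule.tag_name!r}")
        self.rules.append(rule)

    def to_json(self) -> str:
        # asdict() recursively converts the nested Requirement dataclasses.
        return json.dumps({"rules": [asdict(r) for r in self.rules]}, indent=2)


# Usage: an answer-tag rule with a count limit and a JSON format constraint.
config = RewardConfig()
answer = RewardRule(rule_type="answer_tag", tag_name="answer")
answer.add_requirement(Requirement("count", {"min": 1, "max": 1}))
answer.add_requirement(Requirement("format", {"format": "json"}, coefficient=0.5))
print(answer.available_requirement_types())  # ['length']
config.add_rule(answer)
print(config.to_json())
```

Raising `ValueError` on a duplicate requirement or tag name keeps the validation in the data model, so a UI callback only has to catch the error and display its message.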

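The grader registry can likewise be sketched as a name-to-class mapping populated by a decorator. Everything here is hypothetical: `GRADER_REGISTRY`, `register_grader`, `BaseGrader`, `ExactMatchGrader`, and `run_grader` are illustrative names, and the toy exact-match grader stands in for, rather than reproduces, QwenMathGrader.

```python
from typing import Callable, Dict

# Hypothetical registry mapping grader names to grader classes.
GRADER_REGISTRY: Dict[str, type] = {}


def register_grader(name: str) -> Callable[[type], type]:
    """Class decorator that records a grader class under `name`."""
    def decorator(cls: type) -> type:
        if name in GRADER_REGISTRY:
            raise ValueError(f"grader {name!r} is already registered")
        GRADER_REGISTRY[name] = cls
        return cls
    return decorator


class BaseGrader:
    description = ""  # shown next to the grader in the UI

    def grade(self, response: str, reference: str) -> float:
        raise NotImplementedError


@register_grader("exact_match")
class ExactMatchGrader(BaseGrader):
    """Toy custom grader: 1.0 if the stripped response equals the reference."""
    description = "Exact string match after stripping whitespace."

    def grade(self, response: str, reference: str) -> float:
        return 1.0 if response.strip() == reference.strip() else 0.0


def run_grader(name: str, response: str, reference: str) -> float:
    """What a real-time test panel might call for the selected grader."""
    grader = GRADER_REGISTRY[name]()
    return grader.grade(response, reference)


print(run_grader("exact_match", " 42 ", "42"))  # 1.0
```

One reason to register graders with a decorator is that a custom grader is defined and registered in one place, so the UI can list every entry in the registry without being edited for each new grader.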
Typical Use Cases

  • RLHF reward rule design and rapid iteration
  • Custom evaluation logic for LLM outputs
  • Educational or research demonstration of reward shaping

Labels: enhancement (New feature or request)