Small updates to rft healthbench #1858

Merged
merged 1 commit into from
May 28, 2025
@@ -7,8 +7,12 @@
"source": [
"# Reinforcement Fine-Tuning with the OpenAI API for Conversational Reasoning\n",
"\n",
"*This guide is for developers and ML practitioners who have some experience with OpenAIʼs APIs and wish to use their fine-tuned models for research or other appropriate uses. OpenAI’s services are not intended for the personalized treatment or diagnosis of any medical condition and are subject to our [applicable terms](https://openai.com/policies/).*\n",
"\n",
"This notebook demonstrates how to use OpenAI's reinforcement fine-tuning (RFT) to improve a model's conversational reasoning capabilities (specifically asking questions to gain additional context and reduce uncertainty). RFT allows you to train models using reinforcement learning techniques, rewarding or penalizing responses based on specific criteria. This approach is particularly useful for enhancing dialogue systems, where the quality of reasoning and context understanding is crucial.\n",
"\n",
"For a deep dive into the Reinforcement Fine-Tuning API and how to write effective graders, see [Exploring Model Graders for Reinforcement Fine-Tuning](https://cookbook.openai.com/examples/reinforcement_fine_tuning).\n",
"\n",
"### HealthBench\n",
"\n",
"This cookbook evaluates and improves model performance on a focused subset of [HealthBench](https://openai.com/index/healthbench/), a benchmark suite for medical QA. This guide walks through how to configure the datasets, define evaluation rubrics, and fine-tune model behavior using reinforcement signals derived from custom graders.\n",
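The notebook text in this hunk describes rewarding or penalizing responses against rubric criteria. A minimal sketch of that idea follows. It assumes a HealthBench-style score (points earned divided by total positive points, clipped to [0, 1]) and uses plain keyword matching as a stand-in for a model grader. `RubricItem` and `rubric_reward` are hypothetical names for illustration and are not part of the OpenAI API or of this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    """One rubric criterion (hypothetical structure, for illustration only)."""
    criterion: str
    points: float              # positive rewards a behavior, negative penalizes it
    keywords: tuple[str, ...]  # naive stand-in for a model-based judgment


def rubric_reward(response: str, rubric: list[RubricItem]) -> float:
    """Return a reward in [0, 1] for a response scored against a rubric."""
    text = response.lower()
    # Sum the points of every criterion whose keywords appear in the response.
    earned = sum(
        item.points
        for item in rubric
        if any(keyword in text for keyword in item.keywords)
    )
    # Normalize by the best achievable score (positive criteria only).
    max_points = sum(item.points for item in rubric if item.points > 0)
    if max_points == 0:
        return 0.0
    # Clip so that penalties cannot push the reward below zero.
    return min(1.0, max(0.0, earned / max_points))


rubric = [
    RubricItem("Asks a clarifying question", 5.0, ("?",)),
    RubricItem("States a definitive diagnosis", -3.0, ("you have",)),
]

asks_question = rubric_reward("How long has the fever lasted?", rubric)
diagnoses = rubric_reward("You have the flu.", rubric)
mixed = rubric_reward("You have the flu. Any other symptoms?", rubric)
```

In a real RFT job the keyword check would be replaced by a grader that judges each criterion, but the normalization and clipping show how per-criterion judgments can collapse into a single scalar reward.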
2 changes: 1 addition & 1 deletion registry.yaml
@@ -14,7 +14,7 @@
- fine-tuning
- reinforcement-learning-graders

-- title: Reinforcement Fine-tuning with the OpenAI API
+- title: Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
path: examples/fine-tuned_qa/reinforcement_finetuning_healthbench.ipynb
date: 2025-05-21
authors: