Commit c9107ba

added model submission instructions
1 parent e0b62e4 commit c9107ba

2 files changed: +31 -33 lines changed

explore/submission_guideline.html

+25 -27
@@ -4,15 +4,15 @@
 <head>
 <meta charset="utf-8"/>
 <title>
-ReXrank
+CRAFT-MD
 </title>
-<meta name="description" content="ReXrank: The leading open-source leaderboard for radiology report generation. Compare and benchmark AI models for medical imaging reports."/>
-<meta name="keywords" content="ReXrank, radiology, report generation, AI, medical imaging, leaderboard, benchmarking"/>
-<meta property="og:title" content="ReXrank: Radiology Report Generation Leaderboard"/>
-<meta property="og:description" content="Open-source leaderboard for comparing and benchmarking AI models in radiology report generation."/>
-<meta property="og:url" content="https://rajpurkarlab.github.io/ReXrank/"/>
+<meta name="description" content="CRAFT-MD: A Comprehensive Benchmarking Framework for Conversational Reasoning Assessment in Medicine."/>
+<meta name="keywords" content="CRAFT-MD, conversational LLMs, evaluation, leaderboard, benchmarking"/>
+<meta property="og:title" content="CRAFT-MD: Conversational Reasoning Assessment Framework for Testing in Medicine"/>
+<meta property="og:description" content="A Comprehensive Benchmarking Framework for Conversational Reasoning Assessment in Medicine."/>
+<meta property="og:url" content="https://rajpurkarlab.github.io/craft-md-pages/"/>
 <meta property="og:type" content="website"/>
-<meta content="ReXrank is an open-source leaderboard for AI-powered radiology report generation from chest x-ray images." name="description"/>
+<meta content="CRAFT-MD is a comprehensive multi-agent benchmarking framework for conversational reasoning assessment in Medicine." name="description"/>
 <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
 <meta content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no" name="viewport"/>
 <meta content="../logo.png" property="og:image"/>
@@ -189,7 +189,7 @@
 </div>
 <div class="leftNav">
 <div class="brandDiv">
-<a class="navbar-brand" href="../">ReXrank</a>
+<a class="navbar-brand" href="../">CRAFT-MD</a>
 </div>
 </div>
 </div>
@@ -198,8 +198,8 @@
 <div class="container">
 <div class="row">
 <div class="col-md-12">
-<h1 id="appTitle">ReXrank</h1>
-<h2 id="appSubtitle">Chest X-ray Report Generation Leaderboard</h2>
+<h1 id="appTitle">CRAFT-MD</h1>
+<h2 id="appSubtitle">A Comprehensive Benchmarking Framework for Conversational Reasoning Assessment in Medicine.</h2>
 <!-- https://github.com/rajpurkarlab/ReXrank/blob/main/example_files/submission_tutorial.md -->
 <!-- <h3 id="helpLink"><a href="./explore/submission_guideline.html" target="_blank" rel="noopener noreferrer">⭐@Researchers: Submit to ReXrank Now!</a></h3> -->
 <!-- <h3 id="helpLink"><a href="https://forms.office.com/r/1wDsy2MmAM" target="_blank" rel="noopener noreferrer">⭐@Researchers: Submit to ReXrank Now!</a></h3> -->
@@ -215,43 +215,41 @@ <h2 id="appSubtitle">Chest X-ray Report Generation Leaderboard</h2>
 <div class="infoCard">
 <div class="infoBody">
 <div class="infoHeadline">
-<h2>ReXrank Submission Guideline (Round 1)</h2>
+<h2>CRAFT-MD Submission Guideline (Round 1)</h2>
 </div>
 <p>
-The ReXrank Challenge is a competition in chest X-ray report generation leveraging ReXGradient, our comprehensive multi-institutional dataset, running from December 1, 2024, to March 1, 2025.
+The CRAFT-MD Challenge is a competition to evaluate the clinical reasoning ability of LLMs, running from January 2, 2025, to April 2, 2025.
 The competition welcomes participation from academic institutions, industry professionals, and independent researchers worldwide.
-We will evaluate the performance of the models from multiple critical dimensions, including clinical accuracy and generalization capability across diverse institutions.
-A panel of distinguished radiologists will conduct thorough assessments of the highest-performing models.
-Top-performing participants will be invited to collaborate on <b>future research initiatives and model development</b>.
+Are you developing a new model that can beat the existing LLMs in clinical conversational reasoning? Submit your model to the CRAFT-MD leaderboard now!
 </p>
 
-<div class="infoHeadline">
+<!-- <div class="infoHeadline">
 <h2>Getting Started</h2>
-</div>
-<p>
+</div> -->
+<!-- <p>
 To evaluate your models, we made available the <a href="https://github.com/rajpurkarlab/ReXrank/blob/main/example_files/evaluation_script.md", target="_blank">evaluation script</a> we will use for official evaluation, along with a sample prediction file that the script will take as input.
 To run the evaluation, use
 <code>python evaluate.py &lt;path_to_data&gt; &lt;path_to_predictions&gt;</code>.
-</p>
+</p> -->
 
 <div class="infoHeadline">
-<h2>Submission Guidelines</h2>
+<h2>Model Submission Guidelines</h2>
 </div>
-<p><b>1. Evaluating on the MIMIC-CXR Test Set</b></p>
-<p>To achieve a consistent score with our leaderboard, please use the official MIMIC-CXR test split. You can download the file from <a href="https://physionet.org/content/mimic-cxr/2.0.0/", target="_blank">here</a>. We evaluate at the study level. If the submitted model can input multiple images, we will input all images of a study. If the submitted model includes only one image, we will default to using the frontal image. We also include context information like patient age, patient gender, indication and comparison. When submission, you can select if you are going to use this info.</p>
+<!-- <p><b>1. Evaluating on the MIMIC-CXR Test Set</b></p>
+<p>To achieve a consistent score with our leaderboard, please use the official MIMIC-CXR test split. You can download the file from <a href="https://physionet.org/content/mimic-cxr/2.0.0/", target="_blank">here</a>. We evaluate at the study level. If the submitted model can input multiple images, we will input all images of a study. If the submitted model includes only one image, we will default to using the frontal image. We also include context information like patient age, patient gender, indication and comparison. When submission, you can select if you are going to use this info.</p> -->
 
-<p><b>2. Model Submission</b></p>
-<p>Your model submission should include the following:</p>
+<!-- <p><b>2. Model Submission</b></p> -->
+<p>Models can be submitted via <a href="mailto:sjohri@g.harvard.edu" target="_blank">email</a>. Your model submission should include the following:</p>
 <ol>
 <li><b>Model Description:</b> This description identifies your submission on the leaderboard: Name of the model, Institution, Paper link, Code link, Year.</li>
 <li><b>Conda Environment File:</b> Include the <code>environment.yaml</code> file support <code>conda install</code>.</li>
-<li><b>Inference Script:</b> The model should support the command: <code>python inference.py &lt;input_json_file&gt; &lt;output_json_file&gt; &lt;img_root_dir&gt;</code> We provide an <a href="https://github.com/xiaoman-zhang/ReXrank/blob/gh-pages/example_files/merversa_inference.py", target="_blank">example</a> of <a href="https://huggingface.co/hyzhou/MedVersa_Internal", target="_blank">MedVersa</a> for understanding our requirements.</li>
-<li><b>Evaluation Result:</b> Include the evaluation result on the MIMIC-CXR test set. We will contact you if we are unable to replicate these results.</li>
+<!-- <li><b>Inference Script:</b> The model should support the command: <code>python inference.py &lt;input_json_file&gt; &lt;output_json_file&gt; &lt;img_root_dir&gt;</code> We provide an <a href="https://github.com/xiaoman-zhang/ReXrank/blob/gh-pages/example_files/merversa_inference.py", target="_blank">example</a> of <a href="https://huggingface.co/hyzhou/MedVersa_Internal", target="_blank">MedVersa</a> for understanding our requirements.</li> -->
+<!-- <li><b>Evaluation Result:</b> Include the evaluation result on the MIMIC-CXR test set. We will contact you if we are unable to replicate these results.</li> -->
 </ol>
 
 <div class="infoHeadline"></div>
 <p>
-Any questions or concerns? Please reach out to us with <a href="mailto:xiaomanzhang.zxm@gmail.com" target="_blank">email</a>.
+Any questions or concerns? Please reach out to us with <a href="mailto:sjohri@g.harvard.edu" target="_blank">email</a>.
 </p>
 </div>
 </div>

index.html

+6 -6
@@ -11,7 +11,7 @@
 <meta name="keywords" content="CRAFT-MD, conversational LLMs, evaluation, leaderboard, benchmarking"/>
 <meta property="og:title" content="CRAFT-MD: Conversational Reasoning Assessment Framework for Testing in Medicine"/>
 <meta property="og:description" content="A Comprehensive Benchmarking Framework for Conversational Reasoning Assessment in Medicine."/>
-<meta property="og:url" content="https://rajpurkarlab.github.io/CRAFT-MD/"/>
+<meta property="og:url" content="https://rajpurkarlab.github.io/craft-md-pages/"/>
 <meta property="og:type" content="website"/>
 <html lang="en"></html>
 <meta content="CRAFT-MD is a comprehensive multi-agent benchmarking framework for conversational reasoning assessment in Medicine." name="description"/>
@@ -116,7 +116,7 @@ <h2 id="appSubtitle">A Conversational Reasoning Assessment Framework for Testing
 <div class="cover" id="contentCover">
 <div class="container">
 <div class="row">
-<div class="col-md-7">
+<div class="col-md-6">
 <div class="infoCard">
 <div class="infoBody justified-text">
 <div class="infoHeadline">
@@ -127,9 +127,9 @@ <h2>What is CRAFT-MD?</h2>
 <p> It simulates doctor-patient interactions, where the clinical-LLM's performance in gathering medical histories, synthesizing information, and forming accurate diagnoses is assessed by a multi-agent setup involving a patient-AI, a grader-AI, and medical experts who validate the results. </p>
 <p> CRAFT-MD is designed to be flexible and scalable, allowing for the integration of new datasets and the evaluation of emerging models. </p>
 <!-- <p style="margin-bottom: 20px;"><b>Join us</b> in shaping the future of AI-assisted radiology. Develop your models, submit your results, and see how you stack up against the best in the field. Together, we can push the boundaries of what's possible in medical imaging and report generation. </p> -->
-<!-- <p style="margin-bottom: 20px;">
+<p style="margin-bottom: 20px;">
 <strong><a href="./explore/submission_guideline.html">Submit your models</a> for evaluation with CRAFT-MD</strong>.
-</p> -->
+</p>
 </p>
 <!-- <p>
 <span>⭐ <b>News!</b></span> Click <a href="./explore/vote_example.html">here</a> to vote for the models!
@@ -169,13 +169,13 @@ <h2>Getting Started</h2>
 </div>
 </div>
 </div> -->
-<div class="col-md-5">
+<div class="col-md-6">
 <div class="infoCard">
 <div class="infoBody">
 <div class="infoHeadline">
 <h2>Leaderboard Overview</h2>
 </div>
-<p></p>
+<p> The CRAFT-MD leaderboard ranks models based on their performance in Multi-turn Conversations in free response question (FRQ) setting.</p>
 <div class="fixed-height-table">
 <table class="table performanceTable">
 <thead>
