explore/submission_guideline.html
+25-27
@@ -4,15 +4,15 @@
4
4
<head>
5
5
<meta charset="utf-8"/>
6
6
<title>
7
-
ReXrank
7
+
CRAFT-MD
8
8
</title>
9
-
<meta name="description" content="ReXrank: The leading open-source leaderboard for radiology report generation. Compare and benchmark AI models for medical imaging reports."/>
10
-
<meta name="keywords" content="ReXrank, radiology, report generation, AI, medical imaging, leaderboard, benchmarking"/>
<meta content="ReXrank is an open-source leaderboard for AI-powered radiology report generation from chest x-ray images." name="description"/>
15
+
<meta content="CRAFT-MD is a comprehensive multi-agent benchmarking framework for conversational reasoning assessment in Medicine." name="description"/>
The ReXrank Challenge is a competition in chest X-ray report generation leveraging ReXGradient, our comprehensive multi-institutional dataset, running from December 1, 2024, to March 1, 2025.
221
+
The CRAFT-MD Challenge is a competition to evaluate the clinical reasoning ability of LLMs, running from January 2, 2025, to April 2, 2025.
222
222
The competition welcomes participation from academic institutions, industry professionals, and independent researchers worldwide.
223
-
We will evaluate the performance of the models from multiple critical dimensions, including clinical accuracy and generalization capability across diverse institutions.
224
-
A panel of distinguished radiologists will conduct thorough assessments of the highest-performing models.
225
-
Top-performing participants will be invited to collaborate on <b>future research initiatives and model development</b>.
223
+
Are you developing a new model that can beat the existing LLMs in clinical conversational reasoning? Submit your model to the CRAFT-MD leaderboard now!
226
224
</p>
227
225
228
-
<div class="infoHeadline">
226
+
<!-- <div class="infoHeadline">
229
227
<h2>Getting Started</h2>
230
-
</div>
231
-
<p>
228
+
</div> -->
229
+
<!-- <p>
232
230
To evaluate your models, we made available the <a href="https://github.com/rajpurkarlab/ReXrank/blob/main/example_files/evaluation_script.md", target="_blank">evaluation script</a> we will use for official evaluation, along with a sample prediction file that the script will take as input.
<p><b>1. Evaluating on the MIMIC-CXR Test Set</b></p>
241
-
<p>To achieve a consistent score with our leaderboard, please use the official MIMIC-CXR test split. You can download the file from <a href="https://physionet.org/content/mimic-cxr/2.0.0/", target="_blank">here</a>. We evaluate at the study level. If the submitted model can input multiple images, we will input all images of a study. If the submitted model includes only one image, we will default to using the frontal image. We also include context information like patient age, patient gender, indication and comparison. When submission, you can select if you are going to use this info.</p>
238
+
<!-- <p><b>1. Evaluating on the MIMIC-CXR Test Set</b></p>
239
+
<p>To achieve a consistent score with our leaderboard, please use the official MIMIC-CXR test split. You can download the file from <a href="https://physionet.org/content/mimic-cxr/2.0.0/", target="_blank">here</a>. We evaluate at the study level. If the submitted model can input multiple images, we will input all images of a study. If the submitted model includes only one image, we will default to using the frontal image. We also include context information like patient age, patient gender, indication and comparison. When submission, you can select if you are going to use this info.</p> -->
242
240
243
-
<p><b>2. Model Submission</b></p>
244
-
<p>Your model submission should include the following:</p>
241
+
<!-- <p><b>2. Model Submission</b></p> -->
242
+
<p>Models can be submitted via <a href="mailto:sjohri@g.harvard.edu" target="_blank">email</a>. Your model submission should include the following:</p>
245
243
<ol>
246
244
<li><b>Model Description:</b> This description identifies your submission on the leaderboard: Name of the model, Institution, Paper link, Code link, Year.</li>
247
245
<li><b>Conda Environment File:</b> Include the <code>environment.yaml</code> file support <code>conda install</code>.</li>
248
-
<li><b>Inference Script:</b> The model should support the command: <code>python inference.py <input_json_file> <output_json_file> <img_root_dir></code> We provide an <a href="https://github.com/xiaoman-zhang/ReXrank/blob/gh-pages/example_files/merversa_inference.py", target="_blank">example</a> of <a href="https://huggingface.co/hyzhou/MedVersa_Internal", target="_blank">MedVersa</a> for understanding our requirements.</li>
249
-
<li><b>Evaluation Result:</b> Include the evaluation result on the MIMIC-CXR test set. We will contact you if we are unable to replicate these results.</li>
246
+
<!-- <li><b>Inference Script:</b> The model should support the command: <code>python inference.py <input_json_file> <output_json_file> <img_root_dir></code> We provide an <a href="https://github.com/xiaoman-zhang/ReXrank/blob/gh-pages/example_files/merversa_inference.py", target="_blank">example</a> of <a href="https://huggingface.co/hyzhou/MedVersa_Internal", target="_blank">MedVersa</a> for understanding our requirements.</li> -->
247
+
<!-- <li><b>Evaluation Result:</b> Include the evaluation result on the MIMIC-CXR test set. We will contact you if we are unable to replicate these results.</li> -->
250
248
</ol>
251
249
252
250
<div class="infoHeadline"></div>
253
251
<p>
254
-
Any questions or concerns? Please reach out to us with <a href="mailto:xiaomanzhang.zxm@gmail.com" target="_blank">email</a>.
252
+
Any questions or concerns? Please reach out to us with <a href="mailto:sjohri@g.harvard.edu" target="_blank">email</a>.
<meta content="CRAFT-MD is a comprehensive multi-agent benchmarking framework for conversational reasoning assessment in Medicine." name="description"/>
<p> It simulates doctor-patient interactions, where the clinical-LLM's performance in gathering medical histories, synthesizing information, and forming accurate diagnoses is assessed by a multi-agent setup involving a patient-AI, a grader-AI, and medical experts who validate the results. </p>
128
128
<p> CRAFT-MD is designed to be flexible and scalable, allowing for the integration of new datasets and the evaluation of emerging models. </p>
129
129
<!-- <p style="margin-bottom: 20px;"><b>Join us</b> in shaping the future of AI-assisted radiology. Develop your models, submit your results, and see how you stack up against the best in the field. Together, we can push the boundaries of what's possible in medical imaging and report generation. </p> -->
130
-
<!-- <p style="margin-bottom: 20px;">
130
+
<p style="margin-bottom: 20px;">
131
131
<strong> ⭐ <a href="./explore/submission_guideline.html">Submit your models</a> for evaluation with CRAFT-MD</strong>.
132
-
</p> -->
132
+
</p>
133
133
</p>
134
134
<!-- <p>
135
135
<span>⭐ <b>News!</b></span> Click <a href="./explore/vote_example.html">here</a> to vote for the models!
@@ -169,13 +169,13 @@ <h2>Getting Started</h2>
169
169
</div>
170
170
</div>
171
171
</div> -->
172
-
<div class="col-md-5">
172
+
<div class="col-md-6">
173
173
<div class="infoCard">
174
174
<div class="infoBody">
175
175
<div class="infoHeadline">
176
176
<h2>Leaderboard Overview</h2>
177
177
</div>
178
-
<p></p>
178
+
<p> The CRAFT-MD leaderboard ranks models based on their performance in multi-turn conversations in the free-response question (FRQ) setting.</p>