Reportedly LLMs: Generative Large Language Models for Proofreading Errors in Radiology Reports

bionlplab/llm4proofreading

This is the repository for Reportedly LLMs, a project that aims to build task-specific LLMs for proofreading radiology reports.

This repository is currently provided for review purposes only; a published version will be released later.

Overview

Figure 1: The overall workflow of Reportedly LLMs. (The figure will be added here after the manuscript is published.)

Our work consists of three parts:

(1) Dataset Construction

(2) Model Development

(3) Evaluation

Dataset Construction

We constructed a dataset consisting of two parts.

The first part includes 1,656 synthetic radiology reports generated by GPT-4 with specified prompts: 828 error-free reports and 828 reports containing errors.

For the prompts used, please refer to Prompts_for_Synthetic.txt.

The second part comprises 614 reports: 307 error-free reports from the MIMIC-CXR database and 307 corresponding reports with errors, which GPT-4 generated from these MIMIC-CXR reports using specified prompts.

For the prompts used, please refer to Prompts_for_MIMIC.txt.
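The dataset composition described above can be tallied with a short Python sketch. The `DatasetPart` class, its field names, and the `summarize` helper are hypothetical; they only restate the counts given in this README and are not the repository's actual data schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPart:
    """Hypothetical summary of one dataset part (not the repository's schema)."""
    name: str
    error_free: int   # reports without errors
    with_errors: int  # reports containing errors

    @property
    def total(self) -> int:
        return self.error_free + self.with_errors


# Counts taken from the Dataset Construction section.
PARTS = [
    DatasetPart("GPT-4 synthetic", error_free=828, with_errors=828),
    DatasetPart("MIMIC-CXR based", error_free=307, with_errors=307),
]


def summarize(parts: list[DatasetPart]) -> dict[str, int]:
    """Aggregate report counts across all dataset parts."""
    return {
        "error_free": sum(p.error_free for p in parts),
        "with_errors": sum(p.with_errors for p in parts),
        "total": sum(p.total for p in parts),
    }


if __name__ == "__main__":
    for part in PARTS:
        print(f"{part.name}: {part.total} reports")
    print(summarize(PARTS))
```

Both parts are balanced between error-free and erroneous reports, so the combined dataset holds 2,270 reports (1,135 of each kind).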

Model Development

We fine-tune our models using the Firefly codebase.

Please refer to Firefly (https://github.com/yangjianxin1/Firefly).

Llama-3-8B-Instruct and Llama-3-70B-Instruct are fine-tuned on the training set with the following hyperparameters:

| Hyperparameter | Llama-3-8B-Instruct | Llama-3-70B-Instruct |
|---|---|---|
| Batch size | 1 | 1 |
| Learning rate | 3e-4 | 3e-4 |
| Epochs | 3 | 3 |
| Max length (tokens) | 512 | 512 |
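As a rough illustration, the hyperparameters in the table could be gathered into a JSON-serializable training configuration. The key names follow Hugging Face `TrainingArguments` conventions (`per_device_train_batch_size`, `learning_rate`, `num_train_epochs`) plus a `max_seq_length` entry; whether they match Firefly's own configuration files exactly is an assumption, so please check the Firefly repository before use. The `build_config` helper and the output directories are hypothetical placeholders.

```python
import json

# Hyperparameters from the table above; identical for both model sizes.
SHARED_HPARAMS = {
    "per_device_train_batch_size": 1,
    "learning_rate": 3e-4,
    "num_train_epochs": 3,
    "max_seq_length": 512,  # maximum sequence length in tokens
}


def build_config(model_name: str, output_dir: str) -> dict:
    """Return a training config dict for one model (hypothetical layout)."""
    config = {"model_name_or_path": model_name, "output_dir": output_dir}
    config.update(SHARED_HPARAMS)
    return config


if __name__ == "__main__":
    configs = {
        "8b": build_config("meta-llama/Meta-Llama-3-8B-Instruct", "output/llama3-8b"),
        "70b": build_config("meta-llama/Meta-Llama-3-70B-Instruct", "output/llama3-70b"),
    }
    for size, cfg in configs.items():
        # Print each config as JSON; in practice it would be saved to a file.
        print(size)
        print(json.dumps(cfg, indent=2))
```

Keeping the shared values in one dictionary makes it explicit that the two model sizes differ only in the base model, not in the fine-tuning recipe.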

Evaluation

We evaluated the performance of models such as Llama-3 and GPT-4 on the test set.

Please refer to demo.ipynb for the relevant code.
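This README does not list the evaluation metrics. If proofreading is framed as binary error detection (does a report contain an error or not?), scoring model predictions against reference labels could look like the sketch below. The binary framing, the label convention (`True` means the report contains an error), and the `detection_metrics` function are assumptions for illustration; demo.ipynb contains the code actually used.

```python
from typing import Sequence


def detection_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary error detection.

    True means "the report contains an error" (assumed convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))          # errors caught
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))    # false alarms
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))    # errors missed
    tn = len(y_true) - tp - fp - fn                            # clean reports passed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(detection_metrics([True, True, False, False], [True, False, True, False]))
    # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Because the dataset is balanced between error-free and erroneous reports, accuracy is informative here, but precision and recall still separate false alarms from missed errors.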

Authors

XXX

Citation

Please cite this repository if you use its data or code.

@misc{XXX2024llm,
  author = {XXX},
  title = {Reportedly LLMs: Generative Large Language Models for Proofreading Errors in Radiology Reports},
  year = {2024},
  publisher = {XXX},
  journal = {XXX},
}

Acknowledgements

XXX
