Reportedly LLMs: Generative Large Language Models for Proofreading Errors in Radiology Reports

This is the repository for Reportedly LLMs, which aims to build task-specific LLMs for medical proofreading.

This repository is provided temporarily for review purposes; a published version will be released later.

Overview

The overall workflow of Reportedly LLMs.

Figure 1 will be presented here after the manuscript is published.

Our work consists of three parts:

(1) Dataset Construction

(2) Model Development

(3) Evaluation

Dataset Construction

We constructed a dataset consisting of two parts.

The first part includes 1,656 synthetic radiology reports generated by GPT-4 using specified prompts, divided into 828 error-free synthetic reports and 828 synthetic reports with errors.

Please refer to Prompts_for_Synthetic.txt for the prompts.

The second part comprises 614 reports: 307 error-free reports from the MIMIC-CXR database and 307 corresponding synthetic reports with errors, generated by GPT-4 from these MIMIC-CXR reports using specified prompts.

Please refer to Prompts_for_MIMIC.txt for the prompts.
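
The composition of the two parts can be tallied with a short sketch. The counts are taken from this README; the dictionary keys and the `summarize` helper are illustrative names and are not part of the repository.

```python
# Report counts as stated in this README; key names are illustrative.
DATASET_PARTS = {
    "synthetic": {"error_free": 828, "with_errors": 828},
    "mimic_cxr": {"error_free": 307, "with_errors": 307},
}


def summarize(parts):
    """Return per-part totals, per-label totals, and the overall total."""
    per_part = {name: sum(counts.values()) for name, counts in parts.items()}
    per_label = {}
    for counts in parts.values():
        for label, n in counts.items():
            per_label[label] = per_label.get(label, 0) + n
    return {
        "per_part": per_part,
        "per_label": per_label,
        "total": sum(per_part.values()),
    }


summary = summarize(DATASET_PARTS)
print(summary["per_part"])   # {'synthetic': 1656, 'mimic_cxr': 614}
print(summary["per_label"])  # {'error_free': 1135, 'with_errors': 1135}
print(summary["total"])      # 2270
```

Both parts are balanced between error-free reports and reports with errors, so the combined dataset is balanced as well.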

Model Development

We fine-tuned our models using the Firefly codebase.

Please refer to Firefly (https://github.com/yangjianxin1/Firefly).

Llama-3-8B-Instruct and Llama-3-70B-Instruct are fine-tuned on the training set with the following hyperparameters:

Hyperparameter	Llama-3-8B-Instruct	Llama-3-70B-Instruct
Batch size	1	1
Learning rate	3e-4	3e-4
Epochs	3	3
Max length	512	512
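
For convenience, the hyperparameters in the table can be written down as a small configuration object. This is a minimal sketch: the `FineTuneConfig` class and its field names are hypothetical and may not match the argument names Firefly expects, so map them onto Firefly's own configuration files before training.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Field names are hypothetical; default values come from the table above.
    model_name: str
    batch_size: int = 1
    learning_rate: float = 3e-4
    epochs: int = 3
    max_length: int = 512


# According to the table, both models share the same settings.
CONFIGS = {
    name: FineTuneConfig(model_name=name)
    for name in ("Llama-3-8B-Instruct", "Llama-3-70B-Instruct")
}

for cfg in CONFIGS.values():
    # Serialize each configuration so it can be logged or saved alongside a run.
    print(json.dumps(asdict(cfg)))
```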

Evaluation

We evaluated the performance of models such as Llama-3 and GPT-4 on the test set.

Please refer to demo.ipynb for the relevant code.
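
This README does not state which metrics are reported, so the following is only a generic sketch of how binary error detection could be scored; it is not the code in demo.ipynb. It assumes each report has a gold label (1 = contains an error, 0 = error-free) and a model prediction in the same encoding. The function name and the toy labels are made up for illustration.

```python
def detection_scores(gold, pred):
    """Precision, recall, and F1 for the positive class (1 = report has an error)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from this work.
gold = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
scores = detection_scores(gold, pred)
print(scores)
```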

Authors

XXX

Citation

Please cite the repo if you use the data or code in this repository.

@misc{XXX2024llm,
  author = {XXX},
  title = {Reportedly LLMs: Generative Large Language Models for Proofreading Errors in Radiology Reports},
  year = {2024},
  publisher = {XXX},
  journal = {XXX},
}

Acknowledgements

XXX

