This repository includes PubMedCLIP, a version of CLIP fine-tuned on ROCO image-caption pairs. We also provide pipelines for incorporating PubMedCLIP as an alternative pre-trained visual encoder in the MEVF and QCR medical visual question answering pipelines. Our experiments show that PubMedCLIP yields up to a 3% improvement in medical visual question answering.
If you use this work in an academic publication, please cite the paper by Sedigheh Eslami, Christoph Meinel, and Gerard de Melo.
BibTeX entry:
@inproceedings{eslami2023pubmedclip,
title={PubMedCLIP: How Much Does CLIP Benefit Visual Question Answering in the Medical Domain?},
author={Eslami, Sedigheh and Meinel, Christoph and De Melo, Gerard},
booktitle={Findings of the Association for Computational Linguistics: EACL 2023},
pages={1151--1163},
year={2023}
}