Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering (paper)
This repository is the official implementation of CMSA-MTPT
for the visual question answering task in the medical domain. Our model achieves 56.1 accuracy on open-ended questions and 77.3 on closed-ended questions on the VQA-RAD dataset. As of 2021-05-28, the proposed model achieves state-of-the-art results
on the VQA-RAD dataset. For details, please refer to the link.
The main contributor of this code is Guanqi Chen link. This repository is based on and inspired by @Jin-Hwa Kim's work and @Aizo-ai's work. We sincerely thank them for sharing their code.
Please cite this paper in your publications if it helps your research:
@inproceedings{gong2021cross,
author = {Haifan Gong and
Guanqi Chen and
Sishuo Liu and
Yizhou Yu and
Guanbin Li},
title = {Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical
Visual Question Answering},
booktitle = {{ICMR} '21: International Conference on Multimedia Retrieval, Taipei,
Taiwan, August 21-24, 2021},
pages = {456--460},
publisher = {{ACM}},
year = {2021},
doi = {10.1145/3460426.3463584},
}
You may also cite this work if it helps your research:
@article{gong2022vqamix,
title={VQAMix: Conditional Triplet Mixup for Medical Visual Question Answering},
author={Haifan Gong and Guanqi Chen and Mingzhi Mao and Zhen Li and Guanbin Li},
journal={IEEE Transactions on Medical Imaging},
year={2022}
}
Note: you should replace the original ImageNet-pretrained encoder with the multi-task pretrained encoder provided in the drive, or with one trained by yourself.
Overview of the proposed medical VQA model. Our method consists of four components (shown in different colors in the figure): an image feature extractor, a question encoder, a cross-modal self-attention (CMSA) module, and an answer predictor.
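As a rough illustration of how such a fusion module can work, here is a minimal PyTorch sketch of self-attention over concatenated image and question features. The dimensions, layer choices, and class name are assumptions for illustration only, not the implementation used in this repository.

```python
# Minimal sketch of a cross-modal self-attention block (illustrative only):
# image and question features are concatenated into one token sequence and
# fused with scaled dot-product self-attention. Dimensions, layer choices,
# and names are assumptions, not the exact implementation in this repository.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalSelfAttention(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = math.sqrt(dim)

    def forward(self, img_feats, ques_feats):
        # img_feats:  (B, N_img, dim) -- flattened spatial image features
        # ques_feats: (B, N_q,   dim) -- encoded question tokens
        tokens = torch.cat([img_feats, ques_feats], dim=1)  # (B, N_img + N_q, dim)
        q, k, v = self.query(tokens), self.key(tokens), self.value(tokens)
        attn = F.softmax(torch.bmm(q, k.transpose(1, 2)) / self.scale, dim=-1)
        fused = torch.bmm(attn, v)
        return tokens + fused  # residual connection


if __name__ == "__main__":
    cmsa = CrossModalSelfAttention(dim=1024)
    img = torch.randn(2, 49, 1024)   # e.g. a 7x7 feature map, flattened
    ques = torch.randn(2, 12, 1024)  # e.g. 12 question tokens
    print(cmsa(img, ques).shape)     # torch.Size([2, 61, 1024])
```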
Multi-Task Pre-Training: the model is jointly trained with an image understanding task and a question-image compatibility task. Depending on the dataset-specific image understanding task, the decoder can be selected as a fully convolutional network or a fully connected network.
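For intuition, the following is a minimal sketch of such a joint objective, assuming a segmentation-style image understanding head and a binary question-image matching head. The function names, loss terms, and weighting are illustrative assumptions, not the repository's actual pre-training code.

```python
# Rough sketch of a joint multi-task pre-training objective (illustrative only).
# The loss terms, decoder choice, and weighting are assumptions; see the paper
# for the actual pre-training setup.
import torch
import torch.nn.functional as F


def multitask_loss(seg_logits, seg_target, match_logits, match_target, alpha=1.0):
    """Combine an image-understanding loss with a question-image
    compatibility (matching) loss."""
    # Dataset-specific image understanding task, here a segmentation decoder
    # (fully convolutional head); a classification head would use the same
    # cross-entropy on class logits instead.
    understanding = F.cross_entropy(seg_logits, seg_target)
    # Binary question-image compatibility: does the question match the image?
    compatibility = F.binary_cross_entropy_with_logits(match_logits, match_target)
    return understanding + alpha * compatibility


if __name__ == "__main__":
    seg_logits = torch.randn(2, 3, 64, 64)         # (B, classes, H, W)
    seg_target = torch.randint(0, 3, (2, 64, 64))  # per-pixel labels
    match_logits = torch.randn(2)                  # one score per image-question pair
    match_target = torch.tensor([1.0, 0.0])        # matched / mismatched
    print(multitask_loss(seg_logits, seg_target, match_logits, match_target))
```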
Requirements: torch 1.0.1, torchvision 0.4.0a0, numpy 1.19.1, CUDA 9.1, GPU: GTX 1080
The processed data can be downloaded via the link with the extraction code tkm8. The downloaded file should be extracted to the data_RAD/ directory.
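As a quick sanity check after extraction, a snippet like the one below can be used. Note that the annotation file name is an assumption based on the usual VQA-RAD release and may differ from the actual archive contents.

```python
# Quick sanity check of the extracted data (illustrative only).
# The file name "data_RAD/trainset.json" is an assumption; adjust the path
# to match the actual archive contents.
import json

with open("data_RAD/trainset.json", "r") as f:
    samples = json.load(f)

print("type:", type(samples), "entries:", len(samples))
```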
The pretrained models are available at Baidu Drive (extraction code: 163k) or Google Drive.
The dataset for multi-task pre-training is available at Baidu Drive (extraction code: gow6) or Google Drive.
Run train.sh for training and test.sh for evaluation. The resulting JSON file can be found in the results/ directory.
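To inspect the produced predictions, a small script such as the one below can be used; the output file names and JSON structure are assumptions and may differ from what test.sh actually writes.

```python
# Inspect the prediction files produced by evaluation (illustrative only).
# The file names and JSON structure under results/ are assumptions; adjust
# them to match the actual output of test.sh.
import glob
import json

for path in glob.glob("results/*.json"):
    with open(path, "r") as f:
        predictions = json.load(f)
    print(path, "-", len(predictions), "predictions")
```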
MIT License
For up-to-date results, please refer to https://github.com/haifangong/VQAMix. If you have any problems, do not hesitate to contact us at haifangong@outlook.com. HCP Lab homepage: https://www.sysuhcp.com/