This repository contains code for the paper 'TM-PATHVQA: 90000+ Textless Multilingual Questions for Medical Visual Question Answering', accepted at Interspeech 2024.
The paper introduces a novel VQA dataset for healthcare and medical diagnostics. Current text-based VQA systems are of limited use in scenarios where hands-free interaction and accessibility are crucial while performing tasks. A speech-based VQA system offers a better mode of interaction, allowing information to be accessed while other tasks are carried out simultaneously. To this end, this work implements a speech-based VQA system by introducing the Textless Multilingual Pathological VQA (TM-PathVQA) dataset, an expansion of the PathVQA dataset.
The dataset can be accessed via the following link:
Requirements:
- Python>=3.8
- torch>1.6
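The requirements above can be installed in a fresh virtual environment, for example as sketched below. The exact package pins and any additional dependencies the repository may declare (e.g. in a requirements.txt) are assumptions; check the repository for the authoritative list.

```shell
# Minimal sketch of environment setup matching the stated requirements.
# Package versions beyond Python>=3.8 and torch>1.6 are assumptions.
python3 -m venv .venv
source .venv/bin/activate
pip install "torch>1.6"
```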