This repository holds the PyTorch code accompanying our Interspeech 2023 paper:

Multi-Scale Attention for Audio Question Answering [arXiv]

Guangyao Li, Yixin Xu and Di Hu
Requirements:

- python 3.6+
- pytorch 1.6.0
- tensorboardX
- ffmpeg
- Clone this repo: `git clone https://github.com/GeWu-Lab/MWAFM.git`
- Download data
- Data pre-processing

We follow exactly the same data format as MUSIC-AVQA.
Notice: We examined the original annotation files of Clotho-AQA and found that the official open-source annotations were not cleaned, resulting in discrepancies where different annotators gave different answers to the same question. We therefore applied a simple filtering step: a question is considered to have a correct answer only if at least two of its annotated answers are identical (a sketch of this filtering is given after the list below). Based on this filtering, we obtained a new and more accurate annotation file. The files in the 'metadata' folder are described as follows:
- 'single_word_[train/val/test].csv': does not contain samples with answers yes and no.
- 'single_word_[train/val/test]_clean.csv': does not contain samples with answers yes and no (cleaned data).
- 'clotho_aqa_[train/val/test]_clean.csv': contains samples with answers yes and no (cleaned data).
- 'binary_[train/val/test]_clean.csv': contains only samples with answers yes and no (cleaned data).
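To make the filtering concrete, here is a minimal sketch of the majority-vote cleaning described above. It is not the authors' actual script, and the column names (`file_name`, `QuestionText`, `answer`) are assumptions based on the public Clotho-AQA CSV layout:

```python
import pandas as pd

def filter_annotations(in_csv: str, out_csv: str) -> None:
    """Keep an (audio, question) pair only if at least two annotators agree."""
    df = pd.read_csv(in_csv)
    rows = []
    # Each (audio, question) pair was answered by several annotators.
    for (fname, question), group in df.groupby(["file_name", "QuestionText"]):
        # Count identical answers after light normalization.
        counts = group["answer"].astype(str).str.strip().str.lower().value_counts()
        if counts.iloc[0] >= 2:  # at least two identical answers
            rows.append({"file_name": fname,
                         "QuestionText": question,
                         "answer": counts.index[0]})  # keep the majority answer
    pd.DataFrame(rows).to_csv(out_csv, index=False)

filter_annotations("clotho_aqa_train.csv", "clotho_aqa_train_clean.csv")
```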
- Train and evaluate

Training:

```
python main_MWAFM.py --mode train
```

Testing:

```
python main_MWAFM.py --mode test
```
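For orientation, the `--mode` switch dispatches between the training and evaluation paths. Below is a hypothetical minimal sketch of that dispatch, not the authors' actual argument list; the real `main_MWAFM.py` will take additional arguments such as data paths and hyperparameters:

```python
import argparse

def main():
    # Hypothetical CLI reconstruction; only the --mode flag is documented above.
    parser = argparse.ArgumentParser(description="MWAFM training / evaluation")
    parser.add_argument("--mode", choices=["train", "test"], required=True)
    args = parser.parse_args()

    if args.mode == "train":
        pass  # build the model and optimizer, then run the training loop
    else:
        pass  # load the trained checkpoint and report test-set accuracy

if __name__ == "__main__":
    main()
```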
If you find this work useful, please consider citing it:

```
@inproceedings{Li2023MultiScale,
  title     = {Multi-Scale Attention for Audio Question Answering},
  author    = {Li, Guangyao and Xu, Yixin and Hu, Di},
  booktitle = {Proc. INTERSPEECH},
  year      = {2023},
}
```
This research was supported by Public Computing Cloud, Renmin University of China.