Audio-Visual Question Answering (AVQA) requires reasoning over both video content and auditory information, and correlating them with the question to predict the most accurate answer. Existing methods mainly focus on using the question to query relevant audio-visual regions. However, the question itself carries limited information and provides few clues. In this paper, we use a visual language model to inject high-level visual knowledge and incorporate a Large Language Model (LLM) to expand the question into detailed cue knowledge. We then propose a Cue-Aware Network (Cue-N), which divides the recognition paradigm into two steps: 1) High-level semantic evidence generation via LLM. We connect visual knowledge and question content through prompt engineering to obtain summarized cues that contain "how many", "where", and "what" types of information. 2) Progressive incorporation of the generated cue knowledge. We first design an association block that cooperates with contrastive learning to learn coordinated audio-visual pairs. A cue aggregator is then proposed to incorporate the cue knowledge into the audio-visual features. Results on two publicly available datasets containing multiple question-and-answer pairs (i.e., Music-AVQA and AVQA) demonstrate the superiority of our Cue-N. During the experiments, an interesting finding is that removing deep audio-visual features during inference effectively mitigates overfitting.
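The cue-aggregation step can be pictured as a cross-attention fusion between audio-visual tokens and encoded cue tokens. Below is a minimal PyTorch sketch under that assumption; the class name `CueAggregator`, the tensor shapes, and the single-block design are illustrative placeholders, not the repository's actual implementation.

```python
# Minimal sketch of cue injection via cross-attention, assuming the LLM-generated
# cue knowledge has already been encoded into token embeddings. Names and shapes
# are hypothetical, not the official Cue-N API.
import torch
import torch.nn as nn


class CueAggregator(nn.Module):
    """Fuse cue embeddings into audio-visual features with cross-attention."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, av_feats: torch.Tensor, cue_feats: torch.Tensor) -> torch.Tensor:
        # av_feats: (B, T, D) audio-visual tokens; cue_feats: (B, L, D) cue tokens.
        attended, _ = self.cross_attn(query=av_feats, key=cue_feats, value=cue_feats)
        fused = self.norm(av_feats + attended)       # residual + norm
        return self.norm(fused + self.ffn(fused))    # position-wise refinement


# Toy usage with random features.
if __name__ == "__main__":
    agg = CueAggregator(dim=512)
    av = torch.randn(2, 60, 512)    # e.g. 60 audio-visual segment tokens
    cue = torch.randn(2, 32, 512)   # e.g. 32 encoded cue tokens
    print(agg(av, cue).shape)       # torch.Size([2, 60, 512])
```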
rikeilong/CueNet
About
Official Implementation for "Cue-N: Cue-Aware Network for Audio-Visual Question Answering"