diff --git a/README.md b/README.md
index 66c7b07..0de0862 100644
--- a/README.md
+++ b/README.md
@@ -364,6 +364,13 @@ This is a curated list of audio-visual learning methods and datasets, based on o
 **Institution:** The Chinese University of Hong Kong
 
+**[InterSpeech-2024]**
+[LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition](https://arxiv.org/abs/2406.04432)
+<br>
+**Authors:** Sreyan Ghosh, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi, Ramani Duraiswami, Dinesh Manocha
+<br>
+**Institution:** University of Maryland, College Park, USA
+
 #### Speaker Recognition
 
 **[MTA-2016]**
@@ -1470,6 +1477,13 @@ Chenqi Kong, Baoliang Chen, Wenhan Yang, Haoliang Li, Peilin Chen, Shiqi Wang
 **Institution:** Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford, U.K.
 
+**[CVPR-2024]**
+[AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection](https://arxiv.org/abs/2406.02951)
+<br>
+**Authors:** Trevine Oorloff, Surya Koppisetti, Nicolò Bonettini, Divyaraj Solanki, Ben Colman, Yaser Yacoob, Ali Shahriyari, Gaurav Bharaj
+<br>
+**Institution:** University of Maryland - College Park; Reality Defender Inc.
+
 ## Cross-modal Perception
 
 ### Cross-modal Generation