DyViSE

model: models/ECAPA_lip/ecapa_tdnn_lip.py

loss: losses/angleproto.py

Reference

  • E. Z. Xu, Z. Song, C. Feng, M. Ye, and M. Z. Shou, “AVA-AVD: Audio-visual speaker diarization in the wild,” CoRR, vol. abs/2111.14448, 2021.
  • R. Gao and K. Grauman, “VisualVoice: Audio-visual speech separation with cross-modal consistency,” in Proc. CVPR, 2021, pp. 15495–15505.
  • R. Tao, Z. Pan, R. K. Das, X. Qian, M. Z. Shou, and H. Li, “Is someone speaking?: Exploring long-term temporal features for audio-visual active speaker detection,” in Proc. ACM Multimedia, 2021, pp. 3927–3935.
  • S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao, J. Wu, L. Zhou, S. Ren, Y. Qian, Y. Qian, J. Wu, M. Zeng, and F. Wei, “WavLM: Large-scale self-supervised pre-training for full stack speech processing,” CoRR, vol. abs/2110.13900, 2021.