DyViSE

model: models/ECAPA_lip/ecapa_tdnn_lip.py

loss: losses/angleproto.py

References

E. Z. Xu, Z. Song, C. Feng, M. Ye, and M. Z. Shou, “AVA-AVD: Audio-visual speaker diarization in the wild,” CoRR, vol. abs/2111.14448, 2021.
R. Gao and K. Grauman, “VisualVoice: Audio-visual speech separation with cross-modal consistency,” in Proc. CVPR, 2021, pp. 15495–15505.
R. Tao, Z. Pan, R. K. Das, X. Qian, M. Z. Shou, and H. Li, “Is someone speaking?: Exploring long-term temporal features for audio-visual active speaker detection,” in Proc. ACM Multimedia, 2021, pp. 3927–3935.
S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao, J. Wu, L. Zhou, S. Ren, Y. Qian, Y. Qian, J. Wu, M. Zeng, and F. Wei, “WavLM: Large-scale self-supervised pre-training for full stack speech processing,” CoRR, vol. abs/2110.13900, 2021.
