- 🎓 I’m currently a Ph.D. student at Harbin Institute of Technology and a research intern at Microsoft Research Asia.
- 🌱 My research interests include self-supervised learning, speech and audio processing, and spoken language processing.
- 📄 My research highlights:
- [Nov 2023] VALL-E produced the AI audiobook of Impromptu: Amplifying Our Humanity Through AI with an “AI Reid” voice.
- [Apr 2023] VALL-E wins the UNESCO Netexplo Innovation Award 2023 (top 10 out of over 3000 innovations of the year).
- [Apr 2023] BEATs is accepted by ICML 2023 as an oral paper.
- [Mar 2023] We introduce VALL-E X, a cross-lingual version of VALL-E that can help anyone speak a foreign language in their own voice, without an accent. See https://aka.ms/vallex for demos.
- [Jan 2023] VALL-E, a language modeling approach to text-to-speech synthesis, achieves state-of-the-art zero-shot TTS performance and exhibits in-context learning capabilities. See https://aka.ms/valle for demos.
- [Dec 2022] BEATs, an audio pre-training framework based on discrete label prediction, ranks 1st on the AudioSet, Balanced AudioSet, and ESC-50 leaderboards. We released the code and pre-trained models.
- [Nov 2022] WavLM is now available in TorchAudio. Try it out here.
- [Sep 2022] SpeechLM, a text-enhanced speech pre-training model, achieves a 16% relative WER reduction over data2vec with only 10K text sentences on the LibriSpeech speech recognition benchmark. We released the code and pre-trained models.
- [Sep 2022] WavLM is published in the IEEE Journal of Selected Topics in Signal Processing.
- [Jan 2022] WavLM ranks 1st on the VoxSRC 2021 speaker verification permanent leaderboard.
- [Dec 2021] A WavLM speaker verification demo is available on Hugging Face.
- [Nov 2021] WavLM code and pre-trained models are released here.
- [Oct 2021] WavLM ranks 1st on the SUPERB leaderboard.
- [Oct 2021] WavLM, a large-scale self-supervised pre-training framework for full-stack speech processing, achieves state-of-the-art performance on 19 tasks, including all 15 tasks on the SUPERB benchmark, as well as the VoxCeleb1 speaker verification, LibriCSS speech separation, CALLHOME speaker diarization, and LibriSpeech speech recognition benchmarks.
- [Oct 2021] An ultra-fast continuous speech separation model is shipped in the Microsoft Conversation Transcription Service.
- [Dec 2020] Our continuous speech separation model is shipped in the Microsoft Conversation Transcription Service.
- [Oct 2020] The Microsoft speaker diarization system with conformer-based continuous speech separation ranks 1st in the VoxCeleb Speaker Recognition Challenge 2020.
- [Aug 2020] Continuous speech separation with conformer achieves state-of-the-art performance on the LibriCSS speech separation benchmark. We released the code and pre-trained models. See demos here.
- [Apr 2020] RecAdam, my first first-author paper, achieves state-of-the-art performance on the GLUE benchmark. We released the code.
- Research Scientist @ Meta FAIR
Pinned repositories:
- microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
- microsoft/UniSpeech: Large-Scale Self-Supervised Learning for Speech
- CSS_with_Conformer: Code for the ICASSP 2021 paper “Continuous Speech Separation with Conformer”
- CSS_with_TSTransformer: Code for the INTERSPEECH 2021 paper “Ultra Fast Speech Separation Model with Teacher Student Learning”
- CSS_with_EETransformer: Code for the ICASSP 2021 paper “Don’t Shoot Butterfly with Rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer”