Stars
A PyTorch native library for large model training
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Inference and training library for high-quality TTS models.
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
Multilingual Voice Understanding Model
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
Code for Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
MARS5 speech model (TTS) from CAMB.AI
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024).
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
A generative speech model for daily dialogue.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
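AISystem mainly covers AI systems, including the full stack of low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks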
CHiME-7/8 diarization champion system: neural speaker diarization using memory-aware multi-speaker embedding with sequence-to-sequence architecture