Stars
PyTorch implementation of the Real-ESRGAN model
SOFA_AI: Singing-Oriented Forced Aligner for Automatic Inference
A library built for easier audio self-supervised training and downstream task evaluation
Textual Inversion for Stable Diffusion XL 1.0
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Parameters to analyse audio files
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Controllable and fast Text-to-Speech for over 7000 languages!
A remake of https://github.com/biubug6/Pytorch_Retinaface
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
YSDA course in Speech Processing.
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS
Framework for processing and filtering datasets
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Hackers' Guide to Language Models
Enchanted is an iOS and macOS app for chatting with private, self-hosted language models such as Llama2, Mistral or Vicuna using Ollama.
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
This is my reimplementation of Tacotron2, based on the NVIDIA implementation
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context