Shuai Wang wsstriving

73 followers · 30 following

Shanghai Jiao Tong University
wsstriving.github.io

Stars

pytorch / torchtitan

A PyTorch native library for large model training

Python 2,919 233 Updated Jan 2, 2025

cantabile-kwok / vec2wav2.0

Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995

Python 63 6 Updated Dec 3, 2024

facebookresearch / audioseal

Localized watermarking for AI-generated speech audio, with SOTA robustness and a very fast detector

Python 494 61 Updated Oct 26, 2024

google / speaker-id

This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.

Python 389 39 Updated Oct 25, 2024

NZqian / RapBank

53 2 Updated Sep 13, 2024

wenet-e2e / wesep

Target Speaker Extraction Toolkit

Python 138 16 Updated Nov 6, 2024

huggingface / optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

Python 2,647 486 Updated Dec 25, 2024

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 4,661 451 Updated Dec 26, 2024

huggingface / parler-tts

Inference and training library for high-quality TTS models.

Python 4,859 502 Updated Dec 10, 2024

BUTSpeechFIT / DVBx

Discriminative Training of VBx Diarization

Python 20 2 Updated Sep 23, 2024

IDRnD / redimnet

The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"

Python 129 6 Updated Nov 14, 2024

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 3,903 349 Updated Nov 29, 2024

fgnt / speaker_reassignment

Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment

Python 5 1 Updated Nov 5, 2024

Vincent-ZHQ / CA-MSER

Code for Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information

Python 135 15 Updated Nov 27, 2023

thuhcsi / SECap

Python 149 13 Updated Jul 9, 2024

kenjihiranabe / The-Art-of-Linear-Algebra

Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"

PostScript 18,291 2,213 Updated Nov 13, 2024

Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,581 213 Updated Aug 1, 2024

line / LibriTTS-P

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

117 2 Updated Jun 13, 2024

zjlww / ardit-web

HTML 25 1 Updated Aug 2, 2024

KdaiP / StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Python 373 42 Updated Sep 13, 2024

xjchenGit / SingGraph

Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024).

Python 23 4 Updated Sep 27, 2024

yukara-ikemiya / friendly-stable-audio-tools

Refactored / updated version of `stable-audio-tools` which is an open-source code for audio/music generative models originally by Stability AI.

Python 155 11 Updated Jul 25, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 33,397 3,631 Updated Dec 3, 2024

Zejun-Yang / AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,752 592 Updated Jul 2, 2024

nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)

Python 569 94 Updated Dec 31, 2024

flinkerlab / neural_speech_decoding

Jupyter Notebook 94 11 Updated Apr 8, 2024

chenzomi12 / AISystem

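AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks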

Jupyter Notebook 11,801 1,710 Updated Jan 2, 2025

wavmark / wavmark

AI-based Audio Watermarking Tool

Python 238 32 Updated Jan 7, 2024

huangwb8 / ChineseResearchLaTeX

A collection of LaTeX templates commonly used in Chinese scientific research

TeX 350 46 Updated Nov 18, 2024

liyunlongaaa / NSD-MS2S

CHiME-7/8 diarization champion system: neural speaker diarization using memory-aware multi-speaker embedding with sequence-to-sequence architecture

Shell 71 5 Updated May 17, 2024