Releases · wenet-e2e/wespeaker
WeSpeaker 1.2.0
What's Changed
- Add a recipe for the validation set of VoxSRC-23's diarization track by @xx205 in #166
- Support the SphereFace2 loss function by @Hunterhuan in #173 (see the loss sketch after this list)
- Support Self-Supervised Learning (SSL) recipes on the VoxCeleb dataset, including DINO, MoCo, and SimCLR, by @czy97 and @Hunterhuan in #180 (a contrastive-loss sketch also follows this list)
- Support the NIST SRE16 recipe by @czy97 in #177
- Support Kaldi-compatible PLDA and its unsupervised adaptation by @wsstriving in #186
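
For readers unfamiliar with SphereFace2 (#173), it replaces the usual multi-class softmax with per-class binary classification over cosine similarities. Below is a minimal PyTorch sketch of that idea; the scale, margin, bias, and weighting values are illustrative hyperparameters rather than WeSpeaker's defaults, so consult the PR for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereFace2Sketch(nn.Module):
    """Simplified SphereFace2-style loss: C parallel binary problems
    over cosine similarities, instead of one multi-class softmax."""

    def __init__(self, embed_dim, num_classes, r=30.0, m=0.2, lambda_w=0.7):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.bias = nn.Parameter(torch.zeros(1))
        self.r, self.m, self.lambda_w = r, m, lambda_w  # illustrative values

    def forward(self, emb, labels):
        # Cosine similarity between normalized embeddings and class weights.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        one_hot = F.one_hot(labels, cos.size(1)).float()
        # Target class: push cosine above +m; non-targets: push below -m.
        pos = F.softplus(-(self.r * (cos - self.m) + self.bias))
        neg = F.softplus(self.r * (cos + self.m) + self.bias)
        loss = self.lambda_w * (one_hot * pos).sum(1) \
            + (1.0 - self.lambda_w) * ((1.0 - one_hot) * neg).sum(1)
        return (loss / self.r).mean()
```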
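
The SSL recipes in #180 implement DINO, MoCo, and SimCLR in full; as a taste of the contrastive objective they build on, here is a bare-bones NT-Xent (SimCLR-style) loss over two augmented views of the same utterances. The temperature is a placeholder value.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for a batch of paired views (z1[i], z2[i]).
    Each view treats its partner as the positive and every other
    sample in the 2N-sized batch as a negative."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                        # (2N, 2N)
    n = z1.size(0)
    # Mask out self-similarity so it never counts as a candidate.
    sim.fill_diagonal_(float("-inf"))
    # The positive for sample i is its augmented twin at i + N (mod 2N).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```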
WeSpeaker 1.1.0
What's Changed
- Support the RepVGG model by @cdliang11 in #102
- Support Automatic Mixed Precision (AMP) training by @czy97 in #103 (a minimal training-step sketch follows this list)
- Add Triton GPU deployment for the diarization pipeline by @wd929 in #113
- Support Multi-Query Multi-Head Attentive Statistics Pooling (MQMHASTP) and the Inter-TopK/Sub-center loss by @Hunterhuan in #115 (a simplified pooling sketch also follows this list)
- Support C++ deployment with ONNX Runtime by @cdliang11 in #135
- Support the CAM++ model by @JiJiJiang in #153
- Add more pretrained models by @wsstriving, @czy97 and @JiJiJiang
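
The AMP support in #103 builds on PyTorch's native mixed-precision tools. A minimal training step with `torch.cuda.amp` looks roughly like this; the model, criterion, and optimizer are placeholders.

```python
import torch

def amp_train_step(model, batch, labels, optimizer, criterion, scaler):
    """One mixed-precision training step: run the forward pass in
    float16 where safe, and scale the loss so float16 gradients
    do not underflow."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        logits = model(batch)
        loss = criterion(logits, labels)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads, then optimizer.step()
    scaler.update()                 # adjusts the scale for the next step
    return loss.detach()

# Typical setup: scaler = torch.cuda.amp.GradScaler()
```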
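
MQMHASTP (#115) extends attentive statistics pooling with multiple queries and heads. As a simplified illustration of the base mechanism only, here is a single-head attentive statistics pooling (ASTP) layer; the bottleneck size is an arbitrary choice.

```python
import torch
import torch.nn as nn

class AttentiveStatsPool(nn.Module):
    """Single-head attentive statistics pooling: a small network scores
    each frame, and the pooled vector is the attention-weighted mean
    concatenated with the attention-weighted standard deviation."""

    def __init__(self, feat_dim, bottleneck=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv1d(feat_dim, bottleneck, kernel_size=1),
            nn.Tanh(),
            nn.Conv1d(bottleneck, feat_dim, kernel_size=1),
        )

    def forward(self, x):
        # x: (batch, feat_dim, time)
        alpha = torch.softmax(self.attention(x), dim=2)
        mean = torch.sum(alpha * x, dim=2)
        var = torch.sum(alpha * x * x, dim=2) - mean * mean
        std = torch.sqrt(var.clamp(min=1e-8))
        return torch.cat([mean, std], dim=1)  # (batch, 2 * feat_dim)
```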
WeSpeaker 1.0.0
Highlight
- Competitive results compared with SpeechBrain, ASV-Subtools, etc.
- Lightweight: clean and simple code, no Kaldi dependency
- Unified IO (UIO): designed for large-scale training data
- On-the-fly feature preparation: various data augmentation methods applied during training
- Distributed training: multi-node, multi-GPU scalability
- Production ready: supports exporting to TensorRT or ONNX format, with a Triton Inference Server demo (an export sketch follows this list)
- Pre-trained models: Python bindings and an interactive Hugging Face demo for speaker verification
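
As a sketch of the export path behind the production-ready claim, the snippet below traces a speaker-embedding model to ONNX with `torch.onnx.export`. The model, feature shape, and output path are placeholders, and WeSpeaker's own export script may differ in its details.

```python
import torch

def export_to_onnx(model, onnx_path="speaker_model.onnx",
                   feat_dim=80, num_frames=200):
    """Trace a speaker-embedding model into an ONNX graph with dynamic
    batch and time axes, so one file serves variable-length input."""
    model.eval()
    dummy = torch.randn(1, num_frames, feat_dim)  # (batch, time, feats)
    torch.onnx.export(
        model, dummy, onnx_path,
        input_names=["feats"],
        output_names=["embs"],
        dynamic_axes={"feats": {0: "batch", 1: "time"},
                      "embs": {0: "batch"}},
        opset_version=13,
    )
```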
Overall Structure
Recipes
We provide three well-structured recipes:
- Speaker Verification: VoxCeleb and CNCeleb (SOTA results)
- Speaker Diarization: VoxConverse (an example of using a pre-trained speaker model)
Support List
- SOTA Models: TDNN-based x-vector, ResNet-based r-vector, and ECAPA_TDNN
- Pooling Functions: statistics-based TAP/TSDP/TSTP, and attention-based ASTP
- Criteria: standard Softmax and margin-based A-/AM-/AAM-Softmax (an AAM-Softmax sketch follows this list)
- Scoring: Cosine, PLDA, and Score Normalization (AS-Norm); see the AS-Norm sketch below
- Metrics: EER, minDCF (DET curve), and DER, with an EER sketch below
- Online Augmentation: Resample, Noise & RIR, Speed Perturb, and SpecAug
- Training strategies: well-designed learning-rate and margin schedulers, and large-margin fine-tuning
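
To illustrate the margin-based criteria, here is a compact AAM-Softmax (additive angular margin) sketch; the scale s=32 and margin m=0.2 are illustrative values rather than WeSpeaker's configured defaults.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AAMSoftmax(nn.Module):
    """Additive angular margin softmax: add margin m to the angle of
    the target class before scaling, tightening class boundaries."""

    def __init__(self, embed_dim, num_classes, s=32.0, m=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m  # illustrative scale and margin

    def forward(self, emb, labels):
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        # Apply the angular margin only to the target-class logits.
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)
```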
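
For scoring, this is a minimal adaptive score normalization (AS-Norm) sketch over cosine scores: the raw trial score is normalized against top-k cohort statistics from both the enrollment and test sides. The cohort embeddings and top_k value are placeholders.

```python
import torch
import torch.nn.functional as F

def as_norm(enroll, test, cohort, top_k=300):
    """Adaptive symmetric score normalization of a cosine trial score.
    enroll, test: (D,) embeddings; cohort: (N, D) imposter embeddings,
    with N >= top_k."""
    e = F.normalize(enroll, dim=0)
    t = F.normalize(test, dim=0)
    c = F.normalize(cohort, dim=1)
    raw = torch.dot(e, t)
    # Keep only the top-k most similar cohort scores for each side.
    e_scores = torch.topk(c @ e, k=top_k).values
    t_scores = torch.topk(c @ t, k=top_k).values
    z = (raw - e_scores.mean()) / e_scores.std()   # vs. enrollment cohort
    s = (raw - t_scores.mean()) / t_scores.std()   # vs. test cohort
    return 0.5 * (z + s)
```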
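
Finally, for the metrics, a small NumPy sketch of the EER definition over target and non-target scores; WeSpeaker's scoring scripts compute EER and minDCF themselves, so this is purely illustrative.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the operating point where the false-rejection
    rate equals the false-acceptance rate as the threshold sweeps."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    scores = np.concatenate([tgt, non])
    labels = np.concatenate([np.ones(len(tgt)), np.zeros(len(non))])
    order = np.argsort(scores)          # ascending candidate thresholds
    labels = labels[order]
    # Accepting only scores above threshold i rejects everything up to i.
    frr = np.cumsum(labels) / len(tgt)                  # missed targets
    far = 1.0 - np.cumsum(1.0 - labels) / len(non)      # accepted imposters
    idx = np.argmin(np.abs(frr - far))
    return 0.5 * (frr[idx] + far[idx])
```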