This repository provides the audio demos for the paper "DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement" (accepted by IEEE TASLP). The code and the pretrained models are also released.
You can use dual_aia_trans_merge_crm() in aia_trans.py for dual-branch SE, while aia_complex_trans_mag() and aia_complex_trans_ri() are single-branch approaches. Trained weights are provided for the VB dataset, the 30h WSJ0-SI84 dataset, and the 300h 2020 DNS-Challenge dataset. You can directly perform inference or fine-tune the model by using vb_aia_merge_new.pth.tar.
CUDA 10.1
torch == 1.8.0
pesq == 0.0.1
librosa == 0.7.2
SoundFile == 0.10.3
Prepare your data: run json_extract.py to generate the JSON files that record the utterance file names for both the training and validation sets.
# Run json_extract.py
python json_extract.py
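The JSON-generation step can be sketched as follows; the function, directory, and key names here are illustrative placeholders, not the actual contents of the repo's json_extract.py:

```python
import json
import os

def extract_file_names(wav_dir, json_path):
    """Collect .wav utterance file names from wav_dir and dump them to a JSON
    file.  Illustrative sketch only: the real json_extract.py may record full
    paths or a different structure."""
    names = sorted(f for f in os.listdir(wav_dir) if f.endswith(".wav"))
    with open(json_path, "w") as f:
        json.dump(names, f, indent=2)
    return names

# Example (hypothetical directories): build the training-set file list
# extract_file_names("./train/noisy", "./json/train_files.json")
```

Running this once per split (training and validation) yields the per-set file lists the data loader can consume.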
Change the parameter settings according to your directory layout (within config_vb.py or config_dns.py).
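The kinds of settings you typically need to adjust can be sketched as a minimal config module; every name and value below is a hypothetical placeholder, not the actual contents of config_vb.py or config_dns.py:

```python
# Hypothetical config sketch -- check config_vb.py / config_dns.py for the
# real parameter names and adapt the paths to your machine.
json_dir = "./json"           # where json_extract.py wrote the file lists
file_path = "./VB_dataset"    # root directory of the training/validation audio
batch_size = 8                # reduce if GPU memory is tight
epochs = 60                   # total training epochs
lr = 5e-4                     # initial learning rate
fs = 16000                    # sampling rate of the corpus
```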
Network training (you can also use the aia_complex_trans_mag() and aia_complex_trans_ri() networks in aia_trans.py for single-branch SE)
# Run main_vb.py or main_dns.py to begin network training
# solver_merge.py and train_merge.py contain the detailed training process
python main_vb.py
The trained weights are provided in BEST_MODEL.
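Loading a provided checkpoint for inference or fine-tuning can be sketched as below. The checkpoint key name ('model_state_dict') is an assumption; inspect vb_aia_merge_new.pth.tar to see how the repo actually stores it, and substitute the repo's dual_aia_trans_merge_crm() for the stand-in module:

```python
import torch
import torch.nn as nn

# Stand-in for the repo's dual_aia_trans_merge_crm(); replace with the real model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(x)

def load_checkpoint(model, path):
    """Load weights saved with torch.save().  Handles both a raw state_dict
    and a dict wrapping it under 'model_state_dict' (an assumed key)."""
    ckpt = torch.load(path, map_location="cpu")
    state = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    model.load_state_dict(state)
    model.eval()  # inference mode; switch back to .train() before fine-tuning
    return model

# Example (hypothetical path):
# model = load_checkpoint(TinyNet(), "BEST_MODEL/vb_aia_merge_new.pth.tar")
```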
# Run enhance_vb.py or enhance_wsj.py to enhance the noisy speech samples
python enhance_vb.py
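At a high level, the enhancement stage applies a network-predicted correction in the time-frequency domain and resynthesizes the waveform. The toy NumPy sketch below illustrates that pipeline shape only: toy_mask() is a placeholder for the trained network's output, and this is not the repo's actual inference code (which estimates magnitude and phase jointly):

```python
import numpy as np

def toy_mask(mag):
    """Placeholder for the network's predicted magnitude mask."""
    return np.clip(mag / (mag + 1e-3), 0.0, 1.0)

def enhance(noisy, n_fft=512, hop=256):
    """Toy spectral-masking enhancement: mask the magnitude, reuse the noisy
    phase, and resynthesize by windowed overlap-add."""
    window = np.hanning(n_fft)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - n_fft + 1, hop):
        frame = noisy[start:start + n_fft] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        enhanced = toy_mask(mag) * mag * np.exp(1j * phase)
        rec = np.fft.irfft(enhanced, n_fft) * window
        out[start:start + n_fft] += rec
        norm[start:start + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)
```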