Kinect-WSJ

Code to simulate a reverberated, noisy version of the WSJ0-2mix dataset. Microphones are placed on a linear array whose spacing resembles that of the Microsoft Kinect™, the device used to record the CHiME-5 dataset. This choice makes it possible to use the real ambient noise captured as part of the CHiME-5 dataset. The room impulse responses (RIRs) were simulated at a sampling rate of 16,000 Hz.

Requirements

Instructions

Run:

./create_corrupted_speech.sh --stage 0 --wsj_data_path wsj_path --chime5_wav_base chime_path --dihard_sad_label_path dihard_path --dest save_path

Paths

wsj_path : Path to the precomputed WSJ0-2mix dataset. Should contain the folder 2speakers/wav16k/
chime_path : Path to the CHiME-5 dataset. Should contain the folders train, dev and eval
dihard_path : Path to the DIHARD labels. Should contain *.lab files for the train and dev sets
save_path : Destination path passed to --dest. Stage 0 downloads the RIR files here
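
Before launching the script, it can help to confirm that the three input paths have the layout described above. The following is a minimal sketch, not part of the repository: the function name check_input_paths is hypothetical, and the recursive search for *.lab files is an assumption about where the label files may sit.

```python
from pathlib import Path


def check_input_paths(wsj_path, chime_path, dihard_path):
    """Return a list of problems found; an empty list means the layout looks fine."""
    problems = []

    # The WSJ0-2mix root should contain 2speakers/wav16k/.
    if not (Path(wsj_path) / "2speakers" / "wav16k").is_dir():
        problems.append(f"{wsj_path}: missing 2speakers/wav16k/")

    # The CHiME-5 root should contain the folders train, dev and eval.
    for subset in ("train", "dev", "eval"):
        if not (Path(chime_path) / subset).is_dir():
            problems.append(f"{chime_path}: missing {subset}/")

    # The DIHARD label directory should hold *.lab files.
    # Assumption: they may sit in sub-directories, so search recursively.
    dihard_dir = Path(dihard_path)
    if not dihard_dir.is_dir() or not any(dihard_dir.rglob("*.lab")):
        problems.append(f"{dihard_path}: no *.lab files found")

    return problems


if __name__ == "__main__":
    # Placeholder names from the instructions; replace them with real paths.
    for problem in check_input_paths("wsj_path", "chime_path", "dihard_path"):
        print(problem)
```

An empty result only says that the expected folders and label files exist; it does not verify their contents.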

Stages

stage 0: Download the RIR files (~7 GB) to save_path, extract the compressed files and create a symbolic link to the RIRs in the current folder
stage 1: Extract CHiME-5 noise using the DIHARD labels
stage 2: Create the mixtures. See the code for how to enable parallelization
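
The idea behind stage 1 can be illustrated with a short sketch: if the labels mark where speech occurs, the remaining regions are candidates for noise. This is an illustration only, not the repository's implementation. It assumes each line of a .lab file has the form "onset offset label" with times in seconds, and the function name non_speech_segments is hypothetical.

```python
def non_speech_segments(lab_lines, total_duration):
    """Return (start, end) gaps, in seconds, not covered by any labelled segment."""
    speech = []
    for line in lab_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Assumption: the first two fields are onset and offset in seconds.
        speech.append((float(fields[0]), float(fields[1])))
    speech.sort()

    gaps = []
    cursor = 0.0  # end of the speech seen so far
    for start, end in speech:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # handles overlapping segments
    if cursor < total_duration:
        gaps.append((cursor, total_duration))
    return gaps


if __name__ == "__main__":
    labels = ["1.0 2.0 speech", "1.5 3.0 speech", "4.0 5.0 speech"]
    print(non_speech_segments(labels, total_duration=6.0))
```

The returned intervals would then be used to cut the corresponding samples out of the recordings; how the actual script selects and stores them is not covered here.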

Output Data

The following items are created in each of the tr, cv and tt folders:

s1 : spatial image of s1 (reverberated version of the s1 speaker)
s2 : spatial image of s2 (reverberated version of the s2 speaker)
s1_direct : direct component of s1 at each of the microphones
s2_direct : direct component of s2 at each of the microphones
s1_early : early reflections of s1 (first 50 ms of the RIR; see config.py to change the value) at each of the microphones
s2_early : early reflections of s2 (first 50 ms of the RIR; see config.py to change the value) at each of the microphones
noise : the noise added to each mixture
mix : s1 + s2 + noise
list.yaml : a YAML file containing the positions and direction of arrival (DOA) for each utterance and speaker
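
The relationship between these signals can be sketched for a single microphone as follows. This is a minimal illustration, not the repository's code: the names SAMPLE_RATE, EARLY_MS, convolve, early_part and mix are hypothetical, truncating the RIR from its first sample is an assumption about what "first 50 ms" means, and plain Python lists are used only to keep the sketch dependency-free (real code would typically use an FFT-based convolution).

```python
SAMPLE_RATE = 16000  # Hz, the rate stated for the simulated RIRs
EARLY_MS = 50        # early-reflection window in milliseconds (hypothetical name)


def convolve(signal, rir):
    """Full linear convolution of two sequences (plain Python, O(N*M))."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def early_part(rir, sample_rate=SAMPLE_RATE, early_ms=EARLY_MS):
    """Keep only the first early_ms milliseconds of the RIR."""
    n_early = sample_rate * early_ms // 1000  # 800 samples at 16 kHz
    return rir[:n_early]


def mix(s1, s2, noise):
    """Sample-wise sum s1 + s2 + noise, truncated to the shortest input."""
    return [a + b + c for a, b, c in zip(s1, s2, noise)]


if __name__ == "__main__":
    # Toy RIR: a direct path plus one late reflection after 900 samples.
    rir = [1.0] + [0.0] * 999
    rir[900] = 0.5
    dry = [1.0, 2.0]
    s1_full = convolve(dry, rir)               # spatial image (full RIR)
    s1_early = convolve(dry, early_part(rir))  # early reflections only
    print(len(s1_full), len(s1_early))
```

In this toy case the late reflection at sample 900 appears in the full spatial image but not in the early version, because it falls outside the 800-sample window.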

Hard disk usage

Dataset type	Per sub-folder	Total *
Train (tr)	21G	168G
Validation (cv)	5.2G	41.6G
Test (tt)	3.2G	25.6G

[*] Combination of mix, s1_early, s2_early, s1_direct, s2_direct and noise.

References

Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition

If you use this code, please cite the following paper:

@inproceedings{sivasankaran2019analyzing,
  title = {Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition},
  author = {Sunit Sivasankaran and Emmanuel Vincent and Dominique Fohr},
  booktitle = {2020 28th {{European Signal Processing Conference}} ({{EUSIPCO}})},
  year = {2021},
  month = jan,
}
