Speech-denoising-Autoencoder

Speech denoising systems usually enhance only the magnitude spectrum while leaving the phase spectrum. This system try to improve the performance of denoising system based on denoising autoencoder neural network. The estimation of clean audio is computed by complex ideal ratio mask to enhance the phase information.

Structure

Input : audio data on mel-frequency domain

Output: complex ratio mask (cRM)[1]

This model built in linear shape (2049-500-180) without weight lock[2].

Source

youtube-dl : a command-line program to download videos from YouTube.com and a few more sites

SoX : a cross-platform command line utility to convert various formats of audio files in to other formats

FFmpeg : a complete, cross-platform solution to record, convert and stream audio and video

librosa : python package for music and audio analysis

Reference

[1] Complex Ratio Masking for Monaural Speech Separation, D.Williamson, IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 24, NO. 3, MARCH 2016

[2] Speech Synthesis with Deep Denoising Autoencoder, Zhenzhou Wu

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Speech-denoising-Autoencoder

Structure

Source

Reference

Files

README.md

Latest commit

History

README.md

File metadata and controls

Speech-denoising-Autoencoder

Structure

Source

Reference