End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks (TASLP 2018)

Introduction

This paper tries to solve the mismatch (as in Fig.1) between training objective function and evaluation metrics which are usually highly correlated to human perception. Due to the inconsistency, there is no guarantee that the trained model can provide optimal performance in applications. In this study, we propose an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCN) to reduce the gap between the model optimization and the evaluation criterion. Because of the utterance-based optimization, temporal correlation information of long speech segments, or even at the entire utterance level, can be considered to directly optimize perception-based objective functions.

Major Contribution

Utterance-based waveform enhancement
Direct short-time objective intelligibility (STOI) score optimization (without any approximation)

For more details and evaluation results, please check out our paper.

Waveform enhancement process:

Dependencies:

Python 2.7
keras=1.1.0 (recommended)

Note!

For the STOI loss function optimization, please e-mail me.

Citation

If you find the code useful in your research, please cite:

@article{fu2018end,
  title={End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural   networks},
  author={Fu, Szu-Wei and Wang, Tao-Wei and Tsao, Yu and Lu, Xugang and Kawai, Hisashi},
  journal={IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)},
  volume={26},
  number={9},
  pages={1570--1584},
  year={2018},
  publisher={IEEE Press}}
  
@inproceedings{fu2017raw,
  title={Raw waveform-based speech enhancement by fully convolutional networks},
  author={Fu, Szu-Wei and Tsao, Yu and Lu, Xugang and Kawai, Hisashi},
  booktitle={2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)},
  pages={006--012},
  year={2017},
  organization={IEEE}}

Contact

e-mail: jasonfu@iis.sinica.edu.tw or d04922007@ntu.edu.tw

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
images		images
README.md		README.md
Utterance_based_FCN_MSE.py		Utterance_based_FCN_MSE.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks (TASLP 2018)

Introduction

Major Contribution

Waveform enhancement process:

Dependencies:

Note!

Citation

Contact

About

Releases

Packages

Languages

JasonSWFu/End-to-end-waveform-utterance-enhancement

Folders and files

Latest commit

History

Repository files navigation

End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks (TASLP 2018)

Introduction

Major Contribution

Waveform enhancement process:

Dependencies:

Note!

Citation

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages