This project is an individual assignment for the "Speech and Audio Processing" course, offered in the 8th semester of the 2024 academic year at the University of Piraeus, Department of Informatics. The project focuses on developing a Python program that segments speech recordings into words and classifies each segment as background or foreground sound using machine learning classifiers. The system must process the input audio and identify word boundaries without any prior knowledge of the number of words.
- Institution: University of Piraeus
- Department: Department of Informatics
- Course: Speech and Audio Processing (2024)
- Semester: 8th
- Python
- Libraries:
  - `TensorFlow`: For building and training neural networks.
  - `NumPy`: For numerical computations.
  - `librosa`: For audio feature extraction (a brief sketch follows this list).
  - `pandas`: For handling datasets.
  - `inquirer`: For interactive CLI options.
  - `speechrecognition`: For basic speech processing.
  - `pydub`: For audio playback and processing.
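As a quick illustration of how these libraries fit together, the sketch below uses `librosa` to turn a wav file into per-frame MFCC features. The frame sizes and coefficient count are common speech-processing defaults assumed here, not necessarily the project's actual settings:

```python
import librosa

def extract_mfcc(path, sr=16000, n_mfcc=13):
    """Return a (n_frames, n_mfcc) matrix of MFCC features for one recording."""
    y, sr = librosa.load(path, sr=sr)  # resample to 16 kHz on load
    # Assumed framing: 25 ms windows with a 10 ms hop, standard for speech
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
    )
    return mfcc.T
```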
This project implements four different classifiers for segmenting and classifying the audio:
- Support Vector Machines (SVM)
- Multilayer Perceptron (MLP)
- Recurrent Neural Networks (RNN)
- Least Squares
Each classifier is trained to identify whether a portion of the audio corresponds to background noise or speech. The classifiers are trained and tested using features extracted from the audio dataset.
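As a concrete example of this frame-level setup, the sketch below trains one of the listed classifiers (an SVM, here via scikit-learn, which is an assumption; the repository may implement it differently) to label individual feature frames as background or speech. Random arrays stand in for real extracted features:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: X is (n_frames, n_features), y is 0 = background, 1 = speech.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 13))
y_train = rng.integers(0, 2, size=1000)

# Scale features, then fit an RBF-kernel SVM on the labeled frames
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

# Each frame of a new recording gets a background/foreground label
frame_labels = clf.predict(rng.normal(size=(200, 13)))
```

Runs of frames labeled as speech can then be merged into intervals, which is how word boundaries can emerge without knowing the number of words in advance.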
The VOiCES dataset is used for training and testing. It contains various audio samples divided into:
- Background sound: Contains only noise without speech.
- Foreground sound: Contains speech, with or without background noise.
The dataset is organized into the following structure:
```
auxiliary2024/input
└── VOiCES_devkit
    ├── distant-16k
    ├── references
    └── source-16k
```
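A minimal sketch of how recordings might be gathered and labeled from this layout (the folder-to-label rule below is purely an assumption for illustration; the actual mapping is defined by the project code):

```python
from pathlib import Path

DATASET_ROOT = Path("auxiliary2024/input/VOiCES_devkit")

def collect_samples(root=DATASET_ROOT):
    """Pair each wav file with a background/foreground label."""
    samples = []
    for wav in root.rglob("*.wav"):
        # Assumed rule: noise-only recordings live under a 'distractors' folder
        label = "background" if "distractors" in wav.parts else "foreground"
        samples.append((wav, label))
    return samples
```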
The repository is structured as follows:
```
/source2024/
    svm.py                      # SVM classifier implementation
    mlp.py                      # MLP classifier implementation
    rnn.py                      # RNN classifier implementation
    least_squares.py            # Least Squares classifier implementation
    main.py                     # Main script for full process execution
/docs/
    Project-description.pdf     # Description of the project
    Project-documentation.pdf   # Detailed project documentation
/images/
    program_generations.png             # Generation messages
    program_solution_exists.png         # Solution found message
    program_solution_exists_image.png   # Graph visualization
    program_start.png                   # Program start message
/auxiliary2024/
    input/                      # Contains the VOiCES dataset
    output/                     # Output directory for classifier predictions
```
Additionally, `main_menu.py` is located in the root directory and provides an interactive menu for selecting which part of the project to execute (e.g., dataset loading, feature extraction, model training).
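For reference, an `inquirer`-based menu of this kind can be built roughly as follows; the task names and the handler are hypothetical placeholders rather than the project's actual menu entries:

```python
import inquirer

def run_menu():
    # Hypothetical entries; the real main_menu.py defines its own task list.
    questions = [
        inquirer.List(
            "task",
            message="Select a task to run",
            choices=["Load dataset", "Extract features", "Train models",
                     "Transcribe audio", "Exit"],
        )
    ]
    answers = inquirer.prompt(questions)
    if answers:
        print(f"Selected task: {answers['task']}")

if __name__ == "__main__":
    run_menu()
```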
Ensure you have Anaconda or Miniconda installed on your system; both are available from the official Anaconda website.
1. Clone the repository and navigate to the project directory:

   ```
   git clone https://github.com/thkox/speech-and-audio-processing
   cd speech-and-audio-processing
   ```

2. Download the VOiCES dataset and extract it into the `auxiliary2024/input/` directory.

3. Install the required libraries:

   ```
   python setup.py
   ```

4. Activate the conda environment:

   ```
   conda activate speech-and-audio-processing
   ```
Once the environment is activated, you can execute the program using either of the following scripts:
- `main.py`: Executes the entire process, including loading the dataset, extracting features, training the models, and making predictions.
- `main_menu.py`: Opens an interactive menu where you can choose specific tasks, such as loading the dataset, extracting features, training models, and transcribing audio files.
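For example, from the repository root:

```
python main.py        # run the full pipeline end to end
python main_menu.py   # choose individual steps interactively
```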
Upon successful execution, the following will be displayed:
- The progression of training and testing, along with the best classifier performances.
- The final solution showing the word boundaries detected by the classifiers.
- A graphical representation of the speech waveform with detected word intervals.
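A waveform plot of that kind can be produced along the following lines (a sketch using matplotlib; the file name and the detected intervals are placeholders, not output of the project's classifiers):

```python
import librosa
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("example.wav", sr=16000)   # placeholder recording
t = np.arange(len(y)) / sr
word_intervals = [(0.4, 0.9), (1.2, 1.8)]       # hypothetical detected words

plt.figure(figsize=(10, 3))
plt.plot(t, y, linewidth=0.5)
for start, end in word_intervals:
    plt.axvspan(start, end, color="orange", alpha=0.3)  # shade each word
plt.xlabel("Time (s)")
plt.title("Speech waveform with detected word intervals")
plt.tight_layout()
plt.show()
```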
For detailed explanations of the code, classifiers, and algorithms used in this project, refer to `Project-documentation.pdf` in the `/docs` directory. It contains thorough explanations of all steps, including dataset preparation, feature extraction, model training, and evaluation.
This project is licensed under the MIT License - see the LICENSE file for details.