A voice recognition system showcasing DSP techniques and machine learning for accurate speaker identification using a 1D-Convolutional Deep Neural Network, with a focus on minimal dataset requirements for edge devices.
RecMe-The Speaker Identifier is a significant component of the final project for EEE332 Digital Signal Processing Lab 1. This voice recognition system showcases the application of digital signal processing techniques and machine learning to voice analysis and recognition using a 1D-convolutional deep neural network, and builds a foundational project on the use of minimal datasets under edge-device constraints.
The project demonstrates the practical use of DSP concepts, and this repository focuses specifically on the voice recognition aspect. The goal is to provide an efficient and accurate system for identifying and distinguishing speakers based on their vocal characteristics using a minimal dataset.
**Grade Received: A+ (4.0/4.0 GPA)**
The project's codebase is organized into different components, each serving a specific purpose. Below is an overview of the key components:
The core training of the model is implemented in the train_model.py script. This script is built upon the Keras Speaker Recognition Example, adapting it to our project's requirements and tuning it with TensorBoard for application in minimal-dataset use cases. It handles the training and evaluation of the speaker recognition model.
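For orientation, here is a minimal sketch of what a 1D-convolutional speaker classifier of this kind can look like in Keras, loosely following the Keras Speaker Recognition Example; the input length, layer sizes, and speaker count are illustrative assumptions, not the project's exact values:

```python
# Illustrative 1D-CNN speaker classifier sketch (not the project's exact model).
from tensorflow import keras

def build_model(input_len=8000, num_speakers=5):  # sizes are assumptions
    inputs = keras.Input(shape=(input_len, 1))
    x = keras.layers.Conv1D(16, 3, activation="relu", padding="same")(inputs)
    x = keras.layers.MaxPooling1D(2)(x)
    x = keras.layers.Conv1D(32, 3, activation="relu", padding="same")(x)
    x = keras.layers.MaxPooling1D(2)(x)
    x = keras.layers.GlobalAveragePooling1D()(x)
    x = keras.layers.Dense(64, activation="relu")(x)
    outputs = keras.layers.Dense(num_speakers, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
# TensorBoard logging, as mentioned above, would be attached at fit time, e.g.:
# model.fit(train_ds, validation_data=val_ds,
#           callbacks=[keras.callbacks.TensorBoard(log_dir="logs")])
```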
The functionality of the application in a console-based environment is housed in the app.py script. This component provides a command-line interface for interacting with the trained model. Users can perform various actions, such as speaker recognition, using this application.
For a more user-friendly interaction, a graphical user interface (GUI) has been developed using the PyQt5 framework. The base code and related `.ui` files are located in the `/pyqt5_ui/` directory. The GUI provides an intuitive way for users to interact with the speaker recognition functionality.
- The initial GUI design is in the `.ui` files within `/pyqt5_ui/`.
- The modified GUI code is implemented in main_ui.py, incorporating additional functionality and features.
- The GUI application launcher is guiapp.py, which initializes and runs the PyQt5-based GUI application (a minimal launcher sketch follows this list).
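For illustration, here is a minimal sketch of how a PyQt5 launcher can load a Designer `.ui` file; the filename `pyqt5_ui/main.ui` is a hypothetical example, not necessarily the project's actual file:

```python
# Minimal PyQt5 launcher sketch (illustrative, not the project's guiapp.py).
import sys
from PyQt5 import QtWidgets, uic

if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    # Load a Qt Designer .ui file directly; the path is a hypothetical example.
    window = uic.loadUi("pyqt5_ui/main.ui")
    window.show()
    sys.exit(app.exec_())
```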
This organized structure ensures a clear separation of concerns and makes it easy to locate and modify specific parts of the codebase according to their respective functionalities.
The preprocessing step of dataset management is implemented in the audio_slicer.py file. It automatically preprocesses any `.wav` files in the `data/custom` directory and organizes them in the `audio` folder within that same directory for training purposes. The `noise` folder contains noise recordings from the collected dataset, as explained in the next section.
The slicer.py file supplies preprocessed noise samples to the training model.
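As a rough illustration of the slicing step, the following sketch splits a recording into fixed-length clips; the use of pydub (which relies on ffmpeg, consistent with the requirement noted below), the 16 kHz rate, and the one-second chunk length are assumptions, not the project's exact implementation:

```python
# Illustrative .wav slicing sketch (not the project's audio_slicer.py).
import os
from pydub import AudioSegment  # requires ffmpeg

def slice_wav(path, out_dir, chunk_ms=1000):
    # Normalize to 16 kHz mono to match the 16000 Hz PCM dataset, then
    # export consecutive fixed-length chunks for training.
    audio = AudioSegment.from_wav(path).set_frame_rate(16000).set_channels(1)
    os.makedirs(out_dir, exist_ok=True)
    for i in range(0, len(audio) - chunk_ms + 1, chunk_ms):
        chunk = audio[i:i + chunk_ms]
        chunk.export(os.path.join(out_dir, f"chunk_{i // chunk_ms}.wav"),
                     format="wav")

# Hypothetical paths following the layout described below:
slice_wav("data/custom/speaker1.wav", "data/custom/audio/speaker1")
```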
The dataset is stored in the data folder. The directory tree is organized as follows:
- `16000_pcm`: This folder contains the dataset downloaded from Kaggle Speaker Recognition, from which only the noise samples are used.
- `custom`: This folder contains the preprocessed data used in our application. It is the main dataset folder for the project.
- `raw_data`: This folder contains the raw recordings of the dataset as they were initially obtained. These raw recordings are also found within the preprocessed data folder.
Please make sure to reference the specific subfolders when working with the dataset in the project.
To access the dataset, you can use the following paths:
- Noise samples: `data/16000_pcm`
- Preprocessed data: `data/custom`
- Raw records: `data/raw_data`
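Putting the above together, the layout looks roughly like this; the exact placement of the `audio` and `noise` subfolders is inferred from the preprocessing section:

```
data/
├── 16000_pcm/   # Kaggle dataset; only its noise samples are used
├── custom/      # main project dataset (preprocessed)
│   ├── audio/   # sliced .wav clips used for training
│   └── noise/   # noise recordings
└── raw_data/    # raw recordings as initially obtained
```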
A working microphone is necessary to capture your voice for the application. Make sure the system has detected your microphone.
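To verify that a microphone is detected and can record, a quick check like the following may help; the use of sounddevice and scipy here is an assumption, as the project may use a different audio backend:

```python
# Quick microphone sanity check (sounddevice/scipy are assumptions,
# not necessarily the project's audio backend).
import sounddevice as sd
from scipy.io.wavfile import write

print(sd.query_devices())  # list detected audio devices, including microphones

fs = 16000                 # sample rate matching the dataset
seconds = 3
recording = sd.rec(int(seconds * fs), samplerate=fs, channels=1, dtype="int16")
sd.wait()                  # block until the recording finishes
write("mic_test.wav", fs, recording)
```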
A requirements.txt file is provided with all the dependencies necessary for running the application.
Simply run the following command in the parent directory:
pip install -r requirements.txt
Please ensure ffmpeg is installed on your system; this installation is system-specific. Follow this guide for Windows installation, or visit the ffmpeg site for other platforms.
For GPU support on Windows, TensorFlow is limited to tensorflow 2.10.0, which additionally requires cuDNN and CUDA Toolkit installations. Visit this site for details on TensorFlow GPU installation on Windows; otherwise, follow the TensorFlow site.
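After installation, you can confirm that TensorFlow sees the GPU with a quick check:

```python
import tensorflow as tf
print(tf.__version__)                          # 2.10.0 for Windows GPU support
print(tf.config.list_physical_devices("GPU"))  # non-empty list means the GPU is visible
```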
To use the application in a console, run the following command in the project directory terminal:
python app.py
To turn on DEBUG mode, pass the -d argument as follows:
python app.py -d
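For reference, a flag like this is typically wired up with argparse; the following is a hypothetical sketch, not the actual contents of app.py:

```python
# Hypothetical sketch of -d flag handling (illustrative only).
import argparse

parser = argparse.ArgumentParser(description="RecMe speaker identifier (console)")
parser.add_argument("-d", "--debug", action="store_true", help="enable DEBUG mode")
args = parser.parse_args()

if args.debug:
    print("DEBUG mode enabled")
```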
To use the GUI application, run the following command in the parent directory:
python guiapp.py
You can also use the builds folder to install the application via setup.exe, which automatically installs the components necessary to run the application; you can then launch it using RecMe.exe in the installation directory.
Anyone is free to use this application under the MIT license. You can credit me if you want. :)
The Keras Speaker Recognition Example was the core inspiration behind this project. Do send it some love.
You can contact me using my email: irfannafizislive@gmail.com