WhisperScribe

Code Repository: https://github.com/gsu-library/whisper-scribe
Author: Matt Brooks mbrooks34@gsu.edu
Date Created: 2024-05-21
License: GPLv3
Version: 1.2.1

Description

WhisperScribe is a Django-powered web application that simplifies audio analysis by using AI for speech recognition (Faster Whisper) and speaker diarization (Pyannote.Audio). Users can upload or link media, generate accurate transcripts with speaker identification, and easily edit the results. This project also leverages CUDA support for quicker processing.

Requirements

Python v3.10.12
FFmpeg
Web Server
NVIDIA drivers (if using CUDA)

Installation

The following installation instructions are based on a Linux server install using Python v3.10.12.

Install Python. We recommend using version 3.10.12, as that is what this repository is built on. If you need to manage multiple Python versions, we suggest using Pyenv.
Install FFmpeg.
Install and configure a web server for static and media file hosting. This can also be used as a reverse proxy server to proxy Gunicorn. Either Apache or Nginx is recommended.
Either clone the WhisperScribe git repository or download the source code from the latest release. Move or extract the files to a location that is not being served by a web server.
Create a Python virtual environment inside the WhisperScribe folder (venv is recommended). Once created, activate and stay in the virtual environment for the remainder of the steps.
Install the required Python packages.
Copy the core/settings.sample.py file to core/settings.py and configure the settings file. If you want to use a database other than SQLite, configure it now (see Django's databases documentation).
Run Django database migrations: python manage.py migrate.
Create the cache table: python manage.py createcachetable.
Move static files: python manage.py collectstatic.
Install NVIDIA drivers if using CUDA (optional).
Create Django admin user (optional): python manage.py createsuperuser.

Installing Python Packages

To install the required Python packages, we recommend using pip with the freeze file used by this project: pip install -r requirements-freeze.txt. In some scenarios (not using Linux, a different Python version, etc.) pip will fail to install the freeze file. If so, installing from the requirements.txt file should work: pip install -r requirements.txt.
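The fallback order described above can be sketched in Python. This is only an illustration of the sequence of attempts, not a script shipped with WhisperScribe; the function name install_requirements and its injectable run parameter are assumptions made for this example.

```python
import subprocess
import sys


def install_requirements(run=subprocess.run):
    """Install the pinned freeze file, falling back to requirements.txt.

    `run` is injectable so the logic can be exercised without calling pip.
    Returns the name of the requirements file that installed successfully.
    """
    for req_file in ("requirements-freeze.txt", "requirements.txt"):
        # sys.executable is the interpreter of the active virtual environment.
        result = run([sys.executable, "-m", "pip", "install", "-r", req_file])
        if result.returncode == 0:
            return req_file
    raise RuntimeError("pip could not install either requirements file")
```

Calling install_requirements() from inside the activated virtual environment, with the WhisperScribe folder as the working directory, tries the freeze file first and only then the looser requirements file.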

Configuring the Web Server

A web server will have to be configured to host static and media files used by WhisperScribe. Django has documentation on how to deploy static files.

Configuring the Settings File

The SECRET_KEY and ALLOWED_HOSTS fields must be configured before running WhisperScribe. It is also worth reviewing the rest of the options in the settings file. See the Django settings reference for additional information. If troubleshooting is needed during setup or configuration, DEBUG can be enabled. DO NOT LEAVE THIS ENABLED IN A PRODUCTION ENVIRONMENT!

SECRET_KEY - REQUIRED
Run the following command while within the WhisperScribe Python virtual environment to generate a secret key: python -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())'

ALLOWED_HOSTS - REQUIRED
A list of strings representing the host/domain names that this Django site can serve.
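As an illustration, the entry for a site served at a single domain might look like the following. The host names are placeholders, not values taken from the WhisperScribe sample settings.

```python
# core/settings.py (fragment). Host names below are hypothetical placeholders.
ALLOWED_HOSTS = [
    "whisperscribe.example.edu",  # public domain name of the site
    "localhost",                  # local access on the server itself
    "127.0.0.1",
]
```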

CSRF_TRUSTED_ORIGINS
If you are using a reverse proxy in front of Gunicorn, this must be set to Gunicorn's bind address. See CSRF trusted origins for more information.
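A minimal sketch, assuming Django 4.0 or later (where each trusted origin must include its scheme) and assuming Gunicorn is bound to its default address of 127.0.0.1:8000. Adjust the value to match your own bind address.

```python
# core/settings.py (fragment). The address is an assumption: Gunicorn's default bind.
CSRF_TRUSTED_ORIGINS = [
    "http://127.0.0.1:8000",  # scheme + host + port, no trailing slash
]
```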

HUGGING_FACE_TOKEN
This is required to use diarization. In order to create a token you must:

Accept pyannote/segmentation-3.0 user conditions,
accept pyannote/speaker-diarization-3.1 user conditions,
and create an access token at hf.co/settings/tokens.

UPPERCASE_SPEAKER_NAMES
Whether speaker names should be written in uppercase in file downloads.

MAX_SEGMENT_LENGTH
The default max number of characters per segment.

MAX_SEGMENT_TIME
The default max length of segments in seconds.

WHISPER_LANGUAGE
The default language spoken in the audio. Set to None or '' to use auto detection as the default.

WHISPER_MODELS
The list of models available to Whisper (tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, distil-large-v3). See https://huggingface.co/Systran.

WHISPER_MODEL_DEFAULT
The default Whisper model to show (from the list of WHISPER_MODELS).
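The three Whisper settings above might be combined as in the following sketch. The values are illustrative rather than the shipped defaults, and representing WHISPER_MODELS as a Python list of strings is an assumption based on the description above.

```python
# core/settings.py (fragment). Values are illustrative, not the shipped defaults.

# None (or '') leaves the language on auto detection by default.
WHISPER_LANGUAGE = None

# Offer only a subset of the supported models in the interface.
WHISPER_MODELS = ["base", "small", "medium", "large-v3"]

# Should be one of the entries in WHISPER_MODELS.
WHISPER_MODEL_DEFAULT = "small"
```

Keeping WHISPER_MODEL_DEFAULT inside WHISPER_MODELS avoids presenting a default that users cannot actually select.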

USE_DJANGO_Q
Whether to use Django Q or not. This may cause issues in a Windows environment. If disabled, the WhisperScribe interface will hang while processing audio.

DATABASES
Configure what kind of database you want to use. The default is SQLite. See https://docs.djangoproject.com/en/5.1/ref/settings/#databases and https://docs.djangoproject.com/en/5.1/ref/databases/.
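For example, a MySQL configuration could look like the sketch below. It uses Django's built-in MySQL backend; the database name, user, password, host, and port are placeholders to replace with your own values, and the mysqlclient package described under MySQL Drivers must be installed for it to work.

```python
# core/settings.py (fragment). Name, user, password, host, and port are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "whisperscribe",
        "USER": "whisperscribe_user",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}
```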

TIME_ZONE
Set to your local time zone.

MEDIA_URL
The URL that handles the media served from MEDIA_ROOT. This must end in a slash.

MEDIA_ROOT
The absolute filesystem path to the directory that will store the media files.

STATIC_URL
The URL to use when referring to the static files located in STATIC_ROOT. This must end in a slash.

STATIC_ROOT
The absolute filesystem path to the directory where the collectstatic command will move static files for deployment.
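The four media and static settings might be filled in as follows. The directory paths are hypothetical; use locations that your web server is configured to serve.

```python
# core/settings.py (fragment). Directory paths are hypothetical.
MEDIA_URL = "/media/"    # must end in a slash
MEDIA_ROOT = "/var/www/whisperscribe/media"

STATIC_URL = "/static/"  # must end in a slash
STATIC_ROOT = "/var/www/whisperscribe/static"  # target of collectstatic
```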

NVIDIA Drivers

The NVIDIA drivers available will depend on the OS and the video card installed. Ubuntu provides a helpful article that goes over searching for and installing NVIDIA drivers. We have had success on our setup using the nvidia-driver-535-server package.

MySQL Drivers

To connect WhisperScribe to a MySQL database a MySQL pip package, headers, and libraries will have to be installed. The mysqlclient pip package is recommended. The installation instructions can be found on the mysqlclient pypi.org page.

Usage

Manual Startup

Use the following commands to start the Django application and to run Django Q (within the Python virtual environment). If Django Q is disabled in the settings file the qcluster command does not need to be included.

gunicorn core.wsgi
python manage.py qcluster

To run Gunicorn on a port other than 8000, pass the -b flag to set the bind address and port.
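As an alternative to passing -b on every start, Gunicorn can read the same setting from a Python configuration file. The sketch below assumes a file named gunicorn.conf.py in the directory Gunicorn is started from, which Gunicorn 20.0 and later load by default. The file is not part of the WhisperScribe repository, and the address and port are placeholders.

```python
# gunicorn.conf.py (hypothetical file, not shipped with WhisperScribe).
# With this file in place, `gunicorn core.wsgi` listens on the address below,
# the same as running `gunicorn -b 127.0.0.1:8080 core.wsgi`.
bind = "127.0.0.1:8080"
```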

Using Systemd Service

The systemd service can be used to run WhisperScribe on Linux operating systems. To set this up, first copy the whisperscribe.sample.service and whisperscribe-q.sample.service files to whisperscribe.service and whisperscribe-q.service respectively. Then edit both copied files to update the paths for WorkingDirectory, Environment, and ExecStart. For all three, make sure the absolute path to WhisperScribe is used. For the Environment and ExecStart directives, make sure the name of the virtual environment folder is correct. Also make sure the path for Environment includes the correct version of Python. Once configured, the files can be added to systemd with the following commands. You will need to edit the commands to use the path to your instance of WhisperScribe.

sudo systemctl enable /path/to/whisperscribe/whisperscribe.service
sudo systemctl enable /path/to/whisperscribe/whisperscribe-q.service

Once both services are enabled WhisperScribe will start automatically during normal boot. WhisperScribe can also be started, stopped, and restarted with the following commands.

sudo systemctl start whisperscribe
sudo systemctl stop whisperscribe
sudo systemctl restart whisperscribe

Updates

Check the CHANGELOG and release notes to see if there are any major changes with the core/settings.sample.py file, if a migration is required, if the requirements-freeze.txt pip packages file has been updated, or if static files need to be migrated.

It never hurts to run the commands below after an update (while in the Python virtual environment).

pip install -r requirements-freeze.txt
python manage.py migrate
python manage.py collectstatic

Additional Information

Reverse Proxy Server

At some point you will want to reverse proxy a web server to WhisperScribe in order to use SSL certificates. Apache and Nginx provide well-documented guides on setting up reverse proxies. Gunicorn also provides a guide on setting up a reverse proxy using Nginx. Note that when using a reverse proxy server, some additional settings, such as the maximum POST size, will need to be adjusted.

Developer Notes

The Django project folder is 'core' and the application folder is 'webui'.

