SpeechFlow is an advanced speech-to-text API that offers exceptional accuracy for businesses of all sizes and industries. It transcribes audio and video content into text with high precision, making it an ideal solution for companies that need to quickly and accurately convert speech into text for purposes such as captioning, transcription, and analysis. With support for multiple languages and dialects, SpeechFlow is a versatile tool that caters to a wide range of businesses and industries.
Spoken Language Identification (LID) is the task of detecting the language spoken in an audio clip by an unknown speaker, regardless of the speaker's gender, age, or manner of speaking. It has numerous applications in speech recognition, multilingual machine translation, and speech-to-speech translation.
Our model currently supports 13 languages: English, Spanish, Italian, French, German, Portuguese, Russian, Turkish, Vietnamese, Indonesian, Chinese, Japanese, and Korean.
The model combines convolutional and recurrent neural networks and was trained on roughly two thousand hours of private speech data, approximately 150 hours of supervision per language.
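For illustration, the sketch below shows what a convolutional-plus-recurrent LID classifier of this general shape can look like in TensorFlow 2.4-style Keras. The layer sizes, input length, and function name here are hypothetical; the actual architecture used by this repository may differ.

import tensorflow as tf

NUM_LANGUAGES = 13  # the 13 supported languages listed above

def build_lid_model(num_frames=300, num_mels=80):
    """Hypothetical CNN+RNN language-identification classifier sketch."""
    inputs = tf.keras.Input(shape=(num_frames, num_mels, 1))
    # Convolutional front end: extract local time-frequency patterns.
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D((2, 2))(x)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D((2, 2))(x)
    # Fold the frequency axis into features so the RNN sees a time sequence.
    x = tf.keras.layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    # Recurrent back end: summarize the whole utterance into one vector.
    x = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128))(x)
    outputs = tf.keras.layers.Dense(NUM_LANGUAGES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)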
The figure below shows an accuracy (ACC) breakdown by language on the FLEURS test set using the pretrained model.
FLEURS dataset downloads can be found here: Downloads
The models are implemented in TensorFlow.
To use all of the functionality of the library, you should have:
tensorflow==2.4.1
tensorflow-gpu==2.4.1
tensorflow-addons==0.15.0
matplotlib==3.5.0
numpy==1.19.5
scikit-learn==1.0.1
librosa==0.8.1
SoundFile==0.10.3.post1
PyYAML==6.0
Download the codebase and open a terminal in the root directory. Make sure Python 3.7 is installed in the current environment, then execute:
pip install -r requirements.txt
The wav files have a 16 kHz sampling rate, a single channel, and 16-bit signed integer PCM encoding.
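You can quickly verify that a file meets this format with the SoundFile package from the requirements above; the helper below is just a convenience sketch.

import soundfile as sf

def check_audio_format(path):
    # Expect 16 kHz, mono, 16-bit signed integer PCM, as described above.
    info = sf.info(path)
    ok = (info.samplerate == 16000 and info.channels == 1
          and info.subtype == "PCM_16")
    print(path, info.samplerate, info.channels, info.subtype,
          "OK" if ok else "needs conversion")
    return ok

check_audio_format("test_audios/chinese.wav")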
As speech features, 80-dimensional log mel-filterbank outputs were computed from a 25 ms window every 10 ms. These features were then normalized to zero mean and unit variance over the training partition of the dataset.
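The snippet below sketches how such features can be computed with the pinned librosa version: at 16 kHz, a 25 ms window is 400 samples and a 10 ms hop is 160 samples. Note that it normalizes per utterance only to keep the example self-contained, whereas the description above normalizes over the training partition.

import librosa
import numpy as np

def log_mel_features(path, sr=16000, n_mels=80):
    # 80-dim log mel-filterbank features: 25 ms window, 10 ms hop.
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=400, win_length=400, hop_length=160, n_mels=n_mels)
    log_mel = np.log(mel + 1e-6)
    # Per-utterance normalization for brevity; the model described above
    # uses mean/variance statistics computed over the training set instead.
    log_mel = (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)
    return log_mel.T  # shape: (num_frames, 80)

features = log_mel_features("test_audios/chinese.wav")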
You must prepare your own data before training the model; refer to the 'data/demo_txt/demo_train.txt' file for the expected format.
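As a rough illustration, manifests of this kind are often lists of audio-path/label pairs. The reader below assumes one tab-separated pair per line; that layout is an assumption, so check it against the demo file before relying on it.

def read_manifest(path):
    # Hypothetical parser: assumes "<wav_path>\t<language_label>" per line;
    # verify against data/demo_txt/demo_train.txt before using.
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                wav_path, label = line.split("\t")
                pairs.append((wav_path, label))
    return pairs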
To get started, configure the 'configs/config.yml' file, then simply run this command in the console:
python train.py
This will train the Spoken_language_identification model on the data in 'data/demo_txt/demo_train.txt', store the model in the saved_weights folder, perform inference on 'demo_txt/demo_test.txt', print the inference results, and save the averaged accuracy to a text file.
A pretrained model is provided with this project; simply run this command:
python predict_by_pb.py test_audios/chinese.wav
or
python predict_by_weights.py test_audios/chinese.wav
Your audio must meet the audio format requirements above, as the provided chinese.wav does. If your audio file is not in wav format (e.g., mp3), you can convert it to wav with ffmpeg. Run the following command in your audio directory to convert it:
ffmpeg -i audio.mp3 -ab 256k -ar 16000 -ac 1 -f wav audio.wav
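To convert many files at once, you can wrap the same ffmpeg invocation in a short script (a sketch; adjust the input extension to match your files):

import pathlib
import subprocess

# Convert every .mp3 in the current directory to 16 kHz mono 16-bit wav,
# reusing the ffmpeg flags from the command above.
for mp3 in pathlib.Path(".").glob("*.mp3"):
    subprocess.run(
        ["ffmpeg", "-i", str(mp3), "-ab", "256k", "-ar", "16000",
         "-ac", "1", "-f", "wav", str(mp3.with_suffix(".wav"))],
        check=True)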
If you don't have ffmpeg installed, install it first:
sudo apt-get update
sudo apt-get install ffmpeg
Spoken_language_identification is released under the Apache License, version 2.0. The Apache license is a popular BSD-like license. Spoken_language_identification can be redistributed for free, even for commercial purposes, although you cannot remove the license headers (and under some circumstances, you may have to distribute a license document).