Download the fastText model:

```
wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip
unzip wiki.en.zip
```
Download training data, e.g. from:
https://drive.google.com/drive/folders/1vgJf0hev60GeZbfyOiqU1YtX_h1FaGl8
Train the text classifier:
 - Edit `config.yaml`: choose a classifier model (svm, xgboost, fasttext) and set it to True (see the example `config.yaml` sketch after this list).
 - In case of `fasttext`, place the path to the pretrained embedding vectors at `pretrained_embedding_vectors`.
 - In case of SVM, set its parameters.
 - If you wish to reduce your dataset for debugging purposes, use the `hop_samples` variable.
 - Run the training script:
   - case 1: Using the whole fasttext embedding model:
     `python3 train_text.py --annotation valence_transcriptions_labels.csv --pretrained wiki.en.vec -o valence`
   - case 2: Using an embeddings limit in order to avoid memory errors:
     `python3 train_text.py --annotation valence_transcriptions_labels.csv --pretrained wiki.en.vec -o valence_500K -l 500000`
   - case 3: Using BERT embeddings:
     `python3 train_text.py --annotation valence_transcriptions_labels.csv --pretrained bert -o valence`
 - Input is training data in csv format with two columns: "transcriptions" (the text data examples) and "labels" (the ground truth target value of the classification task).
 - The output classifier dictionary will be saved at the defined `output_folder`.
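For reference, here is a minimal sketch of what the text-classifier part of `config.yaml` could look like. The field names (`pretrained_embedding_vectors`, `hop_samples`, `output_folder` and the classifier flags) come from the steps above; the exact nesting, the top-level `text_classifier` key and all example values are assumptions, so adapt them to the `config.yaml` shipped with the repo.

```yaml
# Hypothetical sketch of the text-classifier settings in config.yaml.
# Field names are taken from the steps above; the nesting and the values
# are assumptions, not the repo's actual defaults.
text_classifier:
  svm: False
  xgboost: False
  fasttext: True                                # exactly one model flag set to True
  pretrained_embedding_vectors: ./wiki.en.vec   # used only when fasttext is True
  svm_parameters:                               # assumed name; used only when svm is True
    C: 1.0
    kernel: linear
  hop_samples: 10                               # e.g. keep every 10th sample for quick debugging runs
  output_folder: ./output_models/text           # where the classifier dictionary is saved
```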
To test the trained text model:

`python3 test_text.py -i "i am so sad please help me I need help this is so fucking awful. this is bad i am so angry you fucking idiot. Yes i am so happy this is really great" -c output_models/text/valence.pt`

returns:

```
--> Loading the text embeddings model
--> Extracting text features
{'__label__val_high': 33.333333333333336, '__label__val_low': 66.66666666666667, '__label__val_neutral': 0.0}
```
Download training data, e.g. from:
https://drive.google.com/drive/folders/12Ai25ZUCysAUUWRSiSd5WrsWRbB1gRuF
Train the audio classifiers:
 - Edit `config.yaml`, under the `audio_classifier` field, and choose the classifier type (i.e. svm, xgboost); see the example sketch after this list.
 - In case of SVM, set its parameters.
 - Set the parameters for the segment-based feature extraction, using the field `basic_features_params`.
 - Run the script:
   `python3 train_audio.py -i valence_train -o audio_valence`
   (The input folder is supposed to contain audio files as samples, organized in folders-classes.) The model is saved under the name provided by the `-o` argument, in the folder specified in the `audio_classifier.out_folder` field.
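A corresponding sketch of the audio-classifier settings, again with the field names taken from the steps above (`audio_classifier`, the `classifier_type` options, `out_folder`, `basic_features_params`) and with nesting and example values assumed rather than copied from the repo:

```yaml
# Hypothetical sketch of the audio-classifier settings in config.yaml.
# Only the field names are grounded in the instructions above; the nesting
# and values (including the feature-extraction windows) are assumptions.
audio_classifier:
  classifier_type: svm                 # or xgboost
  svm_parameters:                      # assumed name; set only when using svm
    C: 1.0
    kernel: rbf
  out_folder: ./output_models/audio    # where the trained model (named by -o) is saved
basic_features_params:                 # segment-based feature extraction
  mid_window: 3.0                      # assumed pyAudioAnalysis-style windows/steps, in seconds
  mid_step: 1.0
  short_window: 0.05
  short_step: 0.05
```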
To test the trained audio model:

`python3 test_audio.py -i test.wav -c output_models/audio/audio_arousal.pt`

returns:

```
--> Extracting audio features
Predicted probability percentage for each class:
({'neutral': 0.0, 'strong': 80.0, 'weak': 20.0}, ['strong', 'strong', 'strong', 'strong', 'weak'])
```
Download training data, e.g. from:
https://drive.google.com/drive/folders/1kIOdlztkGKfYZZONXYqT1h9OtJS9hFBP
To train:
 - Edit `config.yaml` and go under the `recording_level_classifier` field (an example sketch of this section is given below, after the training command) to set:
   - `classifier_type`: svm, svm_rbf, knn, gradientboosting, extratrees or randomforest.
   - `raudio_num_features_discard`: the number of readys audio features to discard.
   - `pyaudio_num_features`: the number of pyaudio features to keep (use 'all' for all features, otherwise use a number).
   - `gender`: 'male' or 'female' (not needed if the samples are already separated).
 - In `features_type` choose the type of the segment-level features you wish to use (fused, text or audio).
 - In `audio_features` choose 'late_fused', 'fused' or 'audio' for late fusion of readys+pyaudio, early fusion of readys+pyaudio, or readys_audio only, respectively. If `fused`, the models' features are concatenated with hand-crafted audio features from pyAudioAnalysis; in that case choose the hand-crafted feature parameters in `pyaudio_params`.
 - In `late_fusion`, use the field `classifier_pyaudio` to define the classifier type for the pyaudio features (lr, NaiveBayes or svm_rbf), and the field `classifier_raudio` to define the classifier type for the readys_audio features (lr, NaiveBayes or svm_rbf).
 - In `metric` choose the validation metric (i.e. f1_macro).
 - In `google_credentials` write the path of the google credentials file.
 - In `reference_text` choose False if no reference text is used or True if reference text is used. (Attention! If `reference_text` is True, the .txt reference files should be located in the same directory as their respective wav files and have the same name.)
 - In `text_segmentation_params`:
   - `segmentation_threshold`: choose None if no threshold is used, otherwise choose an integer number which defines the number of words per segment or the segment's duration (sec).
   - `method_of_segmentation`: choose None for text segmentation in sentences, "fixed_size_text" for segmentation in a fixed number of words, or "fixed_window" for segmentation in fixed seconds.
 - In `audio_models_folder`, `text_models_folder` and `out_folder` write the paths of the audio models, text models and recording-level models respectively.
 - Run the training script, e.g.:
   `python3 train_recording_level_classifier.py -i speeches -mn speeches -f`
   where the `-i` argument is the path to the folder that contains the class folders (one folder for each class, each one containing the respective recordings - the examples of that class). The model in the above example will be saved in `output_models/recording_level/speeches.pt`; in case of late fusion two models will be saved: `output_models/recording_level/speeches_pyaudio.pt` and `output_models/recording_level/speeches_readys.pt`. The `-f` argument is optional and should be used only if the input files are extracted features and not audio.
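To make the long list of options above easier to follow, here is a sketch of how the `recording_level_classifier` section of `config.yaml` might be laid out. All field names and allowed values are taken from the list above; the nesting and the concrete example values are assumptions and should be checked against the actual `config.yaml`:

```yaml
# Hypothetical sketch of the recording_level_classifier settings in config.yaml.
# Field names and allowed values come from the list above; the nesting and the
# concrete example values are assumptions.
recording_level_classifier:
  classifier_type: svm_rbf                 # svm, svm_rbf, knn, gradientboosting, extratrees, randomforest
  raudio_num_features_discard: 0           # number of readys audio features to discard
  pyaudio_num_features: all                # 'all' or an integer
  gender: male                             # 'male' or 'female'; omit if samples are already separated
  features_type: fused                     # fused, text or audio
  audio_features: late_fused               # late_fused, fused or audio
  late_fusion:
    classifier_pyaudio: svm_rbf            # lr, NaiveBayes or svm_rbf
    classifier_raudio: lr                  # lr, NaiveBayes or svm_rbf
  metric: f1_macro
  google_credentials: /path/to/credentials.json
  reference_text: False                    # True only if .txt files sit next to the wav files
  text_segmentation_params:
    segmentation_threshold: None           # None, or words per segment / segment duration (sec)
    method_of_segmentation: None           # None, "fixed_size_text" or "fixed_window"
  audio_models_folder: ./output_models/audio
  text_models_folder: ./output_models/text
  out_folder: ./output_models/recording_level
  pyaudio_params:                          # used only when audio_features is fused
    mid_window: 3.0                        # assumed pyAudioAnalysis-style parameters
    mid_step: 1.0
```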
Download trained models
https://drive.google.com/drive/folders/1vuW93WZgb84Nt4iSjnuRTrEUfOUns4iV
(or use your own trained models)
Test recording-level:
 - Open the config file and define the `google_credentials` path and the audio and text model paths as before.
 - Run `test_recording_level.py`, e.g.:
   `python3 test_recording_level.py -i test.wav -m output_models/recording_level/speeches.pt`
   or, in case of late fusion, run:
   `python3 test_recording_level.py -i test.wav -m output_models/recording_level/speeches_MA.pt -m2 output_models/recording_level/speeches_LLA.pt`