sound_phrase_classifier

Classifies sound phrases from large-scale corpora using NLP, POS tagging, word embeddings, and SVMs.

Description

This project replicates the experiments conducted in Section 2 of the paper "Discovering sound concepts and acoustic relations in text", available on IEEE Xplore.

The project processes large-scale text corpora and uses regular expressions and POS tagging to identify candidate sound phrases. I then manually labeled around 3000 of the extracted phrases as sound or non-sound. The resulting labeled set was used to train a linear SVM that separates sound phrases from non-sound phrases.
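The regular-expression step can be illustrated with a minimal sketch. The pattern below ("sound of" followed by up to three words) is a hypothetical example, not the pattern the project uses, and the POS-tagging filter is left out.

```python
import re

# Hypothetical pattern (an assumption, not taken from the project):
# "sound of" or "sounds of" followed by one to three alphabetic words.
PATTERN = re.compile(r"\bsounds? of ((?:[a-z]+ ){0,2}[a-z]+)", re.IGNORECASE)


def extract_candidates(text):
    """Return lowercase candidate phrases that follow 'sound(s) of'."""
    return [match.group(1).lower() for match in PATTERN.finditer(text)]


candidates = extract_candidates(
    "We heard the sound of breaking glass. Later came the sounds of rain."
)
```

A POS tagger (for example from nltk) could then discard candidates that are not noun phrases before they are labeled.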

The project runs on Python 3.

Files included

train_sound_clf.py
test_sound_clf.py
run_sound_clf.py
training_data

Additional Files:
training_data (training data for train_sound_clf.py)
clf1.model (classifier model trained on word2vec 300d vectors)
sample_document (input for run_sound_clf.py when the last argument is "true")
sample_list (input for run_sound_clf.py when the last argument is "false")
results.txt (output from run_sound_clf.py when the input is sample_list)

Files not included (must download)

Google's pretrained word2vec representations model (found here: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit )
Stanford's GloVe pretrained vectors (found here: https://nlp.stanford.edu/projects/glove/)
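
As a rough illustration of what these downloads contain: GloVe vectors ship as plain text, one token per line followed by its space-separated float components. The loader below is a minimal sketch of reading that format; the function name `load_glove` and the in-memory sample are made up for the example, and the project's own loading code may differ. The word2vec binary file is usually read with gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` instead.

```python
import io


def load_glove(handle):
    """Parse GloVe text format into a dict mapping token -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a real GloVe file (values are invented).
sample = io.StringIO("dog 0.1 0.2 0.3\nbark 0.4 0.5 0.6\n")
glove = load_glove(sample)
```

With a real download, the same function can be called on `open(path, encoding="utf-8")`.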

Dependencies and Libraries

Third-party: numpy, optunity, gensim, sklearn (scikit-learn), nltk
Standard library: pickle, sys, os, re

How to Use

To train the sound classifier on new data and save a copy of the linear SVM model, run:
python3 train_sound_clf.py (word2vec/glove) <embeddings_filename> <training_data_filename>
This saves the classifier model to the file 'clf1.model'.
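
The training step can be pictured with the sketch below. It is not the code in train_sound_clf.py: the 2-d toy vectors, the averaging of word vectors into a phrase vector, and the 1/0 label encoding are all assumptions made for illustration. Only the general recipe (embeddings in, linear SVM out, model pickled) comes from this README.

```python
import pickle

import numpy as np
from sklearn.svm import LinearSVC

# Toy 2-d embeddings standing in for the 300-d pretrained vectors (invented).
toy_vectors = {
    "dog": np.array([1.0, 0.0]),
    "bark": np.array([0.9, 0.1]),
    "tax": np.array([0.0, 1.0]),
    "form": np.array([0.1, 0.9]),
}


def phrase_vector(phrase, vectors, dim=2):
    """Average the vectors of known words; return zeros if none are known."""
    known = [vectors[w] for w in phrase.lower().split() if w in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)


phrases = ["dog bark", "bark", "tax form", "form"]
labels = [1, 1, 0, 0]  # assumed encoding: 1 = sound, 0 = non-sound

X = np.vstack([phrase_vector(p, toy_vectors) for p in phrases])
clf = LinearSVC().fit(X, labels)

# Round-trip through pickle, as one would when writing a model file.
restored = pickle.loads(pickle.dumps(clf))
scores = restored.decision_function(X)  # signed distance to the hyperplane
```

The signed distance from `decision_function` is one common way to attach a confidence score to a linear SVM prediction.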

To test the accuracy of the sound classifier on a list of labeled data, run:
python3 test_sound_clf.py (word2vec/glove) <embeddings_filename> <model_filename> <test_data_filename>
This prints the accuracy of the classifier on the test data.
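
Accuracy here presumably means the fraction of test phrases whose predicted label matches the manual label. A minimal sketch of that computation, with a made-up helper name and made-up labels:

```python
def accuracy(predicted, expected):
    """Return the fraction of predictions that equal the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Three of the four invented predictions match the invented labels.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```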

To run the classifier on a large text document or a list of unlabeled sounds, run:
python3 run_sound_clf.py (word2vec/glove) <embeddings_filename> <model_filename> <data_filename> (true/false)
Pass "true" for a large document and "false" for a list of sounds. The script processes the document (or list) and writes the filtered sound phrases, together with their confidence scores, to results.txt.
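
The five positional arguments of run_sound_clf.py could be validated along the lines of the sketch below. The function name `parse_run_args` and the returned dictionary keys are hypothetical; the actual script may read `sys.argv` differently.

```python
def parse_run_args(argv):
    """Validate the five positional arguments described above.

    argv is expected to exclude the script name (i.e. sys.argv[1:]).
    """
    if len(argv) != 5:
        raise ValueError(
            "usage: run_sound_clf.py (word2vec/glove) <embeddings_filename> "
            "<model_filename> <data_filename> (true/false)"
        )
    embedding_type, embeddings_file, model_file, data_file, flag = argv
    if embedding_type.lower() not in ("word2vec", "glove"):
        raise ValueError("embedding type must be 'word2vec' or 'glove'")
    if flag.lower() not in ("true", "false"):
        raise ValueError("last argument must be 'true' or 'false'")
    return {
        "embedding_type": embedding_type.lower(),
        "embeddings_file": embeddings_file,
        "model_file": model_file,
        "data_file": data_file,
        "is_document": flag.lower() == "true",  # False means a list of sounds
    }


# The embeddings filename here is a placeholder, not a file in the repository.
args = parse_run_args(["glove", "vectors.txt", "clf1.model", "sample_list", "false"])
```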

IMPORTANT: A classifier can be trained only on GloVe or word2vec embeddings. Additionally, the input files for training_data and sample_list (when run_sound_clf.py is run with 'false') must match the format of the included example files.
