Sentence Classification

Implementation of sentence classification using a CNN and RNNs with LSTM/GRU units.

Dependencies

This code is written in Python. To use it you will need:

Python 2.7
Theano 0.7
A recent version of NumPy
NLTK 3 (for dataset preprocessing)

Getting started

We use TREC Question-Type Classification as a demo to illustrate the usage of the CNN/LSTM/GRU classifiers.
Download the dataset from http://cogcomp.cs.illinois.edu/Data/QA/QC/ (train_5500.label and TREC_10.label) and put these into the data directory.
Since the dataset is relatively small, we initialize the word embedding parameters with pretrained word embeddings, which are then refined during training.
Using the word2vec vectors requires downloading the binary file (GoogleNews-vectors-negative300.bin) from https://code.google.com/p/word2vec/. Put this binary file into the data directory.
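
For readers curious about what the binary file contains, here is a minimal sketch of a reader for the word2vec binary layout. It is an illustration only and not necessarily how process_trec.py loads the vectors; the function name read_word2vec_binary and the tiny in-memory demo file are assumptions made for this example.

```python
import io
import struct


def read_word2vec_binary(stream):
    """Read vectors from a byte stream in the word2vec binary layout.

    Assumed layout: an ASCII header "<vocab_size> <dim>\n", then for each
    word its bytes terminated by a space, followed by <dim> little-endian
    float32 values and an optional trailing newline.
    """
    header = stream.readline().decode("utf-8").split()
    vocab_size, dim = int(header[0]), int(header[1])
    fmt = "<%df" % dim
    nbytes = 4 * dim
    vectors = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b" " or ch == b"":
                break
            if ch != b"\n":  # skip the newline left after the previous vector
                chars.append(ch)
        raw = stream.read(nbytes)
        if len(raw) < nbytes:
            raise ValueError("file ended before all vectors were read")
        word = b"".join(chars).decode("utf-8", "ignore")
        vectors[word] = struct.unpack(fmt, raw)
    return vectors


# Hypothetical two-word, three-dimensional file built in memory for the demo.
demo = io.BytesIO(
    b"2 3\n"
    + b"what " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n"
    + b"who " + struct.pack("<3f", 1.5, 0.0, -0.25) + b"\n"
)
vectors = read_word2vec_binary(demo)
```

With the real download, the stream would come from opening the .bin file in binary mode ("rb"); keeping only the words that occur in the dataset avoids holding the full vocabulary in memory.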
To process the raw data, run:

cd ./data
python process_trec.py
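
To give a rough idea of what this preprocessing involves, the sketch below parses one line in the TREC label-file format, where each line starts with a COARSE:fine label followed by the question. It is an assumption-laden illustration, not the contents of process_trec.py: the name parse_trec_line, the lowercasing, and the whitespace split (standing in for NLTK tokenization) are all choices made for this example.

```python
# The six coarse question types used by the TREC dataset.
COARSE_LABELS = ["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"]


def parse_trec_line(line):
    """Split one line into (coarse label index, fine label, tokens)."""
    label, _, question = line.strip().partition(" ")
    coarse, _, fine = label.partition(":")
    # Assumption: lowercase and split on whitespace instead of using NLTK.
    tokens = question.lower().split()
    return COARSE_LABELS.index(coarse), fine, tokens


# An illustrative line in the dataset's format.
example = "DESC:manner How did serfdom develop in and then leave Russia ?"
label_id, fine_label, tokens = parse_trec_line(example)
```

Here only the coarse label is turned into a class index, which matches a six-way question-type classifier; whether the repository uses the coarse or the fine labels is not stated above.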

Running the models

This code can be run on CPU or GPU devices directly.

Example commands:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python eval_trec_cnn.py 
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python eval_trec_lstm.py
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python eval_trec_gru.py

For each classifier, we run the algorithm 10 times and report the average over the runs as the final test result. The classification accuracies (in %) after running the code should be close to the following numbers.

CNN: 93.30 ± 0.59    LSTM: 93.14 ± 0.69    GRU: 92.66 ± 0.77
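
The "mean ± spread" summary over repeated runs can be sketched as follows. This is a hedged illustration: it assumes the spread is the sample standard deviation, which the text above does not state, and the helper name mean_and_std and the input values are placeholders, not results produced by this code.

```python
import math


def mean_and_std(values):
    """Return the mean and the sample standard deviation of a list of numbers."""
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


# Placeholder accuracies (in %), NOT results produced by this repository.
placeholder_runs = [92.0, 94.0, 93.0, 93.0]
mean, std = mean_and_std(placeholder_runs)
print("%.2f +/- %.2f" % (mean, std))
```

In the setup described above, the list would hold the ten test accuracies collected from the ten runs of one classifier.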

All printed output is stored in a log file.

Acknowledgments

Our implementation utilizes code from the following:

Licence

MIT

zhegan27/sentence_classification
