lstm_language_model

A PyTorch LSTM language model, supporting class-based softmax (following the paper) and NCE (noise contrastive estimation, following the paper; thanks to Stonesjtu's amazing project) for speeding up training.

Theoretical Analysis

Class-based Softmax

In class-based softmax, each word is assigned to one class, so the probability of a word factorizes into a class term and a within-class term: P(w | h) = P(c(w) | h) · P(w | c(w), h), where c(w) is the class of word w and h is the hidden state.

Theoretically, the computational cost can be reduced from O(dk) to O(d\sqrt{k}), where d is the size of the last hidden layer and k is the vocabulary size (the optimum is reached with roughly \sqrt{k} classes of about \sqrt{k} words each).

But in practice, there is a lot of overhead (especially on GPU).
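
As a rough illustration of the two-step factorization above, here is a minimal PyTorch sketch of a class-based decoder. It is not the repository's decoder.py; the mappings class_sizes, word2class, and word2idx are hypothetical helpers, and a real implementation would batch the per-class projections rather than score one word at a time.

```python
import torch.nn as nn
import torch.nn.functional as F

class ClassBasedSoftmax(nn.Module):
    """Sketch of a class-based decoder: P(w|h) = P(c(w)|h) * P(w|c(w),h)."""

    def __init__(self, hidden_size, class_sizes, word2class, word2idx):
        super().__init__()
        # class_sizes[c] = number of words assigned to class c
        # word2class[w]  = class id of word w
        # word2idx[w]    = position of word w inside its class
        self.class_layer = nn.Linear(hidden_size, len(class_sizes))
        # One small output layer per class, covering only that class's words.
        # With ~sqrt(k) classes of ~sqrt(k) words each, scoring one word costs
        # O(d * sqrt(k)) instead of O(d * k).
        self.word_layers = nn.ModuleList(
            [nn.Linear(hidden_size, size) for size in class_sizes]
        )
        self.word2class = word2class
        self.word2idx = word2idx

    def log_prob(self, hidden, word):
        """log P(word | hidden) for a single hidden vector and word id."""
        c = self.word2class[word]
        i = self.word2idx[word]
        class_logp = F.log_softmax(self.class_layer(hidden), dim=-1)[c]
        word_logp = F.log_softmax(self.word_layers[c](hidden), dim=-1)[i]
        return class_logp + word_logp
```

The many small per-class projections are also where the GPU overhead mentioned above tends to come from: each class adds its own, poorly utilized matrix multiplication.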

NCE

NCE transforms the probability estimation problem into a binary classification problem. In NCE, we have a noise distribution, and the goal is to train the model to distinguish the target word from noise samples. The biggest trick in NCE is that the probability normalization term is treated as a constant, which saves a lot of time during both training and testing.
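
For intuition, the following is a minimal sketch of the NCE objective under that constant-normalization assumption. It is illustrative only (not the repository's NCE decoder, which follows Stonesjtu's implementation), and all tensor names are assumptions.

```python
import math
import torch.nn.functional as F

def nce_loss(target_scores, noise_scores, target_noise_logq, sample_noise_logq, k):
    """Sketch of the NCE binary-classification loss.

    target_scores:     unnormalized model log-scores s(w, h) of the true next
                       words, shape (batch,)
    noise_scores:      unnormalized model log-scores of the k sampled noise
                       words per position, shape (batch, k)
    target_noise_logq: log q(w) of the true words under the noise
                       distribution, shape (batch,)
    sample_noise_logq: log q(w) of the sampled noise words, shape (batch, k)

    The model's normalization term is treated as a constant (1), so no softmax
    over the full vocabulary is ever computed.
    """
    log_k = math.log(k)
    # logit of P(data | w, h) = s(w, h) - log(k * q(w))
    target_logits = target_scores - (target_noise_logq + log_k)
    noise_logits = noise_scores - (sample_noise_logq + log_k)

    # True words should be classified as data, noise samples as noise.
    loss_data = -F.logsigmoid(target_logits)               # shape (batch,)
    loss_noise = -F.logsigmoid(-noise_logits).sum(dim=-1)  # shape (batch,)
    return (loss_data + loss_noise).mean()
```

At test time the unnormalized score s(w, h) can be used directly, which is what makes evaluation fast as well.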

Usage

Before training the model, run the following script to build a vocabulary with class assignments:

python build_vocab_with_class.py --ncls 30 --min_count 0

The vocab built above assigns classes by word frequency; you can also build your own vocab using other methods (see the example in ./data/penn/vocab.c.txt; note that each class should be an integer).

Run the training script:

python train.py --cuda --data [data_path] --decoder [sm|nce|cls]

File Structure

  • data/: corpus and dictionary
  • params/: saved model parameters
  • data.py: custom data iterator and dictionary
  • model.py: the basic RNN model
  • decoder.py: the decoder layers (softmax, class-based softmax and NCE)
  • train.py: the training process
  • utils.py: utility functions

Performance

Experiments on the swb corpus (60k vocab). Average training time per epoch:

  • softmax: 1061s
  • nce: 471s
  • class-based softmax: 465s
