A PyTorch implementation of a punctuation prediction system using LSTM/BLSTM [1][2][3], which automatically adds suitable punctuation to unpunctuated text.
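Punctuation prediction is commonly framed as sequence labeling: each word is tagged with the punctuation mark that follows it. The sketch below illustrates that framing only. The label set, the "O" (no punctuation) tag, and the function names `to_labeled_pairs` and `restore` are assumptions made for illustration and are not taken from this repository's code or data format.

```python
# Hypothetical label scheme (an assumption, not this repo's format):
# each word is tagged with the mark that follows it, or "O" for none.
PUNCT = {",", ".", "?"}


def to_labeled_pairs(text):
    """Turn punctuated, whitespace-tokenized text into (word, label) pairs."""
    pairs = []
    for token in text.split():
        if token in PUNCT:
            if pairs:  # attach the mark to the preceding word
                pairs[-1] = (pairs[-1][0], token)
        else:
            pairs.append((token, "O"))
    return pairs


def restore(words, labels):
    """Inverse step: re-insert a mark after every word whose label is not "O"."""
    out = []
    for word, label in zip(words, labels):
        out.append(word)
        if label != "O":
            out.append(label)
    return " ".join(out)


pairs = to_labeled_pairs("hello , how are you ?")
words = [w for w, _ in pairs]
labels = [l for _, l in pairs]
print(pairs)
print(restore(words, labels))
```

In this framing, a model only has to predict `labels` from `words`; `restore` then produces the punctuated text.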
- PyTorch 0.4+
egs/toy/run.sh provides an example of usage.
# Set PATH and PYTHONPATH
$ cd egs/toy/; . ./path.sh
# Train
$ train.py -h
# Add punctuation
$ add_punctuation.py -h
# Analyze metrics
$ analyer.py -h
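The metrics the analysis script reports are not listed here. Punctuation systems are often scored with per-mark precision, recall, and F1, so the following is a minimal sketch of that computation, assuming the reference and predicted labels are aligned word by word. The function name `punctuation_prf` and the "O" (no punctuation) label are hypothetical and not part of this repository.

```python
from collections import Counter


def punctuation_prf(reference, hypothesis, no_punct="O"):
    """Per-mark (precision, recall, F1) from two aligned label sequences."""
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, hyp in zip(reference, hypothesis):
        if hyp == ref:
            if ref != no_punct:
                tp[ref] += 1  # correct mark
        else:
            if hyp != no_punct:
                fp[hyp] += 1  # predicted a mark that is not in the reference
            if ref != no_punct:
                fn[ref] += 1  # missed (or mislabeled) a reference mark
    scores = {}
    for mark in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[mark] + fp[mark]
        actual = tp[mark] + fn[mark]
        p = tp[mark] / predicted if predicted else 0.0
        r = tp[mark] / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[mark] = (p, r, f1)
    return scores


ref = [",", "O", "O", "?", "O", "."]
hyp = [",", "O", ",", ".", "O", "."]
scores = punctuation_prf(ref, hyp)
for mark, (p, r, f1) in scores.items():
    print(f"{mark}  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```

Words labeled "O" in both sequences are ignored, so the scores reflect only how well marks are placed.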
If you want to visualize your loss, you can use visdom:
- Open a new terminal on your remote server (tmux recommended) and run
$ visdom
- Open a new terminal and run
$ train.py ... --visdom 1 --visdom_id "<any-string>"
- Open your browser and go to <your-remote-server-ip>:8097, e.g., 127.0.0.1:8097
- On the visdom web page, choose <any-string> in Environment to see your loss.
To continue training from a saved model:
$ train.py --continue_from <model-path>
To choose which GPUs to use, set CUDA_VISIBLE_DEVICES to a comma-separated sequence of GPU IDs, such as:
$ CUDA_VISIBLE_DEVICES="0,1" train.py
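CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0 inside the process, which can be confusing when the listed IDs do not start at 0. The sketch below only parses the variable to show that mapping. It does not query CUDA, and `visible_gpu_ids` is a hypothetical helper, not something this repository provides.

```python
import os


def visible_gpu_ids(environ=None):
    """Return the IDs listed in CUDA_VISIBLE_DEVICES, in order.

    Returns an empty list when the variable is unset or empty; this helper
    does not distinguish the two cases and does not talk to CUDA.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


# Example with IDs that do not start at 0, to make the renumbering visible.
ids = visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "2,3"})

# Inside the process, the first listed device becomes device 0, and so on.
mapping = {local: physical for local, physical in enumerate(ids)}
print(mapping)
```

With the setting from the command above ("0,1"), the in-process and physical IDs coincide.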
- [1] Kaituo Xu, Lei Xie, and Kaisheng Yao. "Investigating LSTM for Punctuation Prediction" in ISCSLP 2016
- [2] Ottokar Tilk and Tanel Alumäe. "Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration" in Interspeech 2016
- [3] Ottokar Tilk and Tanel Alumäe. "LSTM for Punctuation Restoration in Speech Transcripts" in Interspeech 2015