CONV-SEG

A convolutional neural network for Chinese word segmentation (CWS). The corresponding paper is "Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation".

Author

Chunqi Wang

Dependencies

An NVIDIA GPU is recommended to speed up training.

Data

Download data.zip from here (note that the SIGHAN datasets should only be used for research purposes). Extract data.zip into this directory, so that the file tree looks like this:

convseg
|	data
|	|	datasets
|	|	|	sighan2005-pku
|	|	|	|	train.txt
|	|	|	|	dev.txt
|	|	|	|	test.txt
|	|	|	sighan2005-msr
|	|	|	|	train.txt
|	|	|	|	dev.txt
|	|	|	|	test.txt
|	|	embeddings
|	|	|	news_tensite.w2v200
|	|	|	news_tensite.pku.words.w2v50
|	|	|	news_tensite.msr.words.w2v50
|	tagger.py
|	train.py
|	train_cws.sh
|	train_cws_wemb.sh
|	score.perl
|	README.md
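
Before training, it can help to confirm that data.zip was extracted to the right place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_files` is hypothetical, and the file list is copied from the tree above on the assumption that it matches the contents of data.zip.

```python
from pathlib import Path

# Data files expected under the repository root, copied from the file tree.
# Assumption: data.zip contains exactly these paths.
EXPECTED_FILES = [
    "data/datasets/sighan2005-pku/train.txt",
    "data/datasets/sighan2005-pku/dev.txt",
    "data/datasets/sighan2005-pku/test.txt",
    "data/datasets/sighan2005-msr/train.txt",
    "data/datasets/sighan2005-msr/dev.txt",
    "data/datasets/sighan2005-msr/test.txt",
    "data/embeddings/news_tensite.w2v200",
    "data/embeddings/news_tensite.pku.words.w2v50",
    "data/embeddings/news_tensite.msr.words.w2v50",
]


def missing_data_files(root="."):
    """Return the expected data files that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_FILES if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("Data layout looks complete.")
```

Run it from the convseg directory; any path it prints still needs to be extracted.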

How to use

First, make the scripts executable:

chmod +x train_cws.sh train_cws_wemb.sh

Train a preliminary model (CONV-SEG):

./train_cws.sh WHICH_DATASET WHICH_GPU

Train a model with word embeddings (WE-CONV-SEG):

./train_cws_wemb.sh WHICH_DATASET WHICH_GPU

Two datasets are available: pku and msr. To run on CPU, omit the second argument.

For example, to train CONV-SEG on the pku dataset using GPU 0, run:

./train_cws.sh pku 0

More arguments can be set in train.py.
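
To train several configurations in a row, the invocation rules above can be wrapped in a small helper. This is only a sketch: `build_command` is a hypothetical name, not something the repository provides, and it assumes the two scripts accept exactly the positional arguments described in this section.

```python
# Map each model name from this README to its training script.
SCRIPTS = {
    "CONV-SEG": "./train_cws.sh",
    "WE-CONV-SEG": "./train_cws_wemb.sh",
}
DATASETS = ("pku", "msr")


def build_command(model, dataset, gpu=None):
    """Build the argument list for one training run.

    `gpu` is the GPU index; pass None to run on CPU, in which case the
    second script argument is left out, as described above.
    """
    if model not in SCRIPTS:
        raise ValueError("unknown model: %r" % (model,))
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: %r" % (dataset,))
    cmd = [SCRIPTS[model], dataset]
    if gpu is not None:
        cmd.append(str(gpu))
    return cmd


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing is trained
    # by accident.
    for model in SCRIPTS:
        for dataset in DATASETS:
            print(" ".join(build_command(model, dataset, gpu=0)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.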

Test Score

Model	PKU(dev)	PKU(test)	MSR(dev)	MSR(test)
CONV-SEG	96.8	95.7	97.2	97.3
WE-CONV-SEG	97.5	96.5	98.1	98.0
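
To make the effect of word embeddings easier to read off the table, the sketch below computes the absolute gain of WE-CONV-SEG over CONV-SEG for each column. The numbers are copied from the table; the variable names are illustrative only.

```python
# Scores copied from the table above.
COLUMNS = ["PKU(dev)", "PKU(test)", "MSR(dev)", "MSR(test)"]
CONV_SEG = [96.8, 95.7, 97.2, 97.3]
WE_CONV_SEG = [97.5, 96.5, 98.1, 98.0]

# Absolute gain in points, rounded to one decimal to hide float noise.
gains = {
    column: round(with_emb - without_emb, 1)
    for column, without_emb, with_emb in zip(COLUMNS, CONV_SEG, WE_CONV_SEG)
}

for column, gain in gains.items():
    print("%-10s +%.1f" % (column, gain))
```

The gain is between 0.7 and 0.9 points in every column.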
