BERTKNP is a Japanese dependency parser based on BERT. BERTKNP achieves dependency accuracy about four points higher than that of KNP.
Install the following tools beforehand.
- Install the dependencies in a Python virtual environment.
$ pip install repo/pytorch-pretrained-bert-parsing
- Download and install the BERT and BERTKNP models.
$ ./download_and_install_models.sh
Instead of running this script, you can execute the following commands step by step.
$ wget 'http://nlp.ist.i.kyoto-u.ac.jp/DLcounter/lime.cgi?down=http://lotus.kuee.kyoto-u.ac.jp/nl-resource/JapaneseBertPretrainedModel/Japanese_L-12_H-768_A-12_E-30_BPE_WWM_transformers.zip' -O Japanese_L-12_H-768_A-12_E-30_BPE_WWM_transformers.zip
$ unzip Japanese_L-12_H-768_A-12_E-30_BPE_WWM_transformers.zip
$ ln -s Japanese_L-12_H-768_A-12_E-30_BPE_WWM_transformers pretrained_model
$ ( cd pretrained_model && ln -s config.json bert_config.json )
$ wget http://lotus.kuee.kyoto-u.ac.jp/nl-resource/bertknp/model/bertknp-model-20190901.tar.gz -O - | tar xzv
$ echo "昨日訪れた下鴨神社の参道はかなり暗かった。" | jumanpp -s 1 | bin/bertknp
- By default, a dependency tree is output. If you need detailed information, use the -tab option in the same way as KNP.
- The python in your PATH is used by default. To use the python in your virtual environment, specify it with -p [python path].
- You can use either a CPU or a GPU. If you use a GPU and its memory is limited, make multiple GPUs visible as follows:
$ export CUDA_VISIBLE_DEVICES="0,1"
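The effect of this setting can be sketched as follows: only the listed physical GPUs become visible to the process, and they are renumbered from 0 inside it. This is a minimal illustration of how the variable is read, not BERTKNP's own device-selection code:

```python
import os

# Make only physical GPUs 0 and 1 visible to this process;
# CUDA-based frameworks will see them renumbered as devices 0 and 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# Parse the comma-separated device list the way frameworks interpret it.
visible = [int(d) for d in os.environ["CUDA_VISIBLE_DEVICES"].split(",") if d.strip()]
print(visible)
```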
You can use BERTKNP from pyknp in the same way as you use KNP.
from pyknp import KNP
knp = KNP('path/to/bin/bertknp', option='-p /path/to/venv/for/bertknp/bin/python -tab -pyknp', jumanoption='-s 1')
knp.parse('昨日訪れた下鴨神社の参道はかなり暗かった。') # returns pyknp.BList
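The returned BList can then be traversed to inspect which bunsetsu attaches to which head. The sketch below formats those attachments as arrows; the (surface, parent_id) pairs are hard-coded as a hypothetical parse so the snippet runs without BERTKNP installed, but in practice they would come from pyknp, e.g. `[(b.midasi, b.parent_id) for b in knp.parse(...).bnst_list()]`:

```python
# Hard-coded (surface, parent_id) pairs standing in for a pyknp BList.
# The attachment structure here is illustrative, not BERTKNP's actual output.
pairs = [
    ('昨日', 1),         # "yesterday" -> "visited"
    ('訪れた', 2),       # "visited" -> "Shimogamo Shrine's"
    ('下鴨神社の', 3),   # "Shimogamo Shrine's" -> "approach (path)"
    ('参道は', 5),       # "the approach" -> "was dark"
    ('かなり', 5),       # "quite" -> "was dark"
    ('暗かった。', -1),  # root bunsetsu has parent_id -1
]

def dependency_arrows(pairs):
    """Render each bunsetsu with the surface form of its head."""
    arrows = []
    for surface, parent in pairs:
        head = pairs[parent][0] if parent >= 0 else 'ROOT'
        arrows.append(f'{surface} -> {head}')
    return arrows

for line in dependency_arrows(pairs):
    print(line)
```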
You can modify the dependency labels assigned by KNP. Tag and bunsetsu segmentation are preserved.
$ echo "昨日訪れた下鴨神社の参道はかなり暗かった。" | jumanpp | knp -tab > parsed.knp
$ cat parsed.knp | bin/bertknp -f knp
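In KNP's -tab format, each bunsetsu begins with a `*` header line carrying the index of its head (e.g. `* 1D`, with `* -1D` for the root); relabeling rewrites these head indices while the segmentation stays fixed. A minimal sketch of extracting the head indices, assuming that line layout (the sample lines are illustrative, not real parser output):

```python
import re

# Bunsetsu header lines as they appear in KNP -tab output
# (morpheme lines omitted; head indices are illustrative).
tab_lines = [
    "* 1D",
    "* 5D",
    "* 3D",
    "* 5D",
    "* 5D",
    "* -1D",
    "EOS",
]

def bunsetsu_heads(lines):
    """Return (bunsetsu_index, head_index) pairs from '*' header lines."""
    heads = []
    for line in lines:
        m = re.match(r'\* (-?\d+)[A-Z]', line)
        if m:
            heads.append((len(heads), int(m.group(1))))
    return heads

print(bunsetsu_heads(tab_lines))
```

Comparing these pairs before and after running bin/bertknp -f knp would show which attachments changed while the number of bunsetsu stays the same.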
Tomohide Shibata, Daisuke Kawahara and Sadao Kurohashi: Improving Accuracy of Japanese Syntactic Parsing with BERT (in Japanese), Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing, pp.205-208, Nagoya, (2019.3). http://www.anlp.jp/proceedings/annual_meeting/2019/pdf_dir/F2-4.pdf