Python binding for J.DepP(C++ implementation of Japanese Dependency Parsers) https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/
$ python -m pip install jdepp
pip install does not install the model(dictionary).
You can get precompiled model files(MeCab POS tagging + train with KNBC copus) from
https://github.com/lighttransport/jdepp-python/releases/tag/v0.1.0
Precompiled KNBC model file is licensed under 3-clause BSD license.
- MeCab style POS format:
FEATURE_SEP ','
- See
jdepp/typedf.h
for more info about ifdef macros.
Download precompiled model file.
$ wget https://github.com/lighttransport/jdepp-python/releases/download/v0.1.0/knbc-mecab-jumandic-2ndpoly.tar.gz
$ tar xvf knbc-mecab-jumandic-2ndpoly.tar.gz
import jdepp
model_path = "model/knbc"
parser = jdepp.Jdepp()
parser.load_model(model_path)
# NOTE: Mecab format: surface + TAB + feature(comma separated 7 fields)
input_postagged = """吾輩 名詞,普通名詞,*,*,吾輩,わがはい,代表表記:我が輩/わがはい カテゴリ:人
は 助詞,副助詞,*,*,は,は,*
猫 名詞,普通名詞,*,*,猫,ねこ,*
である 判定詞,*,判定詞,デアル列基本形,だ,である,*
。 特殊,句点,*,*,。,。,*
名前 名詞,普通名詞,*,*,名前,なまえ,*
は 助詞,副助詞,*,*,は,は,*
まだ 副詞,*,*,*,まだ,まだ,*
ない 形容詞,*,イ形容詞アウオ段,基本形,ない,ない,*
。 特殊,句点,*,*,。,。,*
EOS
"""
sent = parser.parse_from_postagged(input_postagged)
print(sent)
print(jdepp.to_tree(str(sent)))
# S-ID: 1; J.DepP
0: 吾輩は━━┓
1: 猫である。━━┓
2: 名前は━━┫
3: まだ━━┫
4: ない。EOS
jdepp.to_dot
is provided to export graph as dot(Graphviz)
dot_text = jdepp.to_dot(str(sentence))
# feed output text to graphviz viewer, e.g. https://dreampuf.github.io/GraphvizOnline/
See examples/ for more details
MeCab style. surface + TAB + feature(comma separated 7 fields)
You can use jagger-python for POS tagging.
import jagger
import jdepp
jagger_model_path = "model/kwdlc/patterns"
tokenizer = jagger.Jagger()
tokenizer.load_model(jagger_model_path)
text = "吾輩は猫である。名前はまだない。"
toks = tokenizer.tokenize(text)
pos_tagged_input = ""
for tok in toks:
pos_tagged_input += tok.surface() + '\t' + tok.feature() + '\n'
pos_tagged_input += "EOS\n"
jdepp_model_path = "model/knbc"
parser.load_model(jdepp_model_path)
parser.parse_from_postagged(pos_tagged_input)
If you just want to use J.DepP from cli(e.g. batch processing), you can build a standalone C++ app using CMake.
We modified J.DepP source code to improve portablily(e.g. Ours works well on Windows)
Training a model from Python binding is also not yet supported. For a while, you can train a model by using standalone C++ jdepp app.
This is for developer usecase. Use setup.py(pyproject.toml) to build python module for end users.
Install pybind11 devkit.
$ python -m pip install pybind11
Then invoke cmake with -DJDEPP_WITH_PYTHON
and pybind11_DIR
$ pybind11_DIR=/path/to/pybind11 cmake -DJDEPP_WITH_PYTHON=1 ...
- tag it:
git tag vX.Y.Z
- push tag:
git push --tags
Versioning is automatically done through setuptools_scm
- WASM build
- Training API support
- Integrate jagger POS tagger as builtin(standalone) POS tagger in J.DepP
- MMap(or SharedMemory) load of dict data to save memory in Python multiprocessing
jdepp-python is licensed under 2-Clause BSD license.
J.DepP https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/ is licensed under GPLv2/LGPLv2.1/BSD triple license.
- pacco, cedar, opal(subcompoennts of J.DepP): GPLv2/LGPLv2.1/BSD triple license. We choose BSD license.
- io-util: MIT license.
- optparse: Unlicense https://github.com/skeeto/optparse