Stanford Tagger and NN Dependency Parser Models for Russian Language
- Clone CoreNLP from the project repository.
- Download the lemmatization dictionary 'dict.tsv' and the tagger and parser models using the links in the 'CoreNLPRusModels' section above.
- Build the project and run the Launcher (edu.stanford.nlp.international.russian.process.Launcher).
Required Launcher parameters:
-tagger - path to the POS-tagging model russian-ud-pos.tagger;
-taggerMF - path to the POS-tagging model russian-ud-mf.tagger, which outputs POS tags with inflectional morphological features (according to UD v.2); the parsing model reuses these features;
-mf - if this flag is True, inflectional morphology is written to the FEATS field of the CoNLL annotations;
-parser - dependency parser model, whose inventory of syntactic relations follows UD v.2; a good starting point is nndep.rus.modelMFWiki100HS400_80.txt.gz, which uses embeddings trained on a Wikipedia dump;
-pLemmaDict - path to dict.tsv, preferably placed in the /CoreNLP/src/edu/stanford/nlp/international/russian/process directory;
-pText - path to the input file, UTF-8 encoded, e.g. /home/filepath/input_file.txt;
-pResults - path to the output '.conll' file, written in CoNLL-U format.
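Since the results file is written in CoNLL-U, it can be inspected with a few lines of code. The following is a minimal Python sketch, not part of this project: the helper names `read_conllu` and `parse_feats` are hypothetical, and the two-token sample is hand-written for illustration rather than real output of the models. It assumes only the standard CoNLL-U layout (ten tab-separated fields per token, blank lines between sentences, `#` comment lines).

```python
from typing import Dict, Iterable, Iterator, List

# Column names defined by the CoNLL-U format (10 tab-separated fields per token).
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dicts."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue  # sentence-level comment line
        else:
            sentence.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if sentence:
        yield sentence  # input did not end with a blank line


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a FEATS value such as 'Case=Nom|Number=Sing' into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hand-written sample for illustration only (feature sets abbreviated).
sample = (
    "1\tМама\tмама\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tмыла\tмыть\tVERB\t_\tTense=Past\t0\troot\t_\t_\n"
    "\n"
)

for sent in read_conllu(sample.splitlines()):
    for tok in sent:
        print(tok["form"], tok["upos"], tok["head"], tok["deprel"],
              parse_feats(tok["feats"]))
```

To read an actual results file, pass an open file handle instead of the sample, for example `read_conllu(open("output.conll", encoding="utf-8"))`.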
Example of running from the console:
java -Xmx8g edu.stanford.nlp.international.russian.process.Launcher -tagger russian-ud-pos.tagger -taggerMF russian-ud-mf.tagger -pLemmaDict src/edu/stanford/nlp/international/russian/process/dict.tsv -parser nndep.rus.modelMFWiki100HS400_80.txt.gz -pText input.txt -pResults output.conll -mf
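When the pipeline is called from a script, the same command can be assembled programmatically. The sketch below is one possible way to do it in Python; the function `build_launcher_command` and its argument names are hypothetical, and it assumes, as the console example does, that the Java classpath is already set up and that `-mf` is passed as a bare flag. It only prints the command; the final comment shows how it could be executed.

```python
import shlex
from typing import List

LAUNCHER = "edu.stanford.nlp.international.russian.process.Launcher"


def build_launcher_command(tagger: str, tagger_mf: str, lemma_dict: str,
                           parser: str, text: str, results: str,
                           write_feats: bool = True,
                           heap: str = "8g") -> List[str]:
    """Assemble the Launcher invocation as an argument list."""
    cmd = ["java", f"-Xmx{heap}", LAUNCHER,
           "-tagger", tagger,
           "-taggerMF", tagger_mf,
           "-pLemmaDict", lemma_dict,
           "-parser", parser,
           "-pText", text,
           "-pResults", results]
    if write_feats:
        # Assumption: -mf is a bare flag, as in the console example.
        cmd.append("-mf")
    return cmd


cmd = build_launcher_command(
    tagger="russian-ud-pos.tagger",
    tagger_mf="russian-ud-mf.tagger",
    lemma_dict="src/edu/stanford/nlp/international/russian/process/dict.tsv",
    parser="nndep.rus.modelMFWiki100HS400_80.txt.gz",
    text="input.txt",
    results="output.conll",
)
print(shlex.join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file paths contain spaces.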
- Java 1.8
- allocate at least 5 GB for the JVM: -Xmx5g
- input file encoding: UTF-8
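Russian text is often stored in legacy encodings such as Windows-1251, so it may be worth checking the input file before running the pipeline. The following Python sketch is an optional pre-flight check and not part of the project; the helper names `is_valid_utf8` and `check_input_file` are hypothetical.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def check_input_file(path: str) -> bool:
    """Read a file as raw bytes and test whether it is valid UTF-8."""
    return is_valid_utf8(Path(path).read_bytes())


text = "Мама мыла раму."
print(is_valid_utf8(text.encode("utf-8")))   # bytes in the required encoding
print(is_valid_utf8(text.encode("cp1251")))  # bytes in a legacy Cyrillic encoding
```

A file that fails the check can be re-encoded once its actual encoding is known, for example with `data.decode("cp1251").encode("utf-8")`.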
If you find the pipeline useful in your research, please consider citing our paper:
@inproceedings{DBLP:conf/kesw/KovriguinaSSP17,
author = {Liubov Kovriguina and
Ivan Shilin and
Alexander Shipilo and
Alina Putintseva},
title = {Russian Tagging and Dependency Parsing Models for Stanford CoreNLP
Natural Language Toolkit},
booktitle = {Knowledge Engineering and Semantic Web - 8th International Conference,
{KESW} 2017, Szczecin, Poland, November 8-10, 2017, Proceedings},
pages = {101--111},
year = {2017},
doi = {10.1007/978-3-319-69548-8\_8}
}