Releases: buriy/spacy-ru
spaCy 2.3 models
Models included in this release:
ru2_nerus_800ks_96
- width=96 (for CPU and GPU **)
- POS score: 87.9
- DEP score: 87.1
- NER score: 95.3
- trained on Nerus
- LICENSE: MIT
Itn  Tag Loss    Tag %   Dep Loss     UAS     LAS
---  ----------  ------  -----------  ------  ------
20   612196.679  91.566  2285020.336  91.676  85.352
ru2_combined_400ks_96 *
- width=96 (for CPU and GPU **)
- POS score: 89.2
- DEP score: 87.9
- NER score: 94.73
- LICENSE: CC BY-NC-SA 4.0
Itn  Tag Loss    Tag %   Dep Loss     UAS     LAS
---  ----------  ------  -----------  ------  ------
20   468998.154  92.414  1774568.248  92.134  86.241
ru2_grameval_96
- width=96 (for CPU and GPU **)
- POS score: 89.0
- DEP score: 87.9
- NER score: 0.0
- POS tagging and DEP parsing only (no NER)
- LICENSE: CC BY-NC-SA 4.0
Itn  Tag Loss    Tag %   Dep Loss    UAS     LAS
---  ----------  ------  ----------  ------  ------
20   207172.379  93.661  926799.585  94.010  88.752
ru2_grameval_300
- width=300 (for GPU **)
- POS score: 90.0
- DEP score: 91.3
- NER score: 0.0
- POS tagging and DEP parsing only (no NER)
- LICENSE: CC BY-NC-SA 4.0
Itn  Tag Loss   Tag %   Dep Loss    UAS     LAS
---  ---------  ------  ----------  ------  ------
20   54762.824  95.291  394716.120  98.595  94.527
Notes:
- All models are based on Navec vectors and pymorphy2 morphology, so the combined vector model includes about 2.5 million words.
- POS and DEP scores are a weighted average of model quality over the GramEval subsets: score = (3*news + 3*fiction + wiki + social) / 8.
- * The "combined" dataset is GramEval 2020 plus a part of Nerus.
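- ** CPU cost grows roughly with the square of the network width, so the width-300 model is expected to be about 10x slower on CPU than the width-96 model (about 6x in the measurements below), while GPU speed stays almost constant. Measured words per second (WPS):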
width=48: CPU WPS=8000 GPU WPS=12000
width=96: CPU WPS=3600 GPU WPS=12000
width=192: CPU WPS=1300 GPU WPS=10000
width=300: CPU WPS=600 GPU WPS=8000
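The two formulas in the notes above can be sketched in Python. This is only an illustration: the function names are hypothetical, the subset scores passed to `weighted_score` are made-up inputs rather than reported results, and the WPS figures are copied from the list above.

```python
def weighted_score(news: float, fiction: float, wiki: float, social: float) -> float:
    """Weighted average from the notes: news and fiction count three times each."""
    return (3 * news + 3 * fiction + wiki + social) / 8


def quadratic_slowdown(width_a: int, width_b: int) -> float:
    """Expected CPU slowdown of width_b relative to width_a if cost ~ width squared."""
    return (width_b / width_a) ** 2


# CPU words-per-second figures copied from the notes above.
CPU_WPS = {48: 8000, 96: 3600, 192: 1300, 300: 600}

predicted = quadratic_slowdown(96, 300)   # 9.765625, i.e. "about 10x"
measured = CPU_WPS[96] / CPU_WPS[300]     # 6.0

if __name__ == "__main__":
    # Hypothetical per-subset scores, only to show how the weights are applied.
    print(weighted_score(news=92, fiction=88, wiki=80, social=80))  # 87.5
    print(f"predicted slowdown: {predicted:.2f}x, measured: {measured:.1f}x")
```

The quadratic rule is an upper-end estimate here: the measured width-96 to width-300 ratio is smaller than predicted, which suggests that part of the CPU time does not depend on the width.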
POS & DEP model for spaCy 2.3 based on SynTagRus and navec
POS tagger and DEP (dependency parsing) models for spaCy 2.3, trained on SynTagRus, using Navec vectors and pymorphy2 morphology.
Quality on SynTagRus-test:
POS | 95.31%
DEP UAS | 91.77%
DEP LAS | 89.12%
Accuracy.txt:
Itn  Tag Loss   Tag %   Dep Loss    UAS     LAS     NER Loss  NER P  NER R  NER F  Token %  CPU WPS  GPU WPS
---  ---------  ------  ----------  ------  ------  --------  -----  -----  -----  -------  -------  -------
30   24154.514  95.310  196988.805  91.777  89.124  0.000     0.000  0.000  0.000  100.000  6001     11902
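For readers unfamiliar with the UAS and LAS columns: UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct, and LAS (labeled attachment score) additionally requires the correct dependency label. A minimal sketch of these standard definitions follows; the function name and the toy arcs are illustrative, and this is not the evaluation script that produced the numbers above (real scorers may, for instance, treat punctuation differently).

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent for two aligned lists of arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS: head matches; LAS: head and label both match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


# Toy example: 4 tokens, one wrong label and one wrong head.
gold = [(1, "nsubj"), (1, "root"), (1, "obj"), (1, "punct")]
pred = [(1, "nsubj"), (1, "root"), (1, "nmod"), (0, "punct")]
uas, las = attachment_scores(gold, pred)  # 75.0, 50.0
```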
How to use it: unpack the archive into your project root folder, then run:
import ru2_syntagrus
nlp = ru2_syntagrus.load_ru2('path_to/ru2_syntagrus')
Alternatively, load it with spacy.load('path_to/ru2_syntagrus/'), but the lemmas will then be slightly worse.
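Once the pipeline is loaded, the tagger and parser output can be read from the standard spaCy token attributes. The helper below is a hypothetical convenience function, not part of this release; it assumes only that `nlp` is a callable returning an iterable of tokens with `text`, `lemma_`, `pos_` and `dep_`, as spaCy pipelines do.

```python
from typing import Callable, List, Tuple


def annotate(nlp: Callable, text: str) -> List[Tuple[str, str, str, str]]:
    """Run the pipeline and return (text, lemma, POS tag, dependency label) per token."""
    doc = nlp(text)
    return [(tok.text, tok.lemma_, tok.pos_, tok.dep_) for tok in doc]


# Usage, assuming the model was unpacked to 'path_to/ru2_syntagrus':
#   import ru2_syntagrus
#   nlp = ru2_syntagrus.load_ru2('path_to/ru2_syntagrus')
#   for row in annotate(nlp, "Мама мыла раму."):
#       print(row)
```

Comparing the lemma column between a pipeline loaded with `load_ru2` and one loaded with `spacy.load` is one way to see the lemma difference mentioned above on your own text.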
License: CC BY-NC-SA 4.0 (same as SynTagRus)
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
http://creativecommons.org/licenses/by-nc-sa/4.0/