Here you can find trained models ready to be used with :mod:`nlpnet`. Model files can be decompressed anywhere; when using :mod:`nlpnet`, the path to the model directory must be supplied (via the ``--data`` argument of the ``nlpnet-tag`` script or the :func:`nlpnet.set_data_dir` function).
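For example, a minimal usage sketch in Python (``/path/to/nlpnet-data/`` is a placeholder for wherever you decompressed a model)::

    import nlpnet

    # Point nlpnet at the directory containing the decompressed model files.
    nlpnet.set_data_dir('/path/to/nlpnet-data/')

    # Load the POS model found there and tag a sentence.
    tagger = nlpnet.POSTagger()
    print(tagger.tag('O rato roeu a roupa do rei de Roma.'))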
If you have trained :mod:`nlpnet` to perform any task in another language, please get in touch and we will add a link to your models.
.. note::
    These embeddings are only useful for training new models. If you want to use a pre-trained POS or SRL model, you don't need them.
These word embeddings can be used to train new :mod:`nlpnet` models (see the :ref:`training` section for details on how to use them). The archive contains a vocabulary file and an embeddings file. The latter is a NumPy matrix whose *i*-th row is the vector representation of the *i*-th word in the vocabulary. The embeddings were obtained by applying ``word2embeddings`` to a corpus of around 240 million tokens, composed of the Portuguese Wikipedia and news articles.
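As a sketch of how the two files relate (the file names ``vocabulary.txt`` and ``embeddings.npy`` are hypothetical; use the names found in the archive)::

    import numpy as np

    # One word per line; line i corresponds to row i of the embedding matrix.
    with open('vocabulary.txt', encoding='utf-8') as f:
        vocabulary = [line.strip() for line in f]

    # Matrix of shape (len(vocabulary), embedding_dimension).
    embeddings = np.load('embeddings.npy')

    # Look up the vector for a given word.
    word_index = {word: i for i, word in enumerate(vocabulary)}
    vector = embeddings[word_index['casa']]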
Performance: 97.33% token accuracy, 93.66% out-of-vocabulary token accuracy (evaluated on the revised Mac-Morpho test section)
This SRL model doesn't use any features besides word vectors, so you can use it without a parser or a chunker. However, due to the small size of PropBank-Br, its performance is lower than what SENNA obtains for English.

Performance: 66.19% precision, 59.78% recall, 62.82% F1 (evaluated on the PropBank-Br test section)
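A minimal sketch of calling the SRL model from Python (assuming the data directory has been set as shown above)::

    import nlpnet

    nlpnet.set_data_dir('/path/to/nlpnet-data/')

    # Load the SRL model and tag a sentence.
    tagger = nlpnet.SRLTagger()
    sent = tagger.tag('O rato roeu a roupa do rei de Roma.')[0]

    # Each predicate is paired with a dictionary of its labeled arguments.
    print(sent.arg_structures)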
This dependency parser includes a POS tagger. Its performance is unfortunately still below the state of the art.
Performance: 91.5% unlabeled attachment score (UAS), 89.1% labeled attachment score (LAS) (evaluated on the Penn Treebank)
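A sketch of parsing from Python, assuming a ``DependencyParser`` class analogous to the taggers above (if the class name differs in your version, check the API documentation)::

    import nlpnet

    nlpnet.set_data_dir('/path/to/nlpnet-data/')

    # Load the dependency parsing model, which runs its own POS tagger.
    parser = nlpnet.DependencyParser()
    sent = parser.parse('O rato roeu a roupa do rei de Roma.')[0]

    # Print the parse in CoNLL format: one token per line,
    # with head and dependency label columns.
    print(sent.to_conll())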