
Trained Models
==============

Here you can find trained models ready to be used with :mod:`nlpnet`. Model files can be decompressed anywhere; when using :mod:`nlpnet`, the path to the model directory must be supplied (via the ``--data`` argument of the ``nlpnet-tag`` script, or the :func:`nlpnet.set_data_dir` function).
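
For example, after downloading a POS model archive (the archive and directory names below are hypothetical), command-line usage might look like this:

.. code-block:: bash

    # Decompress the model anywhere (archive and target names are examples)
    mkdir -p ~/nlpnet-data
    tar -xzf pos-model.tgz -C ~/nlpnet-data

    # Point the nlpnet-tag script at the model directory with --data
    nlpnet-tag.py pos --data ~/nlpnet-data/

From Python, the equivalent is calling :func:`nlpnet.set_data_dir` with the same directory before creating a tagger.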

If you have trained :mod:`nlpnet` to perform any task in another language, please get in touch and we will provide a link to your models.

Word embeddings
---------------

.. note::

   This is only useful for training new models. If you want to use a
   pre-trained POS or SRL model, you don't need the embeddings.

These word embeddings can be used to train new :mod:`nlpnet` models (check the :ref:`training` section for details on how to use them). The archive contains a vocabulary file and an embeddings file. The latter is a NumPy matrix whose *i*-th row is the vector representation of the *i*-th word in the vocabulary. The embeddings were obtained by applying word2embeddings to a corpus of around 240 million tokens, composed of the Portuguese Wikipedia and news articles.
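
The row-to-word correspondence can be illustrated with a toy vocabulary and matrix (the real files are far larger; all names and values below are made up):

.. code-block:: python

    import numpy as np

    # Toy stand-ins for the downloaded vocabulary and embeddings files:
    # the i-th row of the matrix is the vector of the i-th word.
    vocab = ['casa', 'rato', 'roupa', 'rei']
    embeddings = np.arange(12, dtype=float).reshape(4, 3)

    word_to_index = {word: i for i, word in enumerate(vocab)}

    def vector_for(word):
        """Return the embedding row for a word."""
        return embeddings[word_to_index[word]]

    # 'rato' has index 1, so its vector is row 1 of the matrix
    print(vector_for('rato'))

With the real files you would instead load the matrix with ``numpy.load`` and read the vocabulary file line by line, keeping the same index-based lookup.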

State-of-the-art POS tagger
---------------------------

Performance: 97.33% token accuracy, 93.66% out-of-vocabulary token accuracy (evaluated on the revised Mac-Morpho test section)

Semantic Role Labeling model
----------------------------

This SRL model doesn't use any feature besides word vectors. You can use it without a parser or a chunker. However, due to the small size of PropBank-Br, its performance is lower than what SENNA obtains for English.

Performance: 66.19% precision, 59.78% recall, 62.82% F-1 (evaluated on the PropBank-Br test section)

Dependency Parser model
-----------------------

This dependency parser includes a POS tagger. Its performance is unfortunately still below the state of the art.

Performance: 91.5% unlabeled attachment score (UAS), 89.1% labeled attachment score (LAS) (evaluated on the Penn Treebank)