Add descriptive information for parts of speech tag #1034

kracekumar · 2017-05-02T11:18:35Z

I am following a tutorial about spaCy that focuses on part-of-speech tagging. The code snippet outputs jargon like PROPN and ADP, and it's hard to tell what these mean. It would be useful to have descriptive details about the tag names.

The developer could then look a tag up with something like:

> get_description(token.pos_)
'Proper noun <and some more info>'


ines · 2017-05-03T12:10:34Z

This is a nice idea!

I think the best way to do this would be to add a module spacy.explain that checks strings (and possibly symbols) against a "glossary" dictionary and returns the description. This would let you do something like:

print(token.pos_, spacy.explain(token.pos_))
# PROPN Proper noun

I'll play around with this and test it – should be easy to get something simple together for the next release and then keep updating it over time.

It's tempting to also extend this beyond strings and let users pass in objects as well... but I'm not sure if this is really a good idea. (For example, if you pass in an instance of Tokenizer, what should spacy.explain return that's not already covered by help()?)

ines · 2017-05-03T15:12:46Z

Just added spacy.glossary and the update will be included in the next release. The glossary currently has explanations for the English and German POS tagging and dependency labelling scheme, plus the available entity labels.

You can either import explain() from spacy.glossary, or call spacy.explain(), for example:

spacy.explain('NORP')
# Nationalities or religious or political groups

doc = nlp(u'Hello world')
for word in doc:
    print(word.text, word.tag_, spacy.explain(word.tag_))
# Hello UH interjection
# world NN noun, singular or mass

Note that the function always expects the string representation of a tag or label. If a term can't be found in the glossary, explain currently returns nothing (this seemed better than raising a KeyError or something, because there might still be some terms that are missing).
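
To illustrate the behaviour described above, here is a minimal sketch of a glossary lookup that returns None for unknown terms instead of raising a KeyError. It is not the actual spacy.glossary code: the GLOSSARY dict below is a small illustrative stand-in, and describe() is a hypothetical helper showing how a caller might handle a missing term.

```python
# Illustrative stand-in for the glossary (assumption: the real one in
# spacy.glossary covers far more tags and labels than these five).
GLOSSARY = {
    "PROPN": "proper noun",
    "ADP": "adposition",
    "NORP": "Nationalities or religious or political groups",
    "UH": "interjection",
    "NN": "noun, singular or mass",
}


def explain(term):
    """Return the description for a tag or label string, or None if unknown."""
    # dict.get returns None for a missing key instead of raising KeyError.
    return GLOSSARY.get(term)


def describe(term):
    """Hypothetical helper: fall back to a placeholder for unknown terms."""
    description = explain(term)
    if description is None:
        return "(no description available)"
    return description


for label in ("PROPN", "NN", "XYZ"):
    print(label, describe(label))
```

Because a missing term yields None rather than an exception, code that prints or concatenates the result may want an explicit fallback like the one in describe().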

kracekumar · 2017-05-03T17:18:54Z

That was super fast! Thank you @ines

lock · 2018-05-08T22:38:51Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the enhancement Feature requests and improvements label May 3, 2017

ines closed this as completed in a04b5be May 3, 2017

lock bot locked as resolved and limited conversation to collaborators May 8, 2018
