Add descriptive information for parts of speech tag #1034

Closed
kracekumar opened this issue May 2, 2017 · 4 comments
Labels
enhancement Feature requests and improvements

Comments

@kracekumar

I am following a tutorial about spaCy. The tutorial focuses on part-of-speech tagging, and the code snippet outputs jargon like PROPN and ADP. It's hard to tell what these labels mean. It would be useful to have descriptive details about them.

The developer could then look up a label's meaning directly, for example:

>>> get_description(token.pos_)
'Proper noun <and some more info>'
@ines ines added the enhancement Feature requests and improvements label May 3, 2017
@ines
Member

ines commented May 3, 2017

This is a nice idea!

I think the best way to do this would be to add a module spacy.explain that checks strings (and possibly symbols) against a "glossary" dictionary and returns the description. This would let you do something like:

print(token.pos_, spacy.explain(token.pos_))
# PROPN Proper noun

I'll play around with this and test it – should be easy to get something simple together for the next release and then keep updating it over time.

It's tempting to also extend this beyond strings and let users pass in objects as well... but I'm not sure if this is really a good idea. (For example, if you pass in an instance of Tokenizer, what should spacy.explain return that's not already covered by help()?)
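The glossary idea above boils down to a plain dictionary lookup keyed by the label string. A minimal sketch, assuming a hypothetical hand-written `GLOSSARY` dict with only a few illustrative entries (this is not spaCy's actual glossary contents or implementation):

```python
# Hypothetical, hand-written glossary mapping tag/label strings to descriptions.
# Only a handful of entries are included for illustration.
GLOSSARY = {
    "PROPN": "proper noun",
    "ADP": "adposition",
    "UH": "interjection",
    "NN": "noun, singular or mass",
    "NORP": "nationalities or religious or political groups",
}


def explain(term):
    """Return the description for a tag or label string, or None if unknown."""
    return GLOSSARY.get(term)


print("PROPN", explain("PROPN"))
print("ADP", explain("ADP"))
```

Keeping the lookup string-based means the same function works for POS tags, fine-grained tags, dependency labels and entity labels, as long as their strings don't collide.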

@ines ines closed this as completed in a04b5be May 3, 2017
@ines
Member

ines commented May 3, 2017

Just added spacy.glossary, and the update will be included in the next release. The glossary currently has explanations for the English and German POS tagging and dependency labelling schemes, plus the available entity labels.

You can either import explain() from spacy.glossary, or call spacy.explain(), for example:

import spacy

spacy.explain('NORP')
# Nationalities or religious or political groups

nlp = spacy.load('en')  # English model; the 'en' shortcut applies to spaCy 1.x/2.x
doc = nlp(u'Hello world')
for word in doc:
    print(word.text, word.tag_, spacy.explain(word.tag_))
# Hello UH interjection
# world NN noun, singular or mass

Note that the function always expects the string representation of a tag or label (e.g. token.pos_, not the integer token.pos). If a term can't be found in the glossary, explain currently returns nothing, i.e. None. This seemed better than raising a KeyError, because some terms may still be missing from the glossary.
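Because a missing term yields None rather than an exception, calling code may want a printable fallback. A minimal sketch, assuming a hypothetical stand-in `explain()` backed by a tiny illustrative dict, plus a hypothetical `describe()` wrapper (neither is part of spaCy):

```python
# Hypothetical stand-in for a glossary lookup; the entry is illustrative only.
GLOSSARY = {"NORP": "nationalities or religious or political groups"}


def explain(term):
    # dict.get returns None for a missing key instead of raising KeyError,
    # mirroring the "return nothing" behaviour described above.
    return GLOSSARY.get(term)


def describe(term, default="(no description available)"):
    """Wrap explain() so callers always get a printable string."""
    description = explain(term)
    return description if description is not None else default


print("NORP", describe("NORP"))
print("XYZ", describe("XYZ"))
```

Checking `is not None` (rather than relying on truthiness) keeps the fallback from triggering on any entry that might legitimately map to an empty string.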

@kracekumar
Author

That was super fast! Thank you @ines

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018