Different POS tags for same sentence repeated in paragraph #954
The POS tagger has always worked at the document level; it's the parser that decides the sentence boundaries (relying heavily on POS tag features). This specific example is interesting, though. I wouldn't have predicted it, especially since it also happens with the previous model.
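To make the "document level" point concrete: a tagger that reads features from a window of neighbouring tokens sees different context for the two identical occurrences of "cactus" when the whole text is treated as one sequence. The sketch below is a toy, stdlib-only illustration; the two-token window, the `<PAD>` boundary marker and the `context_features` helper are hypothetical and are not spaCy's actual feature templates (many greedy taggers also use previously predicted tags and affixes, which this sketch leaves out).

```python
from typing import List, Set


def context_features(words: List[str], i: int, window: int = 2) -> Set[str]:
    """Hypothetical context-window features for the token at position i.

    Positions outside the sequence are padded with a boundary marker, so the
    features depend on where the sequence (document or sentence) starts.
    """
    feats = set()
    for offset in range(-window, window + 1):
        j = i + offset
        word = words[j] if 0 <= j < len(words) else "<PAD>"
        feats.add(f"w[{offset:+d}]={word}")
    return feats


doc = "The cactus also bears fruit . The cactus also bears fruit .".split()

# Document level: both sentences form one token sequence.
first = context_features(doc, 1)   # first "cactus"
second = context_features(doc, 7)  # second "cactus"

# Sentence level: the first sentence tagged on its own.
isolated = context_features(doc[:6], 1)

print(sorted(first - second), sorted(second - first))
# → ['w[-2]=<PAD>'] ['w[-2]=.']
```

In this toy, the two occurrences differ only in their `w[-2]` feature (`<PAD>` versus `.`), while the first sentence on its own yields exactly the features of the first occurrence. In a trained model, even one changed feature can tip a close decision such as NN versus NNS.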
OK, cool, thanks for the reply.
Closing this and making #1057 the master issue: work in progress for spaCy v2.0!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I am seeing odd behavior with regard to fine-grained POS tags for a text with identical repeated sentences: 'The cactus also bears fruit. The cactus also bears fruit.' In the first sentence, the token 'cactus' is tagged NN, whereas in the second sentence it is tagged NNS. If I remove 'also' from the second sentence, 'cactus' is correctly tagged NN. I had assumed that POS tagging was done at the sentence level, so I'm curious why this is happening. Thanks!
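```python
import spacy
nlp = spacy.load('en')  # spaCy 1.x/2.x shortcut name; spaCy 3 needs e.g. 'en_core_web_sm'
for t in nlp(u'The cactus also bears fruit. The cactus also bears fruit.'):
    print(t, t.tag_, t.dep_)  # token text, fine-grained POS tag, dependency label
```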
```
The    DT   det
cactus NN   nsubj
also   RB   advmod
bears  VBZ  ROOT
fruit  NN   dobj
.      .    punct
The    DT   det
cactus NNS  nsubj
also   RB   advmod
bears  VBZ  ROOT
fruit  NN   dobj
.      .    punct
```
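To spot such divergences systematically, the tags of two repeated sentences can be compared position by position. This is a minimal stdlib sketch using the tags from the output above; the `tag_mismatches` helper is hypothetical. With spaCy, the input lists could be built as `[(t.text, t.tag_) for t in sent]` for each `sent` in `doc.sents`.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, fine-grained tag)

# Tags copied from the output above.
first_sent: List[Token] = [("The", "DT"), ("cactus", "NN"), ("also", "RB"),
                           ("bears", "VBZ"), ("fruit", "NN"), (".", ".")]
second_sent: List[Token] = [("The", "DT"), ("cactus", "NNS"), ("also", "RB"),
                            ("bears", "VBZ"), ("fruit", "NN"), (".", ".")]


def tag_mismatches(a: List[Token], b: List[Token]) -> List[Tuple[int, str, str, str]]:
    """Return (position, word, tag_in_a, tag_in_b) wherever identical words got different tags."""
    if [w for w, _ in a] != [w for w, _ in b]:
        raise ValueError("sentences must contain the same words")
    return [(i, wa, ta, tb)
            for i, ((wa, ta), (_, tb)) in enumerate(zip(a, b))
            if ta != tb]


print(tag_mismatches(first_sent, second_sent))
# → [(1, 'cactus', 'NN', 'NNS')]
```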
Your Environment
spaCy 1.6
Spyder 3.0.2
The error also reproduces in displaCy.