Out-of-vocabulary items tagged as personal pronouns #753
Comments
This really sounds like a bug. The statistical model's behaviour shouldn't be changing.
I confirm that prepositions also crop up, e.g., “I feel overexcited” -> ‘overexcited’ tagged as “IN”
Closing this and making #1057 the master issue – work in progress for spaCy v2.0!
spaCy (1.6.0) has a tendency to tag unknown words as pronouns.
For instance, “feels hot/feverish” is tokenised into two tokens, and ‘hot/feverish’ is tagged as PRP.
Personal pronouns are a closed class, and it is very unlikely that a new personal pronoun will be introduced into the language, so it would be an improvement if the statistical model were constrained not to assign this tag so eagerly. The same probably applies to other closed classes, for instance prepositions.
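Until the model itself is fixed, one possible workaround is a post-processing check that flags closed-class tags on words that are not known members of that class. The snippet below is only a rough sketch of that idea, not anything spaCy provides: the function name, the tag set, and the tiny lexicon are all made up for illustration, and it works on plain (word, tag) pairs rather than spaCy objects. The VBZ tag on "feels" is also an assumption; only the PRP tag on 'hot/feverish' comes from the report above.

```python
# Penn Treebank tags treated as closed-class here (an assumption for this sketch).
CLOSED_CLASS_TAGS = {"PRP", "PRP$", "IN"}

# Tiny illustrative lexicon (hypothetical and incomplete); a real one would
# list every member of each closed class.
CLOSED_CLASS_LEXICON = {
    "PRP": {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"},
    "PRP$": {"my", "your", "his", "her", "its", "our", "their"},
    "IN": {"in", "on", "at", "of", "for", "with", "by", "from",
           "about", "as", "if", "that"},
}


def suspicious_closed_class_tags(tagged_tokens):
    """Return (index, word, tag) for every token that received a closed-class
    tag but is not listed in the lexicon for that class."""
    flagged = []
    for i, (word, tag) in enumerate(tagged_tokens):
        if tag in CLOSED_CLASS_TAGS and word.lower() not in CLOSED_CLASS_LEXICON[tag]:
            flagged.append((i, word, tag))
    return flagged


# (word, tag) pairs as a tagger might emit them for "feels hot/feverish".
tagged = [("feels", "VBZ"), ("hot/feverish", "PRP")]
print(suspicious_closed_class_tags(tagged))
```

This only detects the suspicious tags; what to replace them with (for example, the model's next-best open-class tag) is a separate decision left out of the sketch.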