💫 German lexical probabilities incorrect for some punctuation #725

Closed
ines opened this issue Jan 8, 2017 · 2 comments
Labels: lang / de (German language data and models), models (Issues related to the statistical models), 🌙 nightly (Discussion and contributions related to nightly builds)

Comments

@ines
Member

ines commented Jan 8, 2017

The current German lexicon has several strange results relating to punctuation, likely due to tokenization problems when the frequency counts were first created. This needs to be fixed in the next data release.
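The suspected cause can be illustrated with a toy example. This is a minimal sketch, not the script that actually produced the lexicon. It assumes unsmoothed unigram log probabilities computed from a tiny made-up German text, and `whitespace_tokens`, `punct_split_tokens` and `log_probs` are hypothetical helpers. If punctuation stays glued to the preceding word while counting, `.` and `,` almost never appear as standalone tokens, so their counts, and therefore their probabilities, come out far too low.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, for illustration only.
TEXT = "Das ist gut. Und das ist auch gut, oder? Ja, das stimmt."

def whitespace_tokens(text):
    # Naive tokenizer: punctuation stays attached to the word ("gut.").
    return text.split()

def punct_split_tokens(text):
    # Splits punctuation into separate tokens ("gut", ".").
    return re.findall(r"\w+|[^\w\s]", text)

def log_probs(tokens):
    # Unsmoothed natural-log unigram probabilities: log(count / total).
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: math.log(n / total) for tok, n in counts.items()}

naive = log_probs(whitespace_tokens(TEXT))
split = log_probs(punct_split_tokens(TEXT))

print("." in naive)        # False: "." never occurs as a standalone token
print(round(split["."], 3))  # -2.14, i.e. log(2 / 17)
```

With the naive tokenizer, `.` and `,` simply have no counts of their own, so any smoothing would assign them a floor value rather than their real frequency.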

Examples

>>> nlp_de.vocab[u'.'].prob
-17.746963500976562
>>> nlp_de.vocab[u','].prob
-17.746963500976562
>>> nlp_de.vocab[u';'].prob
-7.240050315856934
>>> nlp_de.vocab[u'das'].prob
-4.596080303192139
>>> nlp_de.vocab[u'und'].prob
-3.6732823848724365
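One way to make the oddity concrete is to check the values above against an expected frequency ordering. In ordinary German text `.` and `,` are far more common than `;`, so their log probabilities should be higher, yet the reverse holds here. A minimal sketch in plain Python (no spaCy needed): `REPORTED` copies the numbers from the session above, while `EXPECTED_ORDER` and `ordering_violations` are hypothetical names for this illustration. In a live session the dictionary could instead be filled from `nlp_de.vocab[w].prob`.

```python
# Log probabilities copied from the German vocab session above.
REPORTED = {
    ".": -17.746963500976562,
    ",": -17.746963500976562,
    ";": -7.240050315856934,
    "das": -4.596080303192139,
    "und": -3.6732823848724365,
}

# Assumed ordering: (expected more frequent, expected less frequent).
EXPECTED_ORDER = [(".", ";"), (",", ";")]

def ordering_violations(probs, pairs):
    # Return every pair whose log probabilities are in the wrong order.
    return [(a, b) for a, b in pairs if probs[a] < probs[b]]

print(ordering_violations(REPORTED, EXPECTED_ORDER))
# [('.', ';'), (',', ';')]
```

A check like this could also serve as a quick regression test once a corrected data release is available.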

Related issues: #519, #611

ines added the performance and 🌙 nightly labels Jan 8, 2017
ines added the lang / de label Jan 8, 2017
ines added the docs and models labels and removed the docs and models labels May 13, 2017
@ines
Member Author

ines commented May 13, 2017

Closing this and making #1057 the master issue – work in progress for spaCy v2.0!

ines closed this as completed May 13, 2017
@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018