Update russian tokenizer exceptions #11753

Merged
merged 2 commits into from
Nov 15, 2022

ArchiDevil
Contributor

Description

Fixed two typos in the Russian tokenizer exceptions, added a couple of new abbreviations, and removed non-breaking spaces that may affect normalization in some cases.
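
To illustrate why a non-breaking space can matter here, below is a minimal sketch using only the Python standard library. It is not the spaCy code touched by this PR: the strings and the helper `strip_nbsp` are hypothetical, chosen only to show that a string containing U+00A0 does not compare equal to the same string written with a regular space.

```python
import unicodedata

NBSP = "\u00a0"  # non-breaking space


def strip_nbsp(text: str) -> str:
    """Replace non-breaking spaces with regular spaces (hypothetical helper)."""
    return text.replace(NBSP, " ")


# Hypothetical abbreviation strings, not taken from the actual exception list.
with_nbsp = "т." + NBSP + "д."
plain = "т. д."

# The two strings look the same when rendered but are different code points,
# so an exact-match lookup keyed on one will miss the other.
print(with_nbsp == plain)

# Replacing the non-breaking space makes them equal.
print(strip_nbsp(with_nbsp) == plain)

# NFKC normalization also maps U+00A0 to U+0020; NFC leaves it unchanged.
print(unicodedata.normalize("NFKC", with_nbsp) == plain)
print(unicodedata.normalize("NFC", with_nbsp) == plain)
```

Whether this mismatch shows up in practice depends on how the input text and the exception keys are normalized, which is why the description says "in some cases".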

Types of change

Enhancement

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@polm added the lang / ru and feat / tokenizer labels Nov 7, 2022
@adrianeboyd added the v3.5 label Nov 7, 2022
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
@richardpaulhudson merged commit 7e684ad into explosion:master Nov 15, 2022
adrianeboyd added a commit to adrianeboyd/spaCy that referenced this pull request Nov 15, 2022
* Fix typos, add couple of new abbreviations, remove nonbreaking spaces

* Remove space from abbreviation

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
@adrianeboyd mentioned this pull request Nov 15, 2022
Labels
feat / tokenizer Feature: Tokenizer lang / ru Russian language data and models v3.5 Related to v3.5

4 participants