load_vectors should accept arbitrary space characters as word tokens #834

Closed
raphael0202 opened this issue Feb 16, 2017 · 1 comment
Labels
bug Bugs and behaviour differing from documentation

Comments


raphael0202 commented Feb 16, 2017

This bug is similar to #737. Some whitespace characters (such as NO-BREAK SPACE, U+00A0) are currently not recognized as whitespace by the load_vectors function, so loading fails if the word vector file contains such a character as a token.
I fixed the bug by matching the start of each line against re.compile(r'\s') instead of comparing it with ' '.
I submitted a PR (#836).
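
To illustrate the idea, here is a minimal sketch of the check described above. It is not the actual spaCy code or the patch in the PR: the names parse_vector_line and LEADING_SPACE are hypothetical, and it assumes a text format where each line holds a word followed by space-separated float values. In Python 3, `\s` in a str pattern also matches Unicode whitespace such as U+00A0.

```python
import re

# Hypothetical helper, not part of spaCy: matches any single whitespace
# character (including Unicode ones such as U+00A0) at the start of a line.
LEADING_SPACE = re.compile(r"\s")


def parse_vector_line(line):
    """Split one text-format vector line into (word, list of floats)."""
    line = line.rstrip("\n")
    if LEADING_SPACE.match(line):
        # The word itself is a single whitespace character.
        word, rest = line[0], line[1:]
    else:
        # Ordinary case: the word ends at the first plain space.
        word, rest = line.split(" ", 1)
    values = [float(x) for x in rest.split()]
    return word, values
```

For example, `parse_vector_line("\u00a0 0.1 0.2\n")` keeps the no-break space as the word instead of losing it during splitting.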

Your Environment

  • Operating System: Ubuntu 16.04
  • Python Version Used: 3.0.5
  • spaCy Version Used: HEAD
ines added the bug label (Bugs and behaviour differing from documentation) Feb 16, 2017
ines closed this as completed in 7d8c9ee Feb 16, 2017
ines added 3 commits that referenced this issue Feb 16, 2017

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 9, 2018