load_vectors should accept arbitrary space characters as word tokens #834
Labels
bug
Bugs and behaviour differing from documentation
ines added a commit that referenced this issue on Feb 16, 2017
This bug is similar to #737. Some space characters (such as the NO-BREAK SPACE, U+00A0) are currently not detected as spaces by the load_vectors function, and loading fails if the word vector file has such strings as tokens.
I fixed the bug by matching the re.compile(r'\s') expression at the beginning of each line, instead of ' '. I submitted a PR (#836).
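To illustrate the idea, here is a minimal sketch of that check. It is not the actual load_vectors code or the code in the PR: the helper name split_vector_line and the line layout (token, then space-separated floats) are assumptions made for illustration. It also assumes Python 3 str input, where \s matches Unicode whitespace such as U+00A0.

```python
import re

# Matches any single whitespace character at the start of a line.
# On Python 3 str patterns, \s covers Unicode whitespace, including U+00A0.
_LEADING_SPACE = re.compile(r'\s')


def split_vector_line(line):
    """Hypothetical helper: split one vector-file line into (token, values).

    Assumes the layout "<token> <float> <float> ...", separated by ' '.
    """
    line = line.rstrip('\n')
    if _LEADING_SPACE.match(line):
        # The token is the whitespace character itself, so take it
        # verbatim instead of letting a split swallow it.
        token = line[0]
        pieces = line[1:].split(' ')
    else:
        pieces = line.split(' ')
        token = pieces.pop(0)
    # Drop empty strings produced by the separator that follows the token.
    values = [float(p) for p in pieces if p]
    return token, values


if __name__ == '__main__':
    print(split_vector_line('\u00a0 0.1 0.2\n'))
    print(split_vector_line('cat 0.5 1.0\n'))
```

Checking only for a literal ' ' at the start of the line would send the U+00A0 line down the ordinary-word branch; matching \s treats every whitespace token the same way.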