Extra space within doc.text method #859

Closed
rulai-huajunzeng opened this issue Feb 25, 2017 · 5 comments
Labels
bug Bugs and behaviour differing from documentation

Comments


rulai-huajunzeng commented Feb 25, 2017

An unexpected space is added to doc.text.
Code to reproduce:

>>> import spacy
>>> nlp = spacy.load('en', add_vectors=False)
>>> nlp(u"aaabbb@ccc.com\nThank you!").text
u'aaabbb@ccc.com \nThank you!'

Note that there is a space before \n that was not in the input. This produced incorrect token positions when I tried to map tokens back to the original text.
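The mapping problem can be checked with a round-trip test: if joining each token's text and trailing whitespace does not reproduce the input, character offsets derived from the tokens will drift from that point on. Below is a minimal sketch in plain Python 3, assuming the tokens are available as (text, trailing_whitespace) pairs. The helper names `reconstruct` and `first_mismatch` and the sample pairs are hypothetical illustrations, not part of spaCy's API.

```python
def reconstruct(pairs):
    """Join (text, trailing_whitespace) pairs back into one string."""
    return "".join(text + ws for text, ws in pairs)


def first_mismatch(original, rebuilt):
    """Return the index of the first differing character, or None if equal."""
    for i, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return i
    if len(original) != len(rebuilt):
        return min(len(original), len(rebuilt))
    return None


original = "aaabbb@ccc.com\nThank you!"

# Hypothetical pairs mimicking the reported output: the first token
# wrongly carries a trailing space that is not in the input.
buggy = [("aaabbb@ccc.com", " "), ("\n", ""), ("Thank", " "), ("you", ""), ("!", "")]
# The same tokens without the spurious space.
fixed = [("aaabbb@ccc.com", ""), ("\n", ""), ("Thank", " "), ("you", ""), ("!", "")]

print(first_mismatch(original, reconstruct(buggy)))  # 14
print(first_mismatch(original, reconstruct(fixed)))  # None
```

Every offset after index 14 is shifted by one in the buggy case, which matches the misaligned token positions described in the report.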

Your Environment

  • Operating System: Ubuntu
  • Python Version Used: Python 2.7.12 :: Anaconda 4.1.1 (64-bit)
  • spaCy Version Used: 1.6.0
  • Environment Information:
@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Mar 1, 2017
Member

honnibal commented Mar 1, 2017

Thanks for the report -- bad bug. Will try to get this fixed shortly. In the meantime I think the error relates to the recent URL regex changes.

ines added a commit that referenced this issue Mar 1, 2017
ines added a commit that referenced this issue Mar 1, 2017
Contributor

oroszgy commented Mar 2, 2017

Recently, \n has been kept as a token, which seems a bit strange to me.
See this:

>>> repr(nlp(u"Hi\nThank you!")[1])
'\n'

Member

honnibal commented Mar 2, 2017

@oroszgy Non-trivial whitespace has always been a token in spaCy -- I figured you'd have noticed this before. Is something new here that I'm not understanding?
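To illustrate why whitespace other than a single space ends up as its own token, here is a minimal sketch of a lossless whitespace tokenizer in plain Python 3. It is not spaCy's tokenizer (it does not split punctuation, for instance), and the function name `tokenize` is hypothetical. It only shows the general idea: if each token can record at most one trailing space, any other whitespace has to become a token for the original text to be recoverable.

```python
import re


def tokenize(text):
    """Split text into (token, trailing_space) pairs without losing characters.

    A single space after a word is stored as that word's trailing space.
    Any remaining whitespace (newlines, tabs, extra spaces) becomes a token.
    """
    pairs = []
    # Alternate between runs of non-whitespace and runs of whitespace.
    for match in re.finditer(r"\S+|\s+", text):
        chunk = match.group()
        if not chunk.isspace():
            pairs.append([chunk, ""])
            continue
        # Attach one leading space to the preceding word, if there is one.
        if chunk.startswith(" ") and pairs:
            pairs[-1][1] = " "
            chunk = chunk[1:]
        # Whatever whitespace is left over is kept as a separate token.
        if chunk:
            pairs.append([chunk, ""])
    return [tuple(p) for p in pairs]


text = "Hi\nThank you!"
pairs = tokenize(text)
print([tok for tok, _ in pairs])  # ['Hi', '\n', 'Thank', 'you!']

# Round trip: joining tokens and trailing spaces restores the input exactly.
assert "".join(tok + ws for tok, ws in pairs) == text
```

In this sketch the token at index 1 is the newline, which is the same position where it appears in the example above.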

Contributor

oroszgy commented Mar 3, 2017

Hm, interesting, somehow I did not notice this behaviour before. I guess my previous comment is not relevant in this case.


lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018