
Old read_csv() & EOF character issue back #16559

Closed
gaptekar opened this issue May 31, 2017 · 6 comments
Labels: Bug, IO CSV (read_csv, to_csv), Regression (functionality that used to work in a prior pandas version)

Comments

@gaptekar

I recently updated from pandas 0.19.2 to 0.20 and am experiencing the exact same issue that was reported back in 2013 in #5500:
"Error tokenizing data. C error: EOF inside string starting at line 140"

Reverting to 0.19.2 fixed the issue. Can you run the unit test that was created for this problem?
forking-repos@8c4cf85

@jreback
Contributor

jreback commented May 31, 2017

Can you show a reproducible example?

@gorkemozkaya

gorkemozkaya commented Jun 7, 2017

I'm having the same problem. Python 3, pandas 0.20.2

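Reproducible example:

```python
import pandas as pd

# Write a CSV whose quoted field contains the byte 0x1A (ASCII SUB / Ctrl-Z).
with open('test.csv', 'wb') as fout:
    fout.write(b'c1,c2\r\n"test \x1a    test", test\r\n')

pd.read_csv('test.csv')
# With pandas 0.20.2 on Windows this raises:
# ParserError: Error tokenizing data. C error: EOF inside string starting at line 1
```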
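The failing input contains the byte 0x1A (ASCII SUB, Ctrl-Z), which the Microsoft C runtime treats as end-of-file when a file is read in text mode. One plausible model of a Windows-only failure, offered here as an assumption rather than a diagnosis of pandas' C parser, is that reading stops at that byte and leaves the parser inside an open quote. The toy tokenizer below (`read_like_text_mode` and `tokenize` are hypothetical helpers; it does not handle escaped `""` quotes) produces the same kind of error under that assumption:

```python
from typing import List

# Bytes from the reproducer: the quoted field contains 0x1A (Ctrl-Z).
DATA = b'c1,c2\r\n"test \x1a    test", test\r\n'
CTRL_Z = b"\x1a"


def read_like_text_mode(raw: bytes) -> bytes:
    """Hypothetical helper: mimic a Windows text-mode read that stops at Ctrl-Z."""
    cut = raw.find(CTRL_Z)
    return raw if cut == -1 else raw[:cut]


def tokenize(raw: bytes) -> List[List[str]]:
    """Toy quote-aware CSV tokenizer (not pandas' tokenizer).

    Raises ValueError if the input ends while a quoted field is still open.
    Escaped quotes ("") are not supported.
    """
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    in_quotes = False
    line = 0        # 0-based line counter; the header is line 0
    quote_line = 0  # line on which the currently open quote started
    for ch in raw.decode("ascii"):
        if in_quotes:
            if ch == '"':
                in_quotes = False
            else:
                field.append(ch)  # inside quotes, 0x1A is ordinary data
        elif ch == '"':
            in_quotes = True
            quote_line = line
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch == "\r":
            continue  # treat "\r\n" as a single line break
        elif ch == "\n":
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            line += 1
        else:
            field.append(ch)
    if in_quotes:
        raise ValueError(f"EOF inside string starting at line {quote_line}")
    if row or field:  # last line without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(tokenize(DATA))  # full input parses into two rows
    try:
        tokenize(read_like_text_mode(DATA))  # input cut off at Ctrl-Z
    except ValueError as exc:
        print(exc)
```

On the full bytes the toy tokenizer returns two rows, `['c1', 'c2']` and `['test \x1a    test', ' test']`; on the bytes cut off at Ctrl-Z it raises `EOF inside string starting at line 1`, the same shape of error as in the report.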

@jorisvandenbossche
Member

Is this a Windows issue? (I cannot reproduce your example above on Linux.)

@gorkemozkaya

Yes, it happened on 64-bit Windows.

@jreback
Contributor

jreback commented Jun 9, 2017

@gfyoung can you verify / see if you can fix?

@jreback jreback added this to the Next Major Release milestone Jun 9, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 11, 2017
pandas-devgh-16039 created a bug in which files containing
byte-like data could break, as EOF characters mid-field
(despite being quoted) would cause premature line breaks.

Given that this PR was a performance patch, this
commit can be safely reverted.

Closes pandas-devgh-16559.
@gfyoung
Member

gfyoung commented Jun 11, 2017

@jreback : Confirmed! git bisection reveals that #16039 is the culprit. PR coming soon.

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.3, Next Major Release Jun 11, 2017
@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Jun 11, 2017
jreback pushed a commit that referenced this issue Jun 11, 2017
gh-16039 created a bug in which files containing
byte-like data could break, as EOF characters mid-field
(despite being quoted) would cause premature line breaks.

Given that this PR was a performance patch, this
commit can be safely reverted.

Closes gh-16559.
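The commit message states the expected behavior: a quoted 0x1A byte is field data, not a line break or end of input. As a point of comparison, the standard-library `csv` module (used here only as a reference parser; it is not what pandas runs, and `parse_reference` is a hypothetical name) turns the reproducer's bytes into exactly two rows with `\x1a` kept inside the first field:

```python
import csv
import io
from typing import List

# Same bytes as the reproducer in this thread.
DATA = b'c1,c2\r\n"test \x1a    test", test\r\n'


def parse_reference(raw: bytes) -> List[List[str]]:
    """Parse CSV bytes with the stdlib csv module (a reference parser, not pandas)."""
    # newline="" hands the raw "\r\n" endings to the csv module, as its docs recommend.
    # Python's own text layer does not treat 0x1A as end-of-file.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="")
    return list(csv.reader(text))


if __name__ == "__main__":
    for row in parse_reference(DATA):
        print(row)
```

Under that reading, `read_csv` on this file would be expected to yield columns `c1` and `c2` with a single row holding `'test \x1a    test'` and `' test'`.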
TomAugspurger pushed a commit that referenced this issue Jul 7, 2017
gh-16039 created a bug in which files containing
byte-like data could break, as EOF characters mid-field
(despite being quoted) would cause premature line breaks.

Given that this PR was a performance patch, this
commit can be safely reverted.

Closes gh-16559.

(cherry picked from commit c550372)
5 participants