skipfooter doesn't really "skip" in read_csv #13879

Closed
gfyoung opened this issue Aug 2, 2016 · 2 comments
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Comments

@gfyoung
Member

gfyoung commented Aug 2, 2016

On master:

from pandas import read_csv
from io import StringIO
data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'  # Note the stray quotation mark
read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: unexpected end of data

If we were truly "skipping" the last row, no error would have been raised. The error occurs because all of the data is first parsed in memory with Python's csv library, and the footer rows are dropped only afterwards.
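To make the ordering problem concrete, here is a minimal sketch that uses only the standard library rather than pandas itself. The helper names `parse_then_drop` and `drop_then_parse` are hypothetical, and passing `strict=True` to `csv.reader` is an assumption made so that the unterminated quote is reported as an error; this is an illustration of the idea, not the Python engine's actual code.

```python
import csv
from io import StringIO

data = 'a,b,c\ncat,foo,bar\ndog,foo,"baz'  # stray quotation mark in the last row


def parse_then_drop(text, skipfooter):
    # Parse every row first, then discard the footer rows.
    # strict=True is an assumption made here so that the csv module
    # raises on the unterminated quote instead of tolerating it.
    rows = list(csv.reader(StringIO(text), strict=True))
    return rows[:len(rows) - skipfooter]


def drop_then_parse(text, skipfooter):
    # Discard the last physical lines first, then parse what is left.
    # Assumes no quoted field spans more than one physical line.
    lines = text.splitlines()
    kept = lines[:len(lines) - skipfooter]
    return list(csv.reader(kept, strict=True))


try:
    parse_then_drop(data, skipfooter=1)
except csv.Error as err:
    print("parse_then_drop failed:", err)

print(drop_then_parse(data, skipfooter=1))
```

The first helper fails on the malformed footer even though that row would be thrown away; the second never sees it, at the cost of treating every physical line as one row.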

Whether this is intended behaviour or not has implications for the C engine in terms of implementing analogous skipfooter behaviour. Or perhaps it has something to do with the fact that the error_bad_lines and warn_bad_lines parameters are not supported by the Python engine?

xref #5232

@jreback jreback added API Design IO CSV read_csv, to_csv labels Aug 2, 2016
@jreback jreback added this to the Next Major Release milestone Aug 2, 2016
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 26, 2016
…gine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes pandas-devgh-13879.
@jorisvandenbossche
Member

If this feature were implemented in the C engine, I would expect it to work in this case, so that the skipped lines need not parse correctly. But I am not sure if this is actually possible?

Questions about how to treat quotation marks (are they parsed or not when determining the number of lines to skip?), similar to those in the recent issues about skiprows, will also come up. So for this to be consistent, the skipped lines may need to be parsed to some extent?
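A small standard-library sketch of why quotation marks matter when counting the lines to skip: with a quoted field that contains a newline, the number of physical lines and the number of parsed records disagree. The sample data and the variable names below are made up for illustration and do not come from pandas.

```python
import csv
from io import StringIO

# The last record holds a quoted field that spans two physical lines.
data = 'a,b\n1,2\n3,"multi\nline"\n'

# Counting without looking at quotes: every newline ends a "row".
physical_lines = data.splitlines()

# Counting after parsing quotes: the quoted newline stays inside one record.
records = list(csv.reader(StringIO(data)))

print(len(physical_lines), len(records))

# Skipping one "footer row" by physical lines cuts the last record in half.
print(physical_lines[:-1][-1])

# Skipping one footer row by parsed records removes the whole record.
print(records[:-1])
```

Whichever definition of a row is chosen for skipfooter, the example suggests it should match the one used for skiprows.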

@gfyoung
Member Author

gfyoung commented Nov 26, 2016

@jorisvandenbossche : You are correct. This code should not break, though whether that's possible is another story, as some parsing might be needed. In any case, I'm not yet sure how to implement this for the C engine, though that can be dealt with separately from this issue.

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 28, 2016
…gine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes pandas-devgh-13879.
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 29, 2016
…gine

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes pandas-devgh-13879.
jorisvandenbossche pushed a commit that referenced this issue Nov 29, 2016
…gine (#14749)

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
@jorisvandenbossche jorisvandenbossche added Error Reporting Incorrect or improved errors from pandas and removed API Design labels Nov 29, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.19.2, Next Major Release Nov 29, 2016
jorisvandenbossche pushed a commit that referenced this issue Dec 15, 2016
… rows in Python engine (#14749)

Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.

Closes gh-13879.
(cherry picked from commit dfeae39)