Ignore comment lines in read_csv parsing #7582
@AmrAS1 please squash; otherwise looks OK.
- The file parsers `read_csv` and `read_table` now ignore line comments provided by
Can you use double backtick quotes (``read_csv``) for the functions ``read_csv`` and ``read_table``? For parameter names, single backticks are OK.
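For reference, a minimal sketch of the behavior this changelog line describes, assuming a current pandas release. The sample data is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up CSV with full-line comments before and between the data rows.
data = """# generated by some tool
A,B,C
1,2,3
# a comment between data rows
4,5,6
"""

# With comment="#", lines starting with "#" are skipped entirely, so the
# header comes from the first non-comment line ("A,B,C").
df = pd.read_csv(StringIO(data), comment="#")
print(df)

# read_table behaves the same way; it just defaults to tab separation.
tsv = "# comment\nA\tB\n1\t2\n"
df_tab = pd.read_table(StringIO(tsv), comment="#")
print(df_tab)
```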
Not familiar with the C parser code, so I can't comment on that, but I added some doc comments. @AmrAS1 You updated the io.rst explanation of the parameters, but I think you forgot to also update the parameter explanations in the docstring (it was in your previous PR, I think, so you probably just have to copy it).
@jorisvandenbossche - Oops, I'll update that now. Let me know if there are any other doc problems.
Looking good as far as the docs are concerned. Maybe you could also add a test for the combination of comments with `skiprows` and `header`? (Or are there already tests for that?)
OK, I've added a `skiprows`/`header` test.
```python
[5., np.nan, 10.]]
# skiprows should skip the first 3 (commented) lines, while
# header should start from the first non-commented line
df = self.read_csv(StringIO(data), comment='#', skiprows=3, header=1)
```
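A standalone sketch of the combination being tested, written against current pandas, where fully commented lines are counted by `skiprows` but not by `header`. The data is illustrative, not the exact fixture from this PR.

```python
from io import StringIO

import pandas as pd

# Illustrative data: three full-line comments, a throwaway "X,Y,Z" block,
# then the real header "A,B,C".
data = """# comment 1
# comment 2
# comment 3
X,Y,Z
1,2,3
A,B,C
1,2.,4.
5.,NaN,10.0
"""

# skiprows=4 counts raw lines, comments included, so it drops the three
# comments plus "X,Y,Z". header=1 then counts only non-comment lines of
# what remains: "1,2,3" is row 0, so "A,B,C" (row 1) becomes the header.
df = pd.read_csv(StringIO(data), comment="#", skiprows=4, header=1)
print(df)
```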
Shouldn't you also test them separately? By providing `skiprows`, you don't test whether `header=1` skips the commented lines automatically (you only test that `skiprows` also removes the commented lines). For example, shouldn't `read_csv(StringIO(data), comment='#', header=1)` yield the same result?
Oops, I think I meant to have a row between `X,Y,Z` and `A,B,C` with `skiprows=4`. I'll fix that and add a couple of separate tests.
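A sketch of what the separate checks could look like, using hypothetical data and assuming current pandas semantics: `header` alone skips fully commented lines when counting, while `skiprows` alone counts them as ordinary lines. Both inputs are arranged so they should produce the same frame.

```python
from io import StringIO

import pandas as pd

# Case 1: header alone. Fully commented lines are not counted by header,
# so header=1 refers to the second non-comment line ("A,B,C").
header_only = """# comment 1
# comment 2
1,2,3
A,B,C
1,2.,4.
5.,NaN,10.0
"""
df_header = pd.read_csv(StringIO(header_only), comment="#", header=1)

# Case 2: skiprows alone. skiprows counts raw lines, including comment
# lines, so skiprows=4 drops everything up to and including "1,2,3".
skip_only = """# comment 1
random line
# comment 2
1,2,3
A,B,C
1,2.,4.
5.,NaN,10.0
"""
df_skip = pd.read_csv(StringIO(skip_only), comment="#", skiprows=4)

print(df_header)
print(df_skip)
```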
Good thing you caught this. I just found that `PythonParser` wasn't behaving as expected! I currently have it adjusting `self.line_pos` when passing through `skiprows`; I must have put that in before I decided on the correct functionality. I'll fix this.
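A hypothetical parity check, not code from this PR: with illustrative data, three option combinations that should all land on the same `A,B,C` header are run through both the C and the pure-Python engine, and every result is compared against one reference frame. This assumes current pandas, where `header` counts only non-comment lines and `skiprows` counts raw lines.

```python
from io import StringIO

import pandas as pd

data = """# comment 1
# comment 2
# comment 3
X,Y,Z
1,2,3
A,B,C
1,2.,4.
5.,NaN,10.0
"""

# Three ways of reaching the same "A,B,C" header in this made-up data.
cases = [
    {"skiprows": 4, "header": 1},  # drop 3 comments + "X,Y,Z", then skip "1,2,3"
    {"header": 2},                 # non-comment rows: 0="X,Y,Z", 1="1,2,3", 2="A,B,C"
    {"skiprows": 5},               # drop 3 comments, "X,Y,Z" and "1,2,3"
]

results = {}
for engine in ("c", "python"):
    for i, kwargs in enumerate(cases):
        results[(engine, i)] = pd.read_csv(
            StringIO(data), comment="#", engine=engine, **kwargs
        )

reference = results[("c", 0)]
for key, frame in results.items():
    # assert_frame_equal raises AssertionError describing any mismatch.
    pd.testing.assert_frame_equal(frame, reference)
print(reference)
```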
commit 5e9e0fa29d727953583a116638a9d0db81f9ed21
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date: Thu Jun 26 19:53:35 2014 -0400
Fixed issue with empty lines

commit 57b54918b251ab77f000b575d77bcce3affcb27a
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date: Thu Jun 26 16:31:27 2014 -0400
Added reference to new functionality in docs

commit a2371638691584416439d3c6a4dd2ef1829dcbe3
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date: Thu Jun 26 16:26:06 2014 -0400
Implemented functionality to ignore comment lines, wrote a test
Ignore comment lines in read_csv parsing
@AmrAS1 thanks!
@jreback - You're welcome!
#7470 can be done for 0.15.0, as it's an API change.
Please check the docs once they are built (shortly) at http://pandas-docs.github.io/pandas-docs-travis/io.html#comments and verify how it looks. Thanks!
Sure thing.
Weird... there's one section in which IPython outputs this:
The rest of the output looks fine.
Indeed strange. I suspect this is an issue with how the IPython directive processes it. If you want to look into it, the code is in https://github.com/pydata/pandas/tree/master/doc/sphinxext/ipython_sphinxext (which is a copy of IPython's own), or you can open an issue with IPython.
This is the first part of #7470.
closes #2685