Ignore comment lines in read_csv parsing #7582

Merged
merged 1 commit into from
Jun 30, 2014

Conversation

mdmueller
Contributor

This is the first part of #7470.
closes #2685

@jreback jreback added this to the 0.14.1 milestone Jun 27, 2014
@jreback
Contributor

jreback commented Jun 27, 2014

@AmrAS1 pls squash, otherwise looks ok

@cpcloud @jorisvandenbossche ?




- The file parsers `read_csv` and `read_table` now ignore line comments provided by
Member

Can you use double backtick quotes (``read_csv``) for the functions read_csv and read_table? (For parameter names, single backticks are OK.)

@jorisvandenbossche
Member

not familiar with the c parser code, so can't comment on that, but added some doc comments.

@AmrAS1 You updated the io.rst explanation of the parameters, but I think you forgot to also update the parameter explanations in the docstring (it was in your previous PR I think, so probably just have to copy it)

@mdmueller
Contributor Author

@jorisvandenbossche - Oops, I'll update that now. Let me know if there are any other doc problems.

@jorisvandenbossche
Member

Looking good as far as the docs are concerned.

Maybe you could also add a test for the combination of comments with skiprows and header? (Or is there one already?)

@mdmueller
Contributor Author

Ok, updated with a skiprows/header test

[5., np.nan, 10.]]
# skiprows should skip the first 3 (commented) lines, while
# header should start from the first non-commented line
df = self.read_csv(StringIO(data), comment='#', skiprows=3, header=1)
Member

Shouldn't you also test them separately? By providing skiprows, you don't test whether header=1 skips the commented lines automatically (you only test that skiprows also removes the commented lines). So, for example, read_csv(StringIO(data), comment='#', header=1) should also yield the same result?
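
A minimal sketch of the separate header-only check suggested here, assuming a pandas version that includes this change. The sample data is hypothetical (it is not the PR's test fixture); the point is only that `header=1` counts from the first non-commented line, with no `skiprows` involved.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: two fully commented lines, one ordinary row,
# then the row intended as the header.
data = (
    "# empty\n"
    "# second empty line\n"
    "1,2,3\n"
    "A,B,C\n"
    "1,2.,4.\n"
    "5.,NaN,10.0\n"
)

# No skiprows here: header=1 alone has to step over the commented
# lines and pick the second non-commented line as the header.
df = pd.read_csv(StringIO(data), comment="#", header=1)
print(list(df.columns))
```

If the commented lines were counted by `header`, the column names would come from a different line, so checking `df.columns` is enough to tell the two behaviours apart.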

Contributor Author

Oops, I think I meant to have a row between X,Y,Z and A,B,C with skiprows=4. I'll fix that and add a couple separate tests.

Contributor Author

Good thing you caught this--I just found that PythonParser wasn't behaving as expected! I currently have it adjusting self.line_pos when passing through skiprows, must've put that in before I decided on the correct functionality. I'll fix this.
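
A sketch of the combined case with both parser engines, assuming a pandas version that includes this change. Only the layout (commented lines, `X,Y,Z`, one row, `A,B,C`, and `skiprows=4`) comes from the thread; the exact rows are an assumption. `skiprows` counts raw lines, comments included, while `header` counts non-commented lines after the skip.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Assumed sample data following the layout described in the thread.
data = (
    "# empty\n"
    "# second empty line\n"
    "# third empty line\n"
    "X,Y,Z\n"
    "1,2,3\n"
    "A,B,C\n"
    "1,2.,4.\n"
    "5.,NaN,10.0\n"
)

expected = pd.DataFrame(
    [[1.0, 2.0, 4.0], [5.0, np.nan, 10.0]], columns=["A", "B", "C"]
)

# skiprows=4 drops the three commented lines plus "X,Y,Z";
# header=1 then selects "A,B,C", the second remaining line.
results = {}
for engine in ("c", "python"):
    results[engine] = pd.read_csv(
        StringIO(data), comment="#", skiprows=4, header=1, engine=engine
    )
    # Both engines are expected to agree on the result.
    pd.testing.assert_frame_equal(results[engine], expected)
```

Running the same call under both `engine="c"` and `engine="python"` is what exposes a mismatch like the one described above.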

commit 5e9e0fa29d727953583a116638a9d0db81f9ed21
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date:   Thu Jun 26 19:53:35 2014 -0400

    Fixed issue with empty lines

commit 57b54918b251ab77f000b575d77bcce3affcb27a
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date:   Thu Jun 26 16:31:27 2014 -0400

    Added reference to new functionality in docs

commit a2371638691584416439d3c6a4dd2ef1829dcbe3
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date:   Thu Jun 26 16:26:06 2014 -0400

    Implemented functionality to ignore comment lines, wrote a test
@jreback
Contributor

jreback commented Jun 28, 2014

@jorisvandenbossche ok?

jreback added a commit that referenced this pull request Jun 30, 2014
Ignore comment lines in read_csv parsing
@jreback jreback merged commit 49a86f1 into pandas-dev:master Jun 30, 2014
@jreback
Contributor

jreback commented Jun 30, 2014

@AmrAS1 thanks!

@mdmueller
Contributor Author

@jreback - You're welcome!

@mdmueller mdmueller deleted the ignore-comment-lines branch June 30, 2014 19:28
@jreback
Contributor

jreback commented Jun 30, 2014

#7470 can be done for 0.15.0, as it's an API change

@jreback
Contributor

jreback commented Jun 30, 2014

Give the docs a check when they are built (shortly): http://pandas-docs.github.io/pandas-docs-travis/io.html#comments and verify how they look. Thanks.

@mdmueller
Contributor Author

Sure thing

@mdmueller
Contributor Author

Weird...there's one section in which ipython outputs this:

In [15]: data = 'a,b,c\n# commented line\n1,2,3\n#another comment\n4,5,6'

In [16]: print(data)
a,b,c
1,2,3
4,5,6

# commented line
#another comment

The rest of the output looks fine.

@jorisvandenbossche
Member

Indeed strange. I suppose this is an issue with the processing in the ipython directive. If you want to look into it, the code is in https://github.com/pydata/pandas/tree/master/doc/sphinxext/ipython_sphinxext (but which is a copy from ipython itself), or you can open an issue at ipython.
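
As a quick check outside the docs build, the same string can be printed in a plain Python session. This sketch uses only the standard library; it shows that the lines of the string itself are in their original order, which is consistent with the reordering being introduced by the rendering step rather than by the data.

```python
# Same string as in the rendered docs example above.
data = 'a,b,c\n# commented line\n1,2,3\n#another comment\n4,5,6'

# Plain print keeps the lines in the order they appear in the string.
print(data)

# Splitting makes the order explicit for inspection.
lines = data.splitlines()
comment_positions = [i for i, line in enumerate(lines) if line.startswith('#')]
print(comment_positions)
```

The commented lines sit at positions 1 and 3, interleaved with the data rows, not grouped at the end.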

Labels
Enhancement IO CSV read_csv, to_csv
Successfully merging this pull request may close these issues.

ENH: Option for reading files with a variable number of comment lines at start
3 participants