Ignore comment lines in read_csv parsing #7582

mdmueller · 2014-06-26T20:32:33Z

This is the first part of #7470.
closes #2685

jreback · 2014-06-27T12:23:45Z

@AmrAS1 pls squash, otherwise looks ok

jorisvandenbossche · 2014-06-27T14:09:25Z

doc/source/v0.14.1.txt

-
-
-
+- The file parsers `read_csv` and `read_table` now ignore line comments provided by


Can you use double backticks (``read_csv``) for the functions read_csv and read_table? (For parameter names, single backticks are OK.)

jorisvandenbossche · 2014-06-27T14:17:00Z

not familiar with the c parser code, so can't comment on that, but added some doc comments.

@AmrAS1 You updated the io.rst explanation of the parameters, but I think you forgot to also update the parameter explanations in the docstring (it was in your previous PR I think, so probably just have to copy it)

mdmueller · 2014-06-27T14:21:39Z

@jorisvandenbossche - Oops, I'll update that now. Let me know if there are any other doc problems.

jorisvandenbossche · 2014-06-27T14:30:18Z

Looking good as far as the docs are concerned.

Maybe you could also add a test for the combination of using comments with skiprows and header? (or are there already?)

mdmueller · 2014-06-27T14:38:08Z

Ok, updated with a skiprows/header test

jorisvandenbossche · 2014-06-27T15:22:22Z

pandas/io/tests/test_parsers.py

+                    [5., np.nan, 10.]]
+        # skiprows should skip the first 3 (commented) lines, while
+        # header should start from the first non-commented line
+        df = self.read_csv(StringIO(data), comment='#', skiprows=3, header=1)


Shouldn't you also test them separately? By providing skiprows, you don't test whether header=1 skips the commented lines automatically (you only test that skiprows also removes the commented lines). So, for example, read_csv(StringIO(data), comment='#', header=1) should also yield the same result?

Oops, I think I meant to have a row between X,Y,Z and A,B,C with skiprows=4. I'll fix that and add a couple separate tests.

Good thing you caught this--I just found that PythonParser wasn't behaving as expected! I currently have it adjusting self.line_pos when passing through skiprows, must've put that in before I decided on the correct functionality. I'll fix this.
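To make the distinction discussed above concrete, here is a minimal sketch of how `header` and `skiprows` interact with full-line comments. The sample data and variable names are made up for illustration, and the behaviour shown is that of current pandas releases (where `header` counts only non-commented lines, while `skiprows` counts raw file lines); it is not necessarily the exact state of this PR at the time of the review.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: two full-line comments, a decoy header, one junk row,
# then the real header and two data rows.
data = (
    "# comment 1\n"
    "# comment 2\n"
    "X,Y,Z\n"
    "1,2,3\n"
    "A,B,C\n"
    "4,5,6\n"
    "7,8,9\n"
)

# header alone: commented lines are not counted, so the non-commented
# lines are X,Y,Z (0), 1,2,3 (1), A,B,C (2) and header=2 selects A,B,C.
df_header = pd.read_csv(StringIO(data), comment="#", header=2)

# skiprows + header: skiprows=3 drops the first three raw lines
# (both comments and X,Y,Z); header=1 then selects A,B,C again.
df_skip = pd.read_csv(StringIO(data), comment="#", skiprows=3, header=1)

print(df_header)
print(df_header.equals(df_skip))
```

Testing the two calls separately, as suggested, checks that `header` skips comments on its own rather than relying on `skiprows` to remove them.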

commit 5e9e0fa29d727953583a116638a9d0db81f9ed21
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date: Thu Jun 26 19:53:35 2014 -0400

    Fixed issue with empty lines

commit 57b54918b251ab77f000b575d77bcce3affcb27a
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date: Thu Jun 26 16:31:27 2014 -0400

    Added reference to new functionality in docs

commit a2371638691584416439d3c6a4dd2ef1829dcbe3
Author: Michael Mueller <michaeldmueller7@gmail.com>
Date: Thu Jun 26 16:26:06 2014 -0400

    Implemented functionality to ignore comment lines, wrote a test

jreback · 2014-06-28T01:56:31Z

@jorisvandenbossche ok?

Ignore comment lines in read_csv parsing

jreback · 2014-06-30T19:26:28Z

@AmrAS1 thanks!

mdmueller · 2014-06-30T19:28:11Z

@jreback - You're welcome!

jreback · 2014-06-30T19:30:05Z

#7470 can be done for 0.15.0, as it's an API change

jreback · 2014-06-30T19:50:14Z

give a check on the docs when they are built (shortly): http://pandas-docs.github.io/pandas-docs-travis/io.html#comments and verify how it looks..thxs

mdmueller · 2014-06-30T19:52:41Z

Sure thing

mdmueller · 2014-06-30T22:43:43Z

Weird...there's one section in which ipython outputs this:

In [15]: data = 'a,b,c\n# commented line\n1,2,3\n#another comment\n4,5,6'

In [16]: print(data)
a,b,c
1,2,3
4,5,6

# commented line
#another comment

The rest of the output looks fine.
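For comparison, a plain Python session keeps the lines in their original order, which is why the rendered output above looks wrong. The sketch below uses only the standard library; the `kept` filter is a rough stand-in for full-line comment skipping and is not how pandas implements it.

```python
data = 'a,b,c\n# commented line\n1,2,3\n#another comment\n4,5,6'

# Lines in the order print(data) would emit them.
lines = data.splitlines()
print(lines)

# Rough stand-in for full-line comment skipping (illustrative only):
# drop every line that starts with the comment character.
kept = [line for line in lines if not line.startswith('#')]
print(kept)
```

The comment lines sit between the data rows in the real string, so a rendering that moves them to the end points at the doc build rather than at the data.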

jorisvandenbossche · 2014-06-30T23:46:50Z

Indeed strange. I suppose this is an issue with the processing in the ipython directive. If you want to look into it, the code is in https://github.com/pydata/pandas/tree/master/doc/sphinxext/ipython_sphinxext (but which is a copy from ipython itself), or you can open an issue at ipython.

jreback added CSV labels Jun 27, 2014

jreback added this to the 0.14.1 milestone Jun 27, 2014

jorisvandenbossche reviewed Jun 27, 2014

jreback added a commit that referenced this pull request Jun 30, 2014

Merge pull request #7582 from amras1/ignore-comment-lines

49a86f1

Ignore comment lines in read_csv parsing

jreback merged commit 49a86f1 into pandas-dev:master Jun 30, 2014

mdmueller deleted the ignore-comment-lines branch June 30, 2014 19:28

jreback mentioned this pull request Jun 30, 2014

TST: failing windows parser test #7623

Closed

jorisvandenbossche mentioned this pull request Aug 27, 2014

BUG: read_table with full line comment and delim_whitespace=True #8115

Closed

mdmueller mentioned this pull request Aug 27, 2014

Made line comments work with delim_whitespace and custom line terminator #8122

Merged



