ENH/BUG: ignore line comments in CSV files GH2685 #4505

holocronweaver · 2013-08-07T18:44:45Z

I have added the ability for both the C and Python CSV parsers to ignore commented lines (i.e., lines beginning with a comment character). Currently the C parser preserves commented lines as empty lines (all NaN), while the Python parser ignores them altogether.

In addition, I fixed a small related problem with the CSV format sniffer in the Python parser.

I plan to finish up this work by ignoring empty lines as per #4466.
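
As a rough illustration of what "ignoring commented lines" means, the idea can be sketched with the standard library alone. This is not the code in this PR: `skip_comment_lines` is a hypothetical helper, and `#` is an assumed comment character.

```python
import csv
import io


def skip_comment_lines(lines, comment="#"):
    """Yield only the lines that do not begin with the comment character.

    Hypothetical helper for illustration; not part of pandas.
    """
    for line in lines:
        if not line.startswith(comment):
            yield line


# The second line is a full-line comment and should not become a row.
raw = io.StringIO("a,b\n# this whole line is a comment\n1,2\n3,4\n")
rows = list(csv.reader(skip_comment_lines(raw)))
print(rows)
```

Filtering before the CSV reader sees the line is what keeps the comment from turning into an empty (all NaN) row downstream.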

cpcloud · 2013-08-08T03:19:11Z

pandas/io/parsers.py

@@ -1282,9 +1280,8 @@ class MyDialect(csv.Dialect):

            sniff_sep = True

-            if sep is not None:
+            if (sep is not None) and (dia.quotechar is not None):


No need for parens here: `is not` binds tighter than `and`.
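
One way to check the precedence claim is to compare the parsed syntax trees of the two spellings. The sketch below only parses the expressions from the diff (it never evaluates them), so `sep` and `dia` do not need to exist; the `tree` helper is a name introduced here for illustration.

```python
import ast

with_parens = "(sep is not None) and (dia.quotechar is not None)"
without_parens = "sep is not None and dia.quotechar is not None"


def tree(source):
    """Return a textual dump of the expression's AST (positions excluded)."""
    return ast.dump(ast.parse(source, mode="eval"))


# Parentheses leave no trace in the AST, so identical dumps mean
# both spellings group the operators the same way.
same_grouping = tree(with_parens) == tree(without_parens)
print(same_grouping)
```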

cpcloud · 2013-08-08T03:20:24Z

Can you add a test and release notes? thx!

holocronweaver · 2013-08-08T03:52:08Z

Sorry, I am new to pandas dev. I am guessing a unit test for commented lines in a CSV file is what you have in mind?

cpcloud · 2013-08-08T04:01:55Z

Yep!

holocronweaver · 2013-08-09T22:05:04Z

Where should the test be created? There does not seem to be a particular file for parsers. Maybe test_frame since parsers return frames?

cpcloud · 2013-08-09T22:17:03Z

check out pandas/tests/test_parsers.py


holocronweaver · 2013-08-12T20:40:44Z

Sorry, missed the tests folder in pandas/io.

Having trouble setting up the test to expect different output for C and Python parsers. The tests seem to lock the parser engine and ignore the engine parameter in read_csv, causing my test to fail. The Python parser omits empty lines, while the C parser does not. In #4466 I propose making the behavior the same by making the C parser follow the Python parser behavior. Perhaps I should just go ahead and implement my suggestion? Or is there a method to query the current engine from within a test?

jreback · 2013-08-12T23:43:41Z

If you put your test in ParserTests, it normally runs through three different versions of the parser: Python parsing, C parsing, and I think low-memory C parsing. You can put a test in, say, PythonParsing if you only want it to run on that one. It is probably best to put it in the main test class if you want similar behavior in all parsers.

If you step through it, it calls read_csv (and not pd.read_csv), which sets the engine depending on the iteration. In any event, I think you can set engine='python' when you call read_csv to specify the engine locally.
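
For readers outside the test harness, the same comparison can be sketched by passing `engine` to `pd.read_csv` directly. This assumes a recent pandas release, where both engines skip full-line comments; it does not reproduce the behavior at the time of this comment, and the sample data is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up sample: the second line is a full-line comment.
data = "a,b\n# a full-line comment\n1,2\n3,4\n"

# Parse the same text once per engine so the results can be compared.
frames = {
    engine: pd.read_csv(StringIO(data), comment="#", engine=engine)
    for engine in ("c", "python")
}

for engine, frame in frames.items():
    print(engine, len(frame))
```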

jreback · 2013-08-23T02:12:29Z

@holocronweaver how's this coming along?

holocronweaver · 2013-08-23T02:29:16Z

Almost done, though temporarily delayed due to work. I will try to get this finished up tomorrow if possible. Worst case would be next weekend.

jreback · 2013-08-23T02:35:57Z

gr8
ping when ready

jreback · 2013-09-20T23:12:47Z

@holocronweaver how's this coming along?

holocronweaver · 2013-09-21T00:45:46Z

@jreback It is basically done, but I need time to test and debug. I am currently finishing a GSoC project which ends next week, so I will have a bit of free time again and will try to push this as soon as I get a chance.

jreback · 2013-09-21T00:47:02Z

@holocronweaver perfect... please ping back when it is ready for a look

jreback · 2013-10-02T21:02:00Z

@holocronweaver how's this coming?

jreback · 2013-10-07T21:15:14Z

@holocronweaver ping!

jreback · 2013-10-11T12:33:20Z

@holocronweaver going to be able to rebase this in the next couple of days?

holocronweaver · 2013-10-11T14:44:44Z

@jreback Sorry, have been very busy at work. Will be at least another week, though I will try to get it done sooner. Apologies again for the long delay.

jreback · 2013-10-14T01:19:18Z

@holocronweaver ok...let us know

jreback · 2014-01-03T20:53:05Z

@holocronweaver care to rebase this?

holocronweaver · 2014-01-04T16:16:56Z

@jreback Sure, when I get back from holiday travels.

jreback · 2014-02-16T21:36:10Z

@holocronweaver progress on this?

holocronweaver · 2014-02-24T03:56:17Z

@jreback No, but it is on my TODO list. Crunch time is preventing anything extracurricular.

jreback · 2014-03-09T15:04:44Z

@holocronweaver update?

jreback · 2014-04-05T23:47:37Z

@holocronweaver update on this?

jreback · 2014-06-16T16:49:55Z

closing in favor of #7470

jreback · 2014-12-28T02:27:06Z

see here: https://github.com/pydata/pandas/pull/7470/files

Try skip_blank_lines=False (this is the original behavior).
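
A minimal sketch of the difference, assuming a pandas release that has the `skip_blank_lines` parameter of `read_csv`; the sample data is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up sample with one blank line between the two data rows.
data = "a,b\n1,2\n\n3,4\n"

# Default: blank lines are dropped before parsing.
skipped = pd.read_csv(StringIO(data))

# skip_blank_lines=False: the blank line is kept as a row of missing values.
kept = pd.read_csv(StringIO(data), skip_blank_lines=False)

print(len(skipped), len(kept))
```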

amanshei · 2014-12-28T02:30:19Z

Thanks!!


cpcloud reviewed Aug 8, 2013

Jesse Johnson added 2 commits August 12, 2013 14:42

ENH/BUG: ignore line comments in CSV files GH2685

e4fb9ed

* also fix bug in CSV format sniffer

TST: add test for CSV parser line comments

d680f13

hayd mentioned this pull request Aug 21, 2013

first line comments on a read_csv #4623

Closed

jreback mentioned this pull request Aug 21, 2013

[io] comment option ignored in read_csv when sep provided. #3001

Closed

jreback closed this Jan 3, 2014

jreback reopened this Jan 3, 2014

jreback added CSV labels Feb 16, 2014

jreback modified the milestones: 0.15.0, 0.14.0 Feb 26, 2014

mdmueller mentioned this pull request Jun 16, 2014

ENH: ignoring comment lines and empty lines in CSV files #7470

Closed

jreback modified the milestones: 0.14.1, 0.15.0 Jun 16, 2014

jreback closed this Jun 16, 2014
