read_csv() & EOF character in string causes parsing issue #5500

While importing large text files with read_csv, we occasionally get an EOF (end-of-file) character within a string, which raises an exception: "Error tokenizing data. C error: EOF inside string starting at line 844863". This happens even with "error_bad_lines = False".

Further, the line stated in the error message is not the line containing the EOF character. In this particular case the actual row was approximately 230 rows before the one stated, which hinders exception handling. (I now see that this difference was caused by other "bad lines" that were being skipped: the line quoted in the error is correct, but the number of imported rows was smaller.)
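
To pin down which physical line opens the quoted field that never closes, the file can be scanned independently of pandas. This is a minimal sketch of a hypothetical helper (`find_unterminated_quote` is not a pandas function). It assumes a comma delimiter, `"` as the quote character and standard CSV escaping, where a literal quote inside a quoted field is written as `""`, and it only treats a quote at the start of a field as opening a quoted field. Its line numbers count physical lines (header, skipped bad lines and newlines embedded in quoted fields included), so they are not guaranteed to match pandas' own count.

```python
def find_unterminated_quote(text, delimiter=",", quotechar='"'):
    """Return the 1-based physical line on which a still-open quoted field
    began, or None if every quoted field is closed by the end of the text.

    Hypothetical helper; assumes standard CSV escaping ("" = literal quote).
    """
    in_quotes = False
    at_field_start = True  # a quote only opens a field at its start
    start_line = None
    lineno = 1
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            lineno += 1
        if in_quotes:
            if ch == quotechar:
                if i + 1 < len(text) and text[i + 1] == quotechar:
                    i += 1  # doubled quote: escaped literal, stay inside
                else:
                    in_quotes = False  # closing quote
        elif ch == quotechar and at_field_start:
            in_quotes = True  # opening quote; remember where it started
            start_line = lineno
            at_field_start = False
        else:
            # A new field starts after a delimiter or an unquoted newline.
            at_field_start = ch in (delimiter, "\n")
        i += 1
    return start_line if in_quotes else None
```

For example, `find_unterminated_quote(open("data.txt").read())` would return the line where an unclosed quote starts in a placeholder file `data.txt`, or `None` if every quoted field is closed. It reads the whole file into memory, which may be slow for very large inputs.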

I feel it would be appropriate if "error_bad_lines = False" handled this exception and allowed such rows to be skipped.  

I note that when importing this text file into Excel, the "premature" EOF is simply ignored.
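
If the stray character is the DOS/Windows end-of-file marker Ctrl-Z (byte 0x1A), which is an assumption since the report does not name the byte, one way to mimic Excel's behavior is to drop that byte before the data reaches the parser. A minimal sketch using only the standard library; `strip_dos_eof` is a hypothetical helper, not part of pandas:

```python
import io

# Assumption: the stray "EOF character" is the DOS/Windows end-of-file
# marker Ctrl-Z (byte 0x1A). If a reader treats it as the end of the file,
# a quoted field containing it looks unterminated.
DOS_EOF = b"\x1a"


def strip_dos_eof(raw):
    """Remove every Ctrl-Z byte from raw file contents.

    Returns a (buffer, count) pair: an in-memory binary file that can be
    handed to a CSV reader, and the number of bytes that were removed.
    """
    count = raw.count(DOS_EOF)
    return io.BytesIO(raw.replace(DOS_EOF, b"")), count
```

The buffer could then be passed to read_csv, e.g. `buf, n = strip_dos_eof(open("data.txt", "rb").read())` followed by `pandas.read_csv(buf)`, where `data.txt` is a placeholder name. Reading in binary mode ("rb") matters here: under Python 2 on Windows, text-mode reads can stop at the first 0x1A byte.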

We are running Windows 8, Python 2.7 and pandas 0.12.