BUG: read_csv, engine='c' error #9735
Comments
I can reproduce this bug with 0.16.0, but it works in 0.16.1+. It looks like GH #10023 fixed it (based on bisection). |
Do you want to add a confirming test? |
Same error here on pandas 0.16.2 (Mac OS X) -- works when engine='python', not when using C parser. Input file is a 59 MB csv that I had previously written via pandas with Works just fine when I cut off the first 1000 lines of the input file and try to read that, so I figure the bug is somewhere in |
The 0.16.1 update resolved my problem, so it works now. |
Perhaps 0.16.2 is a regression, then. I can open a new ticket if you think it's a different bug, but I get the exact same error message and traceback. |
on 0.16.2, so if you see a difference, please post.
|
I had to add a bunch of floats to the end to get it to fail on 0.16.0, but it works on everything I tried from 0.16.1 through current trunk. I'll try to get a minimal test case. |
closed by #11138 |
* commit 'v0.17.0rc1-92-gc6bcc99': (29 commits) CI: tests latest versions of openpyxl COMPAT: openpyxl >= 2.2 support, pandas-dev#10125 Tests demonstrating how to use sqlalchemy.text() objects in read_sql() TST: Capture warnings in _check_plot_works COMPAT/BUG: color handling in scatter COMPAT: Support for matplotlib 1.5 ERR/API: Raise NotImplementedError when Panel operator function is not implemented, pandas-dev#7692 DOC: minor doc formatting fixes PERF: nested dict DataFrame construction DEPR: deprecate SparsePanel BLD: dateutil->python-dateutil in conda recipe BUG/API: GH11086 where freq is not inferred if both freq is None ENH: add merge indicator to DataFrame.merge PERF: improves performance in groupby.size BUG: DatetimeTZBlock.fillna raises TypeError PERF: infer_datetime_format without padding pandas-dev#11142 PERF: improves performance in SeriesGroupBy.transform TST: Verify fix for buffer overflow in read_csv with engine='c' (GH pandas-dev#9735) DEPR: Series.is_timeseries BUG: nested construction with timedelta pandas-dev#11129 ...
I am trying to read a 57 MB file with pandas.read_csv. The file contains a header (5 rows), followed by integer values and, at the end, float values.
When I read the txt file:
import numpy as np
import pandas as pd
# Skip the 5 header rows plus the integer block, then parse the float block.
pd.read_csv(file, skiprows=5+n_int_values, header=None, engine='c',
            dtype=np.float, low_memory=False)
The result is an error:
This happens on pandas 0.16.0, on Anaconda Python 2.7.8. On an older version, 0.14.1, it works correctly.
Note: When I use engine='python', the txt file is loaded normally.
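For anyone trying to narrow this down, here is a rough sketch of how a synthetic file with the same layout (header rows, then an integer block, then a float block) could be generated and read with both engines for comparison. The helper names, the three-column layout, and the row counts are assumptions made for illustration, not details of the original file. At this small size the sketch is not expected to trigger the error, and the sizes may need to be raised considerably on an affected version.

```python
import io

import numpy as np
import pandas as pd


def build_sample(n_header=5, n_int=1000, n_float=1000, seed=0):
    """Return CSV text: header rows, then integer rows, then float rows.

    Hypothetical layout; the original file's columns are not known.
    """
    rng = np.random.RandomState(seed)
    lines = ["header line %d" % i for i in range(n_header)]
    lines += [",".join(str(int(v)) for v in row)
              for row in rng.randint(0, 100, size=(n_int, 3))]
    lines += [",".join("%.6f" % v for v in row)
              for row in rng.rand(n_float, 3)]
    return "\n".join(lines) + "\n"


def read_floats(text, engine, n_header=5, n_int=1000):
    """Skip the header and integer block, then parse the float block."""
    return pd.read_csv(io.StringIO(text), skiprows=n_header + n_int,
                       header=None, engine=engine, dtype=np.float64)


text = build_sample()
df_c = read_floats(text, "c")        # the engine that fails in the report
df_py = read_floats(text, "python")  # the engine that works in the report

# On an unaffected version, both engines should agree on the float block.
print(df_c.shape, np.allclose(df_c.values, df_py.values))
```

If the C engine raises while the Python engine succeeds for some combination of sizes, that combination would be a candidate for a minimal test case.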