I was building on this example on SO and found a bug in the skiprows option of ExcelFile().parse when it is passed an array of row indices.
In [2]: xls=pd.ExcelFile('example.xlsx')
In [3]: df=xls.parse(xls.sheet_names[0])
In [4]: print df
    Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
0   2011-10-01 00:00:00  1892213  E  8111  29  ALTOU
1   2011-11-01 00:00:00  1892213  E  8111  29  ALTOU
2   NaN  NaN  NaN  NaN  NaN  NaN
3   NaN  NaN  NaN  NaN  NaN  NaN
4   Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
5   2011-10-01 00:00:00  553961  G  8111  29  GN3
6   2011-11-01 00:00:00  553961  G  8111  29  GN3
7   NaN  NaN  NaN  NaN  NaN  NaN
8   Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
9   2011-10-01 00:00:00  6322158  E  29  A
10  2011-11-01 00:00:00  6322158  E  29  A
Wanting to avoid the repeated headers and empty rows, I parsed again with skiprows, but no dice (the result is identical to the previously parsed sheet):
In [5]: skip_idx=np.array([2,3,4,7,8])
In [6]: df=xls.parse(xls.sheet_names[0], skiprows=skip_idx+1)
In [8]: df
Out[8]:
    Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
0   2011-10-01 00:00:00  1892213  E  8111  29  ALTOU
1   2011-11-01 00:00:00  1892213  E  8111  29  ALTOU
2   NaN  NaN  NaN  NaN  NaN  NaN
3   NaN  NaN  NaN  NaN  NaN  NaN
4   Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
5   2011-10-01 00:00:00  553961  G  8111  29  GN3
6   2011-11-01 00:00:00  553961  G  8111  29  GN3
7   NaN  NaN  NaN  NaN  NaN  NaN
8   Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
9   2011-10-01 00:00:00  6322158  E  29  A
10  2011-11-01 00:00:00  6322158  E  29  A
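In the meantime, one possible workaround (not a fix for the bug itself) is to parse the sheet without skiprows and drop the unwanted rows from the resulting frame. Below is a minimal sketch of that idea. The frame `raw` is a hand-built, two-column stand-in for the parsed sheet, and the names `raw` and `clean` are only illustrative; with the real file, `raw` would come from `xls.parse(...)`.

```python
import pandas as pd

# Hypothetical stand-in for the sheet as parsed WITHOUT skiprows:
# rows 2, 3 and 7 are empty, rows 4 and 8 are repeated headers.
raw = pd.DataFrame(
    {
        "Meter#": [1892213, 1892213, None, None, "Meter#",
                   553961, 553961, None, "Meter#", 6322158, 6322158],
        "Type": ["E", "E", None, None, "Type",
                 "G", "G", None, "Type", "E", "E"],
    }
)

# Same positions as skip_idx in the session above. No +1 is needed here,
# because these are positions in the parsed frame, not lines in the file.
skip_idx = [2, 3, 4, 7, 8]

# Drop the rows by position, then renumber the index from 0.
clean = raw.drop(raw.index[skip_idx]).reset_index(drop=True)
print(clean)
```

Since the header rows leave the columns as object dtype, a follow-up type conversion would probably still be needed on the real data.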
Meanwhile, our little friend pd.read_csv() doesn't seem to have the same hangups. Creating a CSV directly from the original .xlsx and performing the same operations:
In [9]: df=pd.read_csv('example.csv')
In [10]: df
Out[10]:
    Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
0   Oct-11  1892213  E  8111  29  ALTOU
1   Nov-11  1892213  E  8111  29  ALTOU
2   NaN  NaN  NaN  NaN  NaN  NaN
3   NaN  NaN  NaN  NaN  NaN  NaN
4   Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
5   Oct-11  553961  G  8111  29  GN3
6   Nov-11  553961  G  8111  29  GN3
7   NaN  NaN  NaN  NaN  NaN  NaN
8   Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
9   Oct-11  6322158  E  29  A
10  Nov-11  6322158  E  29  A
Looks fine. Now, passing the skiprows option, I get a correctly parsed df with all the guilty lines removed:
In [11]: df=pd.read_csv('example.csv', skiprows=skip_idx+1)
In [12]: df
Out[12]:
   Bill Date  Meter#  Type  SIC Code  Billing Days  Rate Code
0  Oct-11  1892213  E  8111  29  ALTOU
1  Nov-11  1892213  E  8111  29  ALTOU
2  Oct-11  553961  G  8111  29  GN3
3  Nov-11  553961  G  8111  29  GN3
4  Oct-11  6322158  E  29  A
5  Nov-11  6322158  E  29  A
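For anyone wanting to reproduce the read_csv behaviour without the original files, here is a small self-contained sketch. The CSV text is a trimmed-down, three-column stand-in for example.csv (an assumption, not the real file). It also shows why the +1 is there: skiprows counts 0-indexed lines in the file, header line included, while skip_idx holds row positions from the parsed frame, which start one line later.

```python
import io

import pandas as pd

# Trimmed-down stand-in for example.csv (assumption: only three columns).
csv_text = """Bill Date,Meter#,Type
Oct-11,1892213,E
Nov-11,1892213,E
,,
,,
Bill Date,Meter#,Type
Oct-11,553961,G
Nov-11,553961,G
,,
Bill Date,Meter#,Type
Oct-11,6322158,E
Nov-11,6322158,E
"""

# Row positions of the empty rows and repeated headers in the parsed frame.
skip_idx = [2, 3, 4, 7, 8]

# Frame row i sits on file line i + 1, because file line 0 is the header.
# This mirrors skip_idx+1 from the session above, using a plain list.
file_lines_to_skip = [i + 1 for i in skip_idx]

df = pd.read_csv(io.StringIO(csv_text), skiprows=file_lines_to_skip)
print(df)
```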
I started tracing the code to troubleshoot, but after the fifth handoff of the skiprows parameter around .io, I gave up. :(
Thoughts?
related/dup #4340