-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: pandas.read_excel(file_name, parse_dates = False)
doesn't work
#47269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The original date format is dd/mm/Y A working exame of above issue >>> df = pd.read_excel('combine_me/'+files_path[2], header = None, parse_dates = False)
>>> df.dropna(subset=[0], inplace = True)
>>> df[7]
7 EventDate
9 2012-01-04 00:00:00
10 2012-01-05 00:00:00
11 2012-01-06 00:00:00
12 2012-01-11 00:00:00
...
175 2012-12-21 00:00:00
176 2012-12-21 00:00:00
177 2012-12-24 00:00:00
178 2012-12-24 00:00:00
179 2012-12-28 00:00:00
Name: 7, Length: 172, dtype: object
>>> I tried to specify the column which has dates as str but it still gets parsed as dates >>> df = pd.read_excel('combine_me/'+files_path[2], header = None, converters ={7:str})
>>> df.dropna(subset=[0], inplace = True)
>>> df[7]
7 EventDate
9 2012-01-04 00:00:00
10 2012-01-05 00:00:00
11 2012-01-06 00:00:00
12 2012-01-11 00:00:00
...
175 2012-12-21 00:00:00
176 2012-12-21 00:00:00
177 2012-12-24 00:00:00
178 2012-12-24 00:00:00
179 2012-12-28 00:00:00
Name: 7, Length: 172, dtype: object
>>> |
Also one more thing, sometimes the dates are parsed wrong. i got a freelance job once to do some operations on an excel file. As u can see above, the dates are parsed and converted automatically, i changed the date format back to orignal before writing it to a file. And i sent the file back to the guy who gave me the work. A day later he tells me the dates have month and dates as swapped in some rows. For e.g. That was quite bad cause it may have gone undetected. |
Hi @Haiz14. My guess (having not seen your spreadsheet) is that your cells are already stored as dates in Excel. The |
Could you please provide a reproducible example? Something that is copy and pasteable would be perfect |
Here's, the excel file i used to show above examples. |
@ahawryluk there's a lot of excel files, it's not possible for me to convert the excel dates as text manually for that many files |
Yupp you are right, check the sample example i posted (2nd post in this thread), the datatype of printed DataFrame is object not "datetime", so pandas worked fine (it didn't parse it as dates, there should be some excel metadata ig). It's an excel problem. |
My above post isnt valid because u can see in the file i gave previously, it has a string in the date column. I checked in a simpler file It does gets parsed as datetime even though i passed >>> df = pd.read_excel('2012.xlsx', parse_dates = False)
>>> df['EventDate']
0 2012-01-04
1 2012-01-05
2 2012-01-06
3 NaT
4 NaT
...
185 NaT
186 NaT
187 NaT
188 NaT
189 NaT
Name: EventDate, Length: 190, dtype: datetime64[ns] |
Thanks @Haiz14 for the sample file. Here's how I understand the issue. Pandas loads data from xlsx files in two steps:
Dates represented in Excel as text can be parsed by pandas in step 2, but actual Excel dates are returned in step 1 in their actual format: >>> import openpyxl
>>> wb = openpyxl.open('2012.xlsx')
>>> ws = wb.worksheets[0]
>>> ws['H1'].value
'EventDate'
>>> ws['H1'].data_type
's' # string
>>> ws['H2'].value
datetime.datetime(2012, 1, 4, 0, 0)
>>> ws['H2'].data_type
'd' # date
>>> ws['H2'].number_format
'mm-dd-yy' The value in cell H2 is an Excel date, not a string, so the value of If I understand your task correctly, you need to load and save many spreadsheets, and you need both the value and the formatting of these dates to be preserved. Perhaps you could specify the output datetime_format with ExcelWriter, or you could use openpyxl rather than pandas for this task, as it can read and write the number_format for each cell. |
Ty, closing this issue then. |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The above code parses the date from (dd/mm/Y) to (Y-mm-dd hh:mm:ss) even though i specified to not parse dates.
Expected Behavior
The written file should retain the orignal date format.
Installed Versions
INSTALLED VERSIONS
commit : 06d2301
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.186+
Version : #1 SMP PREEMPT Tue Apr 5 22:31:47 CST 2022
machine : aarch64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.1.1
setuptools : 59.8.0
Cython : None
pytest : 7.1.2
hypothesis : 6.43.0
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
fastparquet : None
fsspec : 2022.3.0
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
zstandard : None
The text was updated successfully, but these errors were encountered: