BUG: DateTime Index format bug when importing from .csv file #35402
Labels: Bug; Datetime (Datetime data dtype); Duplicate Report (Duplicate issue or pull request); IO CSV (read_csv, to_csv)
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
I am sorry that I cannot provide a minimal working example (MWE) because time is against me, but I still want to report the problem.
I have loaded a .csv file which has a column called "timestamp_utc" containing the full timestamp, date+time.
I then set the timestamp_utc column as my index and converted it to datetime.
My code was not working because some dates were apparently missing, although inspecting the .csv file by hand showed that they were present.
The .csv file has all dates in dd/mm/yyyy format, from 02/12/2019 to 03/06/2020, but after running the code above and checking the index in the dataframe, this was shown (see screenshot):

so the index jumps from 2019/12/02 (which is yyyy/dd/mm format) to 2019/02/13 (which is yyyy/mm/dd format)!
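One plausible explanation, consistent with the symptom described, is that each string is guessed on its own: month-first is tried, and day-first is used only when month-first is impossible. The sketch below reproduces that failure mode with the standard library only. It is not pandas's parser; `guess_parse` is a hypothetical helper and the sample dates are assumptions picked from the range mentioned in the report.

```python
from datetime import datetime


def guess_parse(text):
    """Hypothetical per-value guess: month-first, else day-first."""
    try:
        return datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        # Month-first is impossible (e.g. month 13), so fall back.
        return datetime.strptime(text, "%d/%m/%Y")


# Assumed sample values, all intended as dd/mm/yyyy.
raw = ["02/12/2019", "12/12/2019", "13/12/2019", "03/06/2020"]

guessed = [guess_parse(t) for t in raw]
day_first = [datetime.strptime(t, "%d/%m/%Y") for t in raw]

for text, g, d in zip(raw, guessed, day_first):
    flag = "" if g == d else "  <-- misread"
    print(f"{text}  guessed={g.date()}  intended={d.date()}{flag}")
```

Any date whose day is 12 or less can be read both ways, so it silently lands in the wrong month, while days above 12 are read correctly. That mix is what makes the resulting index look inconsistent.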
Expected Output
I do not mind much whether Pandas chooses to import something as yyyy/dd/mm or yyyy/mm/dd, and I know that I can use the "dayfirst" option if needed, but at least Pandas should be consistent. This simply breaks the logic of indexes, which should always be increasing values. I find it quite surprising that no warning was given during the set_index operation.
These kinds of issues happen on a daily basis when working with Pandas, and in my group several tens of hours are spent debugging what should be a straightforward thing. My suggestion to avoid these problems in the future is that Pandas should expect and treat dates as if they were in dd/mm/yyyy or yyyy/mm/dd format, throwing (at least) a warning when a date in mm/dd/yyyy format is found, so that the user can correct the import.
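Until the parsing is made consistent, a workaround is to state the format explicitly rather than rely on guessing, and to check the index order yourself. The following is a minimal sketch, not the code from this report: the column name `timestamp_utc` comes from the description above, while the sample rows, the `value` column, and the time-of-day format are assumptions.

```python
import io

import pandas as pd

# Assumed stand-in for the real .csv file (dates are dd/mm/yyyy).
csv_text = """timestamp_utc,value
02/12/2019 00:00:00,1
12/12/2019 00:00:00,2
13/12/2019 00:00:00,3
03/06/2020 00:00:00,4
"""

df = pd.read_csv(io.StringIO(csv_text))

# An explicit format removes the ambiguity; a value that does not
# match raises an error instead of being reinterpreted.
df["timestamp_utc"] = pd.to_datetime(
    df["timestamp_utc"], format="%d/%m/%Y %H:%M:%S"
)
df = df.set_index("timestamp_utc")

# set_index does not check ordering, so verify it explicitly.
if not df.index.is_monotonic_increasing:
    raise ValueError("timestamp index is not sorted; check date parsing")

print(df)
```

Passing `dayfirst=True` to `pd.to_datetime` is the lighter alternative mentioned above; an explicit `format` is stricter because it does not leave any value open to guessing.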
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None
pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None