
BUG: to_datetime do not keep the date format throughout the column when using inferred format #34546


Closed
1 task
lviani opened this issue Jun 3, 2020 · 3 comments
Labels
Duplicate Report Duplicate issue or pull request

Comments


lviani commented Jun 3, 2020

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd

# Case 1: day first, year last. No error is raised, although parsing should not continue normally.
print('First case - year last | no error raised')
df = pd.DataFrame({'timestamp': ['11-05-2020', '12-05-2020', '14-05-2020', '13-05-2020', '01-06-2020', '02-06-2020']})
print(pd.to_datetime(df['timestamp']))

# Case 2: year first. An error is raised, which is what I would expect.
print('\n\nSecond case - year first | error raised')
df = pd.DataFrame({'timestamp': ['2020-11-05', '2020-12-05', '2020-14-05', '2020-13-05', '2020-01-06', '2020-02-06']})
pd.to_datetime(df['timestamp'])

Problem description

The function to_datetime does not keep the same date format throughout the column when the format is inferred. This happens when the dates to parse do NOT start with the year.
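
For comparison, here is a minimal sketch of the same day-first column parsed with an explicit format instead of an inferred one. The format string '%d-%m-%Y' is an assumption about what the data is meant to be. With it, every row is interpreted the same way, so the month and day are never swapped partway through the column.

```python
import pandas as pd

timestamps = pd.Series(['11-05-2020', '12-05-2020', '14-05-2020',
                        '13-05-2020', '01-06-2020', '02-06-2020'])

# Assumed format: day-month-year. Passing it explicitly means pandas
# applies one interpretation to the whole column.
parsed = pd.to_datetime(timestamps, format='%d-%m-%Y')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
```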

Using the table below as input ('test.csv'):
timestamp
2020-11-05
2020-12-05
2020-14-05
2020-13-05
2020-01-06
2020-02-06
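
The same table can also be built in code so that the example does not depend on a local 'test.csv'. This is only a sketch: the name csv_text is illustrative, and the column is deliberately left as plain strings (no parse_dates) so that to_datetime can be called on it afterwards.

```python
import io

import pandas as pd

# Inline copy of the table above, standing in for 'test.csv'.
csv_text = """timestamp
2020-11-05
2020-12-05
2020-14-05
2020-13-05
2020-01-06
2020-02-06
"""

# Without parse_dates, read_csv keeps the column as strings.
df = pd.read_csv(io.StringIO(csv_text))

print(df['timestamp'].tolist())
```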

Problem:

  1. The problem happens when the dates do not start with the year. In this case, pandas switches the month and day (see rows 2 and 3), so the detected format changes within the column.
    0 2020-11-05
    1 2020-12-05
    2 2020-05-14
    3 2020-05-13
    4 2020-01-06
    5 2020-02-06

  2. When the dates start with the year, an error is raised (in my view, this should also happen in the previous case).
    ValueError: month must be in 1..12
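
One way to get the behaviour of the second case for day-first data is to state the format explicitly. The sketch below assumes the column was meant to be month-day-year ('%m-%d-%Y'). Under that assumption, rows 2 and 3 cannot be parsed, and pandas either marks them as NaT (errors='coerce') or raises (the default) instead of swapping month and day.

```python
import pandas as pd

timestamps = pd.Series(['11-05-2020', '12-05-2020', '14-05-2020',
                        '13-05-2020', '01-06-2020', '02-06-2020'])

# Assumption: the column is month-day-year. errors='coerce' turns rows
# that do not fit this format into NaT, which makes them easy to locate.
as_month_first = pd.to_datetime(timestamps, format='%m-%d-%Y', errors='coerce')
bad_rows = timestamps[as_month_first.isna()]
print(bad_rows.tolist())

# With the default errors='raise', the same mismatch is reported as a
# ValueError instead of being silently reinterpreted.
try:
    pd.to_datetime(timestamps, format='%m-%d-%Y')
    raised = False
except ValueError:
    raised = True
print(raised)
```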

Expected Output

Raise an error if the datetime format of the input values changes within the same column.
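
A rough sketch of what this could look like in user code. The helper to_datetime_strict and its candidate_formats parameter are hypothetical and not part of pandas. It accepts a format only if that format fits every non-missing row, and raises otherwise.

```python
import pandas as pd


def to_datetime_strict(values, candidate_formats):
    """Hypothetical helper, not part of pandas.

    Parse `values` with the first candidate format that fits every
    non-missing row, and raise ValueError if no single format does.
    """
    values = pd.Series(values)
    present = values.notna()
    for fmt in candidate_formats:
        parsed = pd.to_datetime(values, format=fmt, errors='coerce')
        # Accept the format only if no non-missing row became NaT.
        if parsed[present].notna().all():
            return parsed
    raise ValueError(
        'no single format in %r matches every row' % (list(candidate_formats),)
    )


day_first = ['11-05-2020', '12-05-2020', '14-05-2020',
             '13-05-2020', '01-06-2020', '02-06-2020']

# '%m-%d-%Y' is rejected because of rows 2 and 3, so '%d-%m-%Y' is used
# for the whole column.
result = to_datetime_strict(day_first, ['%m-%d-%Y', '%d-%m-%Y'])
print(result.dt.strftime('%Y-%m-%d').tolist())
```

The order of the candidate formats matters: if every row happens to be valid under more than one format, the first one listed wins, so the preferred interpretation should come first.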

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 5.3.2
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

lviani added the Bug and Needs Triage labels Jun 3, 2020
Member

phofl commented Jun 3, 2020

Hey,

thanks for your report. Could you please alter your example so that it is copy-pastable? For example, you could define your DataFrame in code.

Author

lviani commented Jun 4, 2020

OK, I hope it is better now.

@simonjayhawkins
Member

@lviani Thanks for the report. This looks like a duplicate of #12585, so closing. Ping me if I'm missing something.

simonjayhawkins added the Duplicate Report label and removed the Bug and Needs Triage labels Jun 9, 2020