to_datetime does not ignore the error when there is NaN before wrong datetime when format is %Y%m%d #25512

Closed
gyli opened this issue Mar 1, 2019 · 2 comments · Fixed by #26561
Labels
Bug Datetime Datetime data dtype
@gyli
gyli commented Mar 1, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Raises OverflowError: a NaN precedes the invalid date '19820001'
pd.to_datetime(pd.Series([np.nan, '19750501', '19820001', '19770501']), format='%Y%m%d', errors='coerce')
pd.to_datetime(pd.Series(['19750501', np.nan, '19820001', '19770501']), format='%Y%m%d', errors='coerce')

# Works: no NaN in the series
pd.to_datetime(pd.Series(['19750501', '19820001', '19770501']), format='%Y%m%d', errors='coerce')

# Works: the NaN comes after the invalid date
pd.to_datetime(pd.Series(['19750501', '19820001', pd.np.nan, '19770501'][:2] + [np.nan, '19770501']), format='%Y%m%d', errors='coerce')

Problem description

With errors='ignore' or errors='coerce', pandas should be able to skip the invalid date '19820001'. However, if a NaN appears before the invalid date, pandas raises "OverflowError: signed integer is less than minimum".

This only happens when format is %Y%m%d.
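
To make the expected behaviour concrete, here is a minimal standard-library sketch of what errors='coerce' is expected to do for this input. It is not pandas' implementation: the function name coerce_yyyymmdd is hypothetical, and None stands in for NaT.

```python
import math
from datetime import datetime


def coerce_yyyymmdd(values):
    """Parse YYYYMMDD values; map missing or invalid entries to None.

    Hypothetical helper for illustration only (None plays the role of NaT).
    """
    parsed = []
    for value in values:
        # Treat None and float NaN as missing values.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            parsed.append(None)
            continue
        try:
            parsed.append(datetime.strptime(str(value), "%Y%m%d"))
        except ValueError:
            # '19820001' has month 00, which strptime rejects.
            parsed.append(None)
    return parsed


result = coerce_yyyymmdd([float("nan"), "19750501", "19820001", "19770501"])
```

In this sketch the position of the NaN makes no difference to the result, which is the behaviour the report expects from pandas.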

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.1
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.2.17
pymysql: 0.9.3
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@mroeschke mroeschke added Bug Datetime Datetime data dtype labels Mar 1, 2019
@gyli gyli changed the title from "to_datetime does not ignore the error when there is NaN before wrong datetime" to "to_datetime does not ignore the error when there is NaN before wrong datetime when format is %Y%m%d" Mar 2, 2019
@coderop2
coderop2 commented Mar 4, 2019

I checked how the code behaves in both scenarios, and it works fine in the datetimes.py file. However, in the scenarios where NaN is passed, the arguments are handed to the Cython datetime file... maybe the error is being raised from the Cython module.
Any idea how I can solve that?

@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone May 27, 2019
nathalier added a commit to nathalier/pandas that referenced this issue May 29, 2019
…as-dev#25512)

parsing.try_parse_year_month_day() in _attempt_YYYYMMDD() throws not only ValueError but also OverflowError for incorrect dates, so handling of this error was added.
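
The pattern described in the commit message above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the pandas code: to_c_ints, parse_ymd_strict and parse_ymd_coerce are hypothetical names, and array('i') is used only as a stand-in for a low-level routine that stores values as C ints.

```python
from array import array
from datetime import date


def to_c_ints(year, month, day):
    # array('i') holds C ints and raises OverflowError for values
    # outside that range (stand-in for a low-level parser).
    return array("i", [year, month, day])


def parse_ymd_strict(year, month, day):
    """Hypothetical strict parser: may raise ValueError or OverflowError."""
    y, m, d = to_c_ints(year, month, day)
    return date(y, m, d)  # ValueError for e.g. month 0


def parse_ymd_coerce(year, month, day):
    """Return None for any invalid date, whichever error the parser raises."""
    try:
        return parse_ymd_strict(year, month, day)
    except (ValueError, OverflowError):
        # Catching only ValueError would let the overflow case escape.
        return None


def raises_overflow(year, month, day):
    """Report whether the strict parser fails with OverflowError."""
    try:
        parse_ymd_strict(year, month, day)
    except OverflowError:
        return True
    except ValueError:
        return False
    return False
```

The point of the sketch is that an invalid date can surface as either exception type, so a coercing caller has to handle both.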
@nathalier
Contributor

While writing tests for this issue, I found a couple of problems with errors='ignore'.

  1. When a series contains an incorrect value, none of the values are converted, not only the incorrect one. I.e.,
    pd.to_datetime(pd.Series(['19750501', '19820001', '19770501']), format='%Y%m%d', errors='ignore')
    returns just the original values.
    Not sure whether this is intended.
  2. The exception "TypeError: 'int' object is unsliceable" is raised by array_strptime() for some incorrect integer values. E.g.,
    pd.to_datetime(pd.Series([19801222, 20011301, 19991222]), format='%Y%m%d', errors='ignore')
    raises this error.
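
The two observations above can be mirrored in a small standard-library sketch. It only models the reported all-or-nothing behaviour of errors='ignore' and shows one way integer input could be handled (converting to str before parsing). The function name to_datetime_ignore is hypothetical, and this is not how pandas implements it.

```python
from datetime import datetime


def to_datetime_ignore(values, fmt="%Y%m%d"):
    """Model of the reported errors='ignore' behaviour (hypothetical helper).

    If every value parses, return the parsed datetimes; if any value fails,
    return the input unchanged.
    """
    try:
        # str() lets integer input such as 19801222 go through the same path
        # as string input instead of failing on slicing.
        return [datetime.strptime(str(v), fmt) for v in values]
    except ValueError:
        return list(values)


unchanged = to_datetime_ignore(["19750501", "19820001", "19770501"])
from_ints = to_datetime_ignore([19801222, 19991222])
bad_ints = to_datetime_ignore([19801222, 20011301, 19991222])
```

Here a single invalid entry ('19820001' or 20011301) causes the whole input to come back unconverted, which matches what item 1 describes.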

6 participants