pandas.to_datetime in ver 1.3.0 fails to convert dates in arrays of size 51 #42680

Closed
agilevic opened this issue Jul 23, 2021 · 3 comments
Labels: Datetime (Datetime data dtype), Duplicate Report (Duplicate issue or pull request)

agilevic commented Jul 23, 2021

In pandas 1.3.0, pd.to_datetime returns NaT for valid datetime strings when given a list or Series of certain sizes and contents. Here is a reproducible example:

import pandas as pd
pd.__version__
ld = ['2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z']
pd.to_datetime(ld) 

And proof:

(p130) C:\Users\volex>python
Python 3.9.5 (default, May 18 2021, 14:42:02) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.3.0'
>>> ld = ['2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z']
>>> pd.to_datetime(ld)
DatetimeIndex(['2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-09-24 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT', '2024-12-17 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT'],
              dtype='datetime64[ns, UTC]', freq=None)
>>>
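
Comparing the output above with the input, the 26 NaT entries sit at exactly the positions holding '2021-08-16T00:00:00.0000000Z'. That is also the only instant in the list written in two different ways (with and without fractional seconds). Whether this is the actual trigger is not established here; the sketch below only makes the pattern visible. It rebuilds the same 51 strings compactly and parses each distinct spelling on its own with pd.Timestamp. The variable names are illustrative.

```python
from collections import defaultdict

import pandas as pd

# The same 51 strings as in the report, built compactly.
ld = (
    ["2021-09-24T00:00:00Z"] * 5
    + ["2021-07-30T00:00:00Z"] * 5
    + ["2021-08-16T00:00:00Z"] * 11
    + ["2021-08-16T00:00:00.0000000Z"] * 14
    + ["2024-12-17T00:00:00.0000000Z"] * 4
    + ["2021-08-16T00:00:00.0000000Z"] * 12
)

# Parse each distinct spelling individually with pd.Timestamp, so the
# list-level code path of pd.to_datetime is not involved.
spellings_by_instant = defaultdict(list)
for text in dict.fromkeys(ld):  # distinct strings, first-seen order
    spellings_by_instant[pd.Timestamp(text)].append(text)

# Keep only the instants that are written in more than one way.
ambiguous = {
    ts: texts for ts, texts in spellings_by_instant.items() if len(texts) > 1
}
for ts, texts in ambiguous.items():
    print(ts, texts)
```

For this input the only instant with more than one spelling is 2021-08-16 00:00:00 UTC.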

Interestingly, when the list has 50 elements (the first one dropped), it works fine:

>>> pd.to_datetime(ld[1:])
DatetimeIndex(['2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

No such anomaly observed in pandas 1.2.x.

This is on Windows 10 64 bit using Anaconda.

(p130) C:\Users\volex>conda list
# packages in environment at C:\Users\volex\AppData\Local\Continuum\anaconda3\envs\p130:
#
# Name                    Version                   Build  Channel
blas                      1.0                         mkl
bottleneck                1.3.2            py39h7cc1a96_1
ca-certificates           2021.7.5             haa95532_1
certifi                   2021.5.30        py39haa95532_0
intel-openmp              2021.3.0          haa95532_3372
mkl                       2021.3.0           haa95532_524
mkl-service               2.4.0            py39h2bbff1b_0
mkl_fft                   1.3.0            py39h277e83a_2
mkl_random                1.2.2            py39hf11a4ad_0
numexpr                   2.7.3            py39hb80d3ca_1
numpy                     1.20.3           py39ha4e8547_0
numpy-base                1.20.3           py39hc2deb75_0
openssl                   1.1.1k               h2bbff1b_0
pandas                    1.3.0            py39hd77b12b_0
pip                       21.1.3           py39haa95532_0
python                    3.9.5                h6244533_3
python-dateutil           2.8.2              pyhd3eb1b0_0
pytz                      2021.1             pyhd3eb1b0_0
setuptools                52.0.0           py39haa95532_0
six                       1.16.0             pyhd3eb1b0_0
sqlite                    3.36.0               h2bbff1b_0
tzdata                    2021a                h52ac0ba_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.36.2             pyhd3eb1b0_0
wincertstore              0.2              py39h2bbff1b_0
Member

phofl commented Jul 23, 2021

This works on master. It may need a test.

phofl added the "good first issue" and "Needs Tests" (Unit test(s) needed to prevent regressions) labels on Jul 23, 2021
@agilevic
Author

Same problem on Linux with pandas installed from pip. So this is not an Anaconda-specific or Windows-specific issue.

lithomas1 added the "Duplicate Report" (Duplicate issue or pull request) and "Datetime" (Datetime data dtype) labels and removed the "Needs Tests" and "good first issue" labels on Jul 23, 2021
Member

lithomas1 commented Jul 23, 2021

This is a duplicate of #42259, which is related to the cache param and is patched for 1.3.1. 1.3.1 will hopefully be released this Sunday.
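
For anyone who has to stay on 1.3.0 in the meantime, two possible workarounds follow from the comment above, though neither is confirmed in this thread. One is to pass cache=False to pd.to_datetime (an existing parameter). The other is to parse the values one at a time so that no list-level cache is involved. Below is a minimal sketch of the second option. The helper name parse_each and the sample list are made up for illustration, and element-wise parsing will be slower on large inputs.

```python
import pandas as pd


def parse_each(values):
    """Hypothetical helper: parse ISO 8601 strings one at a time."""
    # pd.Timestamp handles a single value, so the list-level cache of
    # pd.to_datetime never comes into play.
    return pd.DatetimeIndex([pd.Timestamp(v) for v in values])


# Illustrative input: 51 strings, the same instant in two spellings.
sample = ["2021-08-16T00:00:00Z"] * 30 + ["2021-08-16T00:00:00.0000000Z"] * 21

result = parse_each(sample)
print(len(result), int(result.isna().sum()), result.tz)
```

The printed NaT count gives a quick check that every string was converted.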
