pandas.to_datetime in ver 1.3.0 fails to convert dates in arrays of size 51 #42680

Closed
@agilevic

In pandas 1.3.0, pd.to_datetime produces NaT for valid datetime strings when processing lists or Series of a certain size and content. Here is a reproducible example:

import pandas as pd
pd.__version__
ld = ['2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-09-24T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-07-30T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2024-12-17T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z',
 '2021-08-16T00:00:00.0000000Z']
pd.to_datetime(ld) 
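
For readers who want a shorter reproducer, the same 51-element list can be rebuilt from its runs of repeated strings. This is only a compact restatement of the list above; the names runs and ld_compact are introduced here for illustration and are not part of the original report.

```python
from collections import Counter

# (string, repeat count) runs, in the order they appear in the list above.
runs = [
    ("2021-09-24T00:00:00Z", 5),
    ("2021-07-30T00:00:00Z", 5),
    ("2021-08-16T00:00:00Z", 11),
    ("2021-08-16T00:00:00.0000000Z", 14),
    ("2024-12-17T00:00:00.0000000Z", 4),
    ("2021-08-16T00:00:00.0000000Z", 12),
]

# Expand the runs back into the flat list used in the report.
ld_compact = [value for value, count in runs for _ in range(count)]

print(len(ld_compact))       # total number of elements
print(Counter(ld_compact))   # occurrences of each distinct string
```

Written this way, it is easier to see that the input holds only five distinct strings, two of which spell the same instant with and without fractional seconds.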

And proof:

(p130) C:\Users\volex>python
Python 3.9.5 (default, May 18 2021, 14:42:02) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.3.0'
>>> ld = ['2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-09-24T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-07-30T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2024-12-17T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z',
...  '2021-08-16T00:00:00.0000000Z']
>>> pd.to_datetime(ld)
DatetimeIndex(['2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-09-24 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT', '2024-12-17 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT',                       'NaT',
                                     'NaT'],
              dtype='datetime64[ns, UTC]', freq=None)
>>>
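
In the output above, every NaT sits at a position holding '2021-08-16T00:00:00.0000000Z', while the fractional '2024-12-17T00:00:00.0000000Z' entries convert fine. One possible reading, which is an assumption and not something this report confirms, is that the failure involves two different spellings of the same instant. The sketch below uses only the standard library to show that the two 2021-08-16 spellings are both valid and denote the same UTC time; reference_parse is a hypothetical helper written for this illustration.

```python
import re
from datetime import datetime, timezone

# Matches 'YYYY-MM-DDTHH:MM:SS' with an optional fraction and a trailing 'Z'.
_ISO_Z = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def reference_parse(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS[.fraction]Z' into an aware UTC datetime."""
    match = _ISO_Z.match(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    base, fraction = match.groups()
    # datetime stores microseconds only, so pad or truncate to six digits.
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


plain = reference_parse("2021-08-16T00:00:00Z")
fractional = reference_parse("2021-08-16T00:00:00.0000000Z")
other = reference_parse("2024-12-17T00:00:00.0000000Z")

print(plain == fractional)  # True: both spellings are the same instant
print(other == plain)       # False: a distinct instant
```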

Interestingly, when the list has 50 elements instead of 51, the conversion works fine:

>>> pd.to_datetime(ld[1:])
DatetimeIndex(['2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
               '2021-07-30 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
               '2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
               '2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)
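
The 50-versus-51 boundary lines up with the cache parameter of pd.to_datetime, which the pandas documentation describes as a cache of unique converted dates that is only used for inputs of roughly this size. Assuming the cache is what triggers the NaT values (this report does not confirm it), a possible workaround on 1.3.0 is to pass cache=False and then check that no non-null input turned into NaT. The helper names below are hypothetical and exist only for this sketch.

```python
import pandas as pd


def parse_without_cache(values):
    """Parse with the unique-value cache disabled (assumed workaround)."""
    return pd.to_datetime(values, cache=False)


def spurious_nat_positions(values, parsed):
    """Return positions where the input was non-null but the result is NaT."""
    raw_present = pd.Series(list(values), dtype=object).notna()
    result_missing = pd.isna(parsed)
    return [
        i
        for i, (present, missing) in enumerate(zip(raw_present, result_missing))
        if present and missing
    ]


# 60 identical strings: long enough that the cache would normally apply.
values = ["2021-08-16T00:00:00Z"] * 60
parsed = parse_without_cache(values)
print(spurious_nat_positions(values, parsed))  # an empty list means nothing was lost
```

Running spurious_nat_positions(ld, pd.to_datetime(ld)) against the list from this report should point at the affected positions on a failing installation, which makes it easy to compare pandas versions.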

No such anomaly is observed in pandas 1.2.x.

This is on 64-bit Windows 10 using Anaconda.

(p130) C:\Users\volex>conda list
# packages in environment at C:\Users\volex\AppData\Local\Continuum\anaconda3\envs\p130:
#
# Name                    Version                   Build  Channel
blas                      1.0                         mkl
bottleneck                1.3.2            py39h7cc1a96_1
ca-certificates           2021.7.5             haa95532_1
certifi                   2021.5.30        py39haa95532_0
intel-openmp              2021.3.0          haa95532_3372
mkl                       2021.3.0           haa95532_524
mkl-service               2.4.0            py39h2bbff1b_0
mkl_fft                   1.3.0            py39h277e83a_2
mkl_random                1.2.2            py39hf11a4ad_0
numexpr                   2.7.3            py39hb80d3ca_1
numpy                     1.20.3           py39ha4e8547_0
numpy-base                1.20.3           py39hc2deb75_0
openssl                   1.1.1k               h2bbff1b_0
pandas                    1.3.0            py39hd77b12b_0
pip                       21.1.3           py39haa95532_0
python                    3.9.5                h6244533_3
python-dateutil           2.8.2              pyhd3eb1b0_0
pytz                      2021.1             pyhd3eb1b0_0
setuptools                52.0.0           py39haa95532_0
six                       1.16.0             pyhd3eb1b0_0
sqlite                    3.36.0               h2bbff1b_0
tzdata                    2021a                h52ac0ba_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.36.2             pyhd3eb1b0_0
wincertstore              0.2              py39h2bbff1b_0

Labels: Datetime (datetime data dtype), Duplicate Report (duplicate issue or pull request)