Description
In pandas 1.3.0, pd.to_datetime produces NaT for valid datetime strings when given a list or Series of a certain size and content. Here is a reproducible example:
import pandas as pd
pd.__version__
ld = ['2021-09-24T00:00:00Z',
'2021-09-24T00:00:00Z',
'2021-09-24T00:00:00Z',
'2021-09-24T00:00:00Z',
'2021-09-24T00:00:00Z',
'2021-07-30T00:00:00Z',
'2021-07-30T00:00:00Z',
'2021-07-30T00:00:00Z',
'2021-07-30T00:00:00Z',
'2021-07-30T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2024-12-17T00:00:00.0000000Z',
'2024-12-17T00:00:00.0000000Z',
'2024-12-17T00:00:00.0000000Z',
'2024-12-17T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z',
'2021-08-16T00:00:00.0000000Z']
pd.to_datetime(ld)
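For convenience, the same 51-element list can be written in run-length form. This is only a compact restatement of the example above, not a different test case. The helper name `nat_positions` is hypothetical (it is not part of pandas), and what it returns, or whether parsing raises instead, depends on the installed pandas version.

```python
# Rebuild the 51-element list from the example above as (value, repeat count) runs.
runs = [
    ("2021-09-24T00:00:00Z", 5),
    ("2021-07-30T00:00:00Z", 5),
    ("2021-08-16T00:00:00Z", 11),
    ("2021-08-16T00:00:00.0000000Z", 14),
    ("2024-12-17T00:00:00.0000000Z", 4),
    ("2021-08-16T00:00:00.0000000Z", 12),
]
ld = [value for value, count in runs for _ in range(count)]


def nat_positions(values):
    """Return the indices at which pd.to_datetime yields NaT (hypothetical helper)."""
    import pandas as pd  # imported lazily so the list itself can be built without pandas

    parsed = pd.to_datetime(values)
    return [i for i, is_missing in enumerate(parsed.isna()) if is_missing]
```

Calling `nat_positions(ld)` and `nat_positions(ld[1:])` gives a quick side-by-side comparison of the 51-element and 50-element cases.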
Transcript of an interactive session showing the problem:
(p130) C:\Users\volex>python
Python 3.9.5 (default, May 18 2021, 14:42:02) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.3.0'
>>> ld = ['2021-09-24T00:00:00Z',
... '2021-09-24T00:00:00Z',
... '2021-09-24T00:00:00Z',
... '2021-09-24T00:00:00Z',
... '2021-09-24T00:00:00Z',
... '2021-07-30T00:00:00Z',
... '2021-07-30T00:00:00Z',
... '2021-07-30T00:00:00Z',
... '2021-07-30T00:00:00Z',
... '2021-07-30T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2024-12-17T00:00:00.0000000Z',
... '2024-12-17T00:00:00.0000000Z',
... '2024-12-17T00:00:00.0000000Z',
... '2024-12-17T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z',
... '2021-08-16T00:00:00.0000000Z']
>>> pd.to_datetime(ld)
DatetimeIndex(['2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
'2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
'2021-09-24 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
'2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
'2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', '2024-12-17 00:00:00+00:00',
'2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
'2024-12-17 00:00:00+00:00', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT', 'NaT',
'NaT'],
dtype='datetime64[ns, UTC]', freq=None)
>>>
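In the output above, every NaT corresponds to the input string '2021-08-16T00:00:00.0000000Z' (26 occurrences), while the same date without fractional seconds and the '2024-12-17T00:00:00.0000000Z' entries parse correctly. Interestingly, when the first element is dropped so that the list has 50 entries instead of 51, everything parses fine: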
>>> pd.to_datetime(ld[1:])
DatetimeIndex(['2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
'2021-09-24 00:00:00+00:00', '2021-09-24 00:00:00+00:00',
'2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
'2021-07-30 00:00:00+00:00', '2021-07-30 00:00:00+00:00',
'2021-07-30 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
'2024-12-17 00:00:00+00:00', '2024-12-17 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00',
'2021-08-16 00:00:00+00:00', '2021-08-16 00:00:00+00:00'],
dtype='datetime64[ns, UTC]', freq=None)
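Because the failure appears to depend on the length of the input, one way to narrow it down is to parse the trailing slices of a few lengths and count the missing results for each. The sketch below is a hypothetical helper, not pandas API. It takes the parsing function as an argument so that different settings can be compared.

```python
def count_missing(parsed):
    """Count missing results; NaT, like NaN, compares unequal to itself."""
    return sum(1 for item in parsed if item != item)


def probe_lengths(values, parse, lengths=(49, 50, 51)):
    """Map each length n to the number of missing results from parse(values[-n:])."""
    return {n: count_missing(parse(values[-n:])) for n in lengths}
```

With the list `ld` from the example, `probe_lengths(ld, pd.to_datetime)` checks the default behavior around the 50-element boundary. Passing `lambda v: pd.to_datetime(v, cache=False)` instead would test whether the documented `cache` option of `pd.to_datetime` is involved. That is only an assumption suggested by the size threshold, not something this report confirms.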
No such anomaly is observed in pandas 1.2.x.
This is on Windows 10 64-bit using Anaconda. The full environment:
(p130) C:\Users\volex>conda list
# packages in environment at C:\Users\volex\AppData\Local\Continuum\anaconda3\envs\p130:
#
# Name Version Build Channel
blas 1.0 mkl
bottleneck 1.3.2 py39h7cc1a96_1
ca-certificates 2021.7.5 haa95532_1
certifi 2021.5.30 py39haa95532_0
intel-openmp 2021.3.0 haa95532_3372
mkl 2021.3.0 haa95532_524
mkl-service 2.4.0 py39h2bbff1b_0
mkl_fft 1.3.0 py39h277e83a_2
mkl_random 1.2.2 py39hf11a4ad_0
numexpr 2.7.3 py39hb80d3ca_1
numpy 1.20.3 py39ha4e8547_0
numpy-base 1.20.3 py39hc2deb75_0
openssl 1.1.1k h2bbff1b_0
pandas 1.3.0 py39hd77b12b_0
pip 21.1.3 py39haa95532_0
python 3.9.5 h6244533_3
python-dateutil 2.8.2 pyhd3eb1b0_0
pytz 2021.1 pyhd3eb1b0_0
setuptools 52.0.0 py39haa95532_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 h2bbff1b_0
tzdata 2021a h52ac0ba_0
vc 14.2 h21ff451_1
vs2015_runtime 14.27.29016 h5e58377_2
wheel 0.36.2 pyhd3eb1b0_0
wincertstore 0.2 py39h2bbff1b_0
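For anyone comparing against their own setup, the versions of the packages most likely to matter for datetime parsing can also be collected from within Python. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). The function name and the default package selection are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pandas", "numpy", "python-dateutil", "pytz")):
    """Return a dict mapping each package name to its version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

pandas also provides `pd.show_versions()`, which prints a fuller report of the interpreter, operating system, and dependency versions.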