BUG: With cache, to_datetime() returns pd.NaT for inputs that produce duplicated values #42259
Closed
3 tasks done
Labels
Bug
Datetime: Datetime data dtype
Regression: Functionality that used to work in a prior pandas version
Milestone
Comments
take
The code sample worked on 1.2.5. Relabelling as a regression and changing the milestone to 1.3.1. Will backport #42261 and move the release note.
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue (Jul 22, 2021)
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue (Jul 22, 2021)
simonjayhawkins added a commit that referenced this issue (Jul 22, 2021)
CGe0516 pushed a commit to CGe0516/pandas that referenced this issue (Jul 29, 2021)
feefladder pushed a commit to feefladder/pandas that referenced this issue (Sep 7, 2021)
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
With caching enabled, the current to_datetime parses certain inputs incorrectly and returns NaT in place of valid data. The cause is a slightly erroneous deduplication step that was added to fix GH#39882 and GH#35888.
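The deduplication code itself is not quoted in this report, so the following is only a toy model of how such a step could go wrong. It does not use pandas. It assumes the cache maps each unique input to its parsed value, and that the problematic variant removes duplicates by parsed value instead of by input key. All names here (parse, cache_dedup_by_value, cache_dedup_by_key, convert) and the two date spellings are hypothetical, and None stands in for NaT.

```python
from datetime import datetime

# Hypothetical spellings of the same day, chosen only for illustration.
FORMATS = ("%Y %B %d", "%Y-%m-%d")


def parse(text):
    """Toy parser: None stands in for NaT."""
    if text is None:
        return None
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {text!r}")


def cache_dedup_by_value(unique_inputs):
    """Assumed problematic variant: keep only the first input per parsed value."""
    cache, seen = {}, set()
    for key in unique_inputs:
        value = parse(key)
        if value in seen:
            continue  # the input key is lost together with the duplicate value
        seen.add(value)
        cache[key] = value
    return cache


def cache_dedup_by_key(unique_inputs):
    """Alternative: every unique input keeps its own cache entry."""
    return {key: parse(key) for key in unique_inputs}


def convert(inputs, build_cache):
    unique_inputs = list(dict.fromkeys(inputs))  # order-preserving unique
    cache = build_cache(unique_inputs)
    # Inputs missing from the cache fall back to None (the NaT stand-in).
    return [cache.get(x) for x in inputs]


inputs = [None] * 51 + ["2012 July 26", "2012-07-26"]
buggy = convert(inputs, cache_dedup_by_value)
fixed = convert(inputs, cache_dedup_by_key)
print(buggy[-2:])
print(fixed[-2:])
```

In this model, the second spelling has no cache entry in the by-value variant, so it comes back as the missing marker even though it is a valid date. Deduplicating by key keeps both entries.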
Expected Output
to_datetime should parse the datetimes correctly,
e.g. return Series([NaT] * 51 + [Timestamp("2012-07-26"), Timestamp("2012-07-26")], dtype="datetime64[ns]") for the example above.
Output of pd.show_versions()
pandas : 1.4.0.dev0+108.gfa6b96e128
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 57.0.0
Cython : 0.29.23
pytest : 6.2.4
hypothesis : 6.14.0
sphinx : 4.0.2
blosc : 1.10.4
feather : None
xlsxwriter : 1.4.3
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.25.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.05.0
fastparquet : 0.6.3
gcsfs : 2021.05.0
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : 2021.05.0
scipy : 1.7.0
sqlalchemy : 1.4.19
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1