-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
join
on tz-aware column and tz-aware index wrongly fails (regression from 0.22)
#23931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Somewhere in ipdb> self.right_join_keys
[array(['2018-01-03T06:00:00.000000000', '2018-01-04T06:00:00.000000000',
'2018-01-05T06:00:00.000000000', '2018-01-06T06:00:00.000000000',
'2018-01-07T06:00:00.000000000'], dtype='datetime64[ns]')]
ipdb> I'd recommend going through |
I think #21184 may be affected by the same root cause as this issue. |
Thanks @TomAugspurger ! I think this patch should fix the issue (and added tests), if that looks about right I'll submit a PR. @mroeschke haven't checked if this also fixed #21184 but will take a look. |
same bug here: while the following works fine a = pd.DataFrame({'a': range(5), 'x': pd.date_range('2018-11-01', periods=5, freq='1T', tz=None)})
b = pd.DataFrame({'b': range(5)}, index=pd.date_range('2018-11-01', periods=5, freq='1T', tz=None))
a.join(b, on='x', lsuffix='_') if we change tz to a specific time zone, like following: a = pd.DataFrame({'a': range(5), 'x': pd.date_range('2018-11-01', periods=5, freq='1T', tz='utc')})
b = pd.DataFrame({'b': range(5)}, index=pd.date_range('2018-11-01', periods=5, freq='1T', tz='utc'))
a.join(b, on='x', lsuffix='_') it failed: ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-84-30f4c345c06a> in <module>
1 a = pd.DataFrame({'a': range(5), 'x': pd.date_range('2018-11-01', periods=5, freq='1T', tz='utc')})
2 b = pd.DataFrame({'b': range(5)}, index=pd.date_range('2018-11-01', periods=5, freq='1T', tz='utc'))
----> 3 a.join(b, on='x', lsuffix='_')
lib/python3.7/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
6334 # For SparseDataFrame's benefit
6335 return self._join_compat(other, on=on, how=how, lsuffix=lsuffix,
-> 6336 rsuffix=rsuffix, sort=sort)
6337
6338 def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
lib/python3.7/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
6349 return merge(self, other, left_on=on, how=how,
6350 left_index=on is None, right_index=True,
-> 6351 suffixes=(lsuffix, rsuffix), sort=sort)
6352 else:
6353 if on is not None:
lib/python3.7/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
59 right_index=right_index, sort=sort, suffixes=suffixes,
60 copy=copy, indicator=indicator,
---> 61 validate=validate)
62 return op.get_result()
63
lib/python3.7/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
553 # validate the merge keys dtypes. We may need to coerce
554 # to avoid incompat dtypes
--> 555 self._maybe_coerce_merge_keys()
556
557 # If argument passed to validate,
lib/python3.7/site-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self)
990 raise ValueError(msg)
991 elif is_datetime64tz_dtype(lk) and not is_datetime64tz_dtype(rk):
--> 992 raise ValueError(msg)
993 elif not is_datetime64tz_dtype(lk) and is_datetime64tz_dtype(rk):
994 raise ValueError(msg)
ValueError: You are trying to merge on datetime64[ns, UTC] and datetime64[ns] columns. If you wish to proceed you should use pd.concat pandas.version == '0.23.4' |
FYI, |
Code Sample, a copy-pastable example if possible
Problem description
In 0.23.4, this raises:
ValueError: You are trying to merge on datetime64[ns, America/Chicago] and datetime64[ns] columns. If you wish to proceed you should use pd.concat
This is incorrect - the two columns are both tz-aware and indeed
df2.set_index('date').index.dtype == df1.date.dtype
returns True.In 0.22.0, this does not raise, and returns the expected output. Haven't run git bisect yet but seems like the logical next step.
Expected Output
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1072-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 39.2.0
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: