pd.merge() doesn't match int and str key columns, but gives no warning or error #9780
You are doing an inner merge, which doesn't match. Not sure if we could reliably detect this, as it involves a computation to figure out that you have strings that look like numbers.
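A minimal pure-Python sketch of why the inner join comes back empty, assuming a simple hash join (this is an illustration, not pandas' actual join code; `inner_hash_join` and `looks_numeric` are hypothetical names). Key lookup relies on equality, and in Python `1 == "1"` is False, so no pair ever matches. Spotting "numbers stored as strings" means inspecting every value, which is the extra computation mentioned above.

```python
def inner_hash_join(left_keys, right_keys):
    """Hypothetical inner hash join: return (left_idx, right_idx) pairs with equal keys."""
    # Build a hash table from the right-hand keys.
    table = {}
    for j, key in enumerate(right_keys):
        table.setdefault(key, []).append(j)
    # Probe with each left-hand key. Dict lookup uses hashing plus equality,
    # and 1 != "1", so int keys never find str keys.
    return [(i, j) for i, key in enumerate(left_keys) for j in table.get(key, [])]


def looks_numeric(values):
    """Hypothetical check: True if every value is a str holding an integer literal."""
    # This has to scan the whole column, value by value.
    return all(isinstance(v, str) and v.strip().lstrip("-").isdigit() for v in values)


print(inner_hash_join([1, 2, 3], ["1", "2", "3"]))  # []
print(inner_hash_join([1, 2, 3], [1, 2, 3]))        # [(0, 0), (1, 1), (2, 2)]
print(looks_numeric(["1", "2", "3"]))               # True
```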
you can also do
Thanks for your reply and the quick fix. I actually don't think one should check whether the strings represent numbers or not. I think it is more about whether the dtypes of the columns with the same name match. If you do an inner merge on DataFrames with no matching columns, you get a MergeError. I think it would make sense to also raise one if there are matching columns but their dtypes can't be silently cast to match, especially if one specifically sets the option
This should raise a
Consider the case of an int column and a float column. I don't think it should raise if you want to merge on those columns? (This can already happen quite often, e.g. when one of the two key columns contains NaNs.)
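A short pandas sketch of the int/float case, assuming a reasonably recent pandas: an integer key column that contains a missing value is stored as float64, yet merging it against an int64 key still matches 1 with 1.0 and 2 with 2.0. The frame and column names are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "lval": ["a", "b", "c"]})
# The None becomes NaN, so this key column is float64 rather than int64.
right = pd.DataFrame({"key": [1, 2, None], "rval": ["x", "y", "z"]})

print(left["key"].dtype, right["key"].dtype)

# int64 vs float64 keys are comparable, so the inner merge finds matches.
merged = pd.merge(left, right, on="key")
print(merged[["lval", "rval"]])
```

This is why a blanket "dtypes must be identical" rule would be too strict: int and float keys are losslessly comparable.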
Right, we care about 'obvious' mismatches here that by definition cannot match. So
Personally, I would leave this as the user's responsibility. Separately: currently, as @jreback shows above (#9780 (comment)), the strings of
I suppose this happens on purpose? IMO it should return an object array and keep the original values (like
@jorisvandenbossche yes, we should certainly not coerce on mixed types in merging unless they are losslessly convertible (e.g. int & float), so maybe make a separate issue. But I think we should raise on str/numeric and on datetimelike/(str or numeric): it is simply not possible (and if the user really wants that, then it's just a concat).
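The rule proposed above can be sketched in plain Python. This assumes simplified dtype "kinds" ("int", "float", "str", "datetime") rather than real numpy/pandas dtypes, and `check_merge_keys` and `IncompatibleMergeKeys` are hypothetical names, not pandas API.

```python
NUMERIC = {"int", "float"}


class IncompatibleMergeKeys(ValueError):
    """Hypothetical error for key columns that can never compare equal."""


def check_merge_keys(left_kind, right_kind):
    """Allow a merge only between kinds that are losslessly comparable."""
    if left_kind == right_kind:
        return  # same kind: nothing to check
    if left_kind in NUMERIC and right_kind in NUMERIC:
        return  # int & float: losslessly convertible, so allowed
    # str/numeric and datetime/(str or numeric) can never match.
    raise IncompatibleMergeKeys(
        f"cannot merge on {left_kind} and {right_kind} columns; "
        "if you really want to combine them, use concat instead"
    )


check_merge_keys("int", "float")  # passes silently
try:
    check_merge_keys("int", "str")
except IncompatibleMergeKeys as exc:
    print(exc)
```

Subclassing `ValueError` is a design choice here so existing `except ValueError` handlers would still catch it.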
That's a good point :-)
On second thought, if we disallow merging on str/numeric (the case of the initial example), I don't think there are any cases left that we would allow but where no coercion should happen? (That is what I wanted to open a new issue for.)
@jorisvandenbossche can you show an example / elaborate on your last comment?
Well, the original example above has a dataframe with integers in the key column, and another dataframe with strings in the key column. They are now coerced to integers, something I think should not happen:
And I wanted to open a new issue for this. But this is also an example where we would want to raise an error about "incompatible columns to merge on".
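A sketch of how this plays out in recent pandas, assuming a version that includes the change discussed in this thread (the reply further down says it targets 0.23): merging an int64 key with a string key raises `ValueError` instead of silently returning no rows. The exact message text varies by version, so only the exception type is relied on. The explicit fix is to convert one key column so both sides share a dtype.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "lval": ["a", "b", "c"]})
# dtype=object keeps this an object column even on pandas versions
# that would otherwise infer a dedicated string dtype.
right = pd.DataFrame({"key": pd.Series(["1", "2", "3"], dtype=object),
                      "rval": ["x", "y", "z"]})

try:
    pd.merge(left, right, on="key")
    raised = False
except ValueError as exc:
    # Recent pandas refuses to merge int64 keys with string keys.
    raised = True
    print(exc)

# Explicit fix: convert the string keys to integers before merging.
fixed = pd.merge(left, right.assign(key=right["key"].astype("int64")), on="key")
print(fixed)
```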
Ah, I see. I think this should raise a nice error message (maybe saying that you might want to use concat
merge
So [20,21] are wrong (this should be But I would actually simply raise
Was #18764 included in the Pandas 0.22.0 release? It looks like @jreback added that to the 0.22.0 milestone on Dec 7, 2017, and it was merged to master 19 days before 0.22.0 was released, but the initial example still fails for me on 0.22.0:
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.3
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None
#18674 will be in the 0.23 release. The 0.22 release contained only one change. You can always view the release notes for a version at http://pandas.pydata.org/pandas-docs/stable/whatsnew.html.
When merging on an int key column and a str key column, the join does not work:
I think it would be better to get a warning that the join is performed on incompatible column dtypes.
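Until the library itself complains, a user-side check can produce the warning requested here. This is a minimal sketch, assuming pandas is installed; `warn_on_key_dtype_mismatch` is a hypothetical helper, and its rule (warn when a key is numeric on one side only) is deliberately coarser than the one discussed in the comments.

```python
import warnings

import pandas as pd
from pandas.api.types import is_numeric_dtype


def warn_on_key_dtype_mismatch(left, right, on):
    """Hypothetical pre-merge check: warn if a key is numeric on one side only."""
    for col in on:
        if is_numeric_dtype(left[col]) != is_numeric_dtype(right[col]):
            warnings.warn(
                f"merge key {col!r} has incompatible dtypes "
                f"{left[col].dtype} and {right[col].dtype}",
                UserWarning,
                stacklevel=2,
            )


left = pd.DataFrame({"key": [1, 2], "lval": ["a", "b"]})
right = pd.DataFrame({"key": ["1", "2"], "rval": ["x", "y"]})

# Record warnings so the example can show that exactly one was issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_key_dtype_mismatch(left, right, on=["key"])
print(len(caught), caught[0].message)
```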
This is my pandas version:
Thanks for all your work on pandas!