BUG: boolean frames multiplied by floats have dtypes=object #18549

Closed
erbian opened this issue Nov 28, 2017 · 6 comments · Fixed by #41674
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations
Comments

@erbian
Contributor

erbian commented Nov 28, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
(pd.DataFrame(True, list('ab'), list('cd')) * 1.0).dtypes
# every column has dtype object

type(True * 1.0)
# returns float

Problem description

A boolean frame multiplied by a float should return a DataFrame of floats, consistent with Python scalar arithmetic (bool * float -> float). Instead, every column of the result has dtype object.

I believe it returned floats prior to 0.21.0?
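
For readers landing here later, a minimal sketch of the behaviour being asked for, next to a possible workaround for affected versions. The variable names are illustrative only, and the `astype(float)` cast is an assumption about what helps on 0.21.0, not something confirmed in this thread.

```python
import pandas as pd

# A 2x2 all-True frame, built the same way as in the report above.
bool_frame = pd.DataFrame(True, index=list("ab"), columns=list("cd"))

# Python's own rule: bool * float gives a float.
scalar_result = True * 1.0
print(type(scalar_result).__name__)

# The frame-level analogue. On a version without the bug, each column
# is expected to come back as float64 rather than object.
frame_result = bool_frame * 1.0
print(frame_result.dtypes)

# Possible workaround (an assumption, not from the thread): cast the
# boolean frame to float before doing arithmetic on it.
workaround = bool_frame.astype(float) * 1.0
print(workaround.dtypes)
```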

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.2.5
pip: 9.0.1
setuptools: 37.0.0
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: 0.10.0
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: 1.1.15
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: 0.2.0
pandas_datareader: None

@jreback
Contributor

jreback commented Nov 29, 2017

Yeah, this looks like it should coerce. Would take a PR to fix.

@jreback jreback added Bug Difficulty Intermediate Dtype Conversions Unexpected or buggy dtype conversions labels Nov 29, 2017
@jreback jreback added this to the Next Major Release milestone Nov 29, 2017
@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 29, 2017
@jreback jreback changed the title boolean frames multiplied by floats have dtypes=object BUG: boolean frames multiplied by floats have dtypes=object Nov 29, 2017
This was referenced Dec 3, 2017
@kgoehner

This seems to be resolved, at least as of f7d162b when I tried to reproduce it.

>>> (pd.DataFrame(True, list('ab'), list('cd')) * 1.0).dtypes
c    float64
d    float64
dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f7d162b
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.26.0.dev0+555.gf7d162b18.dirty
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : 4.36.2
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
s3fs : 0.3.4
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.1
xarray : 0.13.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1

@mroeschke
Member

Would you like to contribute a regression test @Kazz47?
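
As a rough idea of what such a regression test could look like, here is a minimal sketch. The function name is hypothetical and this is not the test that ended up in the pandas test suite; it only pins the behaviour from the original report.

```python
import pandas as pd
import pandas.testing as pdt


def check_bool_frame_times_float():
    # Hypothetical name; illustrates the check, not the merged pandas test.
    bool_frame = pd.DataFrame(True, index=list("ab"), columns=list("cd"))
    result = bool_frame * 1.0

    # Expected: same labels, every value 1.0, every column float64.
    expected = pd.DataFrame(1.0, index=list("ab"), columns=list("cd"))

    # assert_frame_equal compares dtypes by default, so an object-dtype
    # result would raise an AssertionError here.
    pdt.assert_frame_equal(result, expected)


check_bool_frame_times_float()
```

In a real test module this would be collected by pytest under a `test_` prefix instead of being called directly.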

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Difficulty Intermediate Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 13, 2019
@kgoehner

kgoehner commented Oct 18, 2019

Looks like on some platforms the bool is being coerced to an int32 rather than the expected int64. Any ideas where this might be happening or how to reproduce it?

self = <pandas.tests.arithmetic.test_numeric.TestNumericArraylikeArithmeticWithBool object at 0xd5d8258c>
num = 1, all_arithmetic_functions = <built-in function mod>
box_with_array = <class 'pandas.core.frame.DataFrame'>

    @pytest.mark.parametrize("num", [1.0, 1])
    def test_array_like_bool_and_num_op_coerce(
        self, num, all_arithmetic_functions, box_with_array
    ):
        # GH 18549
        op = all_arithmetic_functions
        expected = [op(num, num)]
        expected = tm.box_expected(expected, box_with_array)
        bool_box = tm.box_expected([True], box_with_array)
        try:
>           tm.assert_equal(expected, op(bool_box, num))
E           AssertionError: Attributes are different
E           
E           Attribute "dtype" are different
E           [left]:  int64
E           [right]: int32

@mroeschke
Member

Guessing there's a routine that's either calling astype(int) (which defaults to the platform int) or forcing astype(np.int64) somewhere. @jbrockmendel would know more

@jbrockmendel
Member

I'd guess that it is actually the python integer 1 being coerced to int32 by either numpy or numexpr
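
If that guess is right, the effect can be looked at with NumPy alone. The snippet below is a sketch under that assumption: it prints the dtype NumPy picks when a boolean array meets the Python integer `1`, alongside NumPy's default integer dtype, and then shows that passing an explicitly sized scalar pins the result. No output is listed because the first two values depend on the platform and NumPy version.

```python
import numpy as np

bool_arr = np.array([True])

# Python int operand: NumPy chooses the integer dtype, which may be
# int32 on some platforms (for example 32-bit builds) and int64 on others.
result = bool_arr * 1
default_int = np.dtype(int)
print(result.dtype, default_int)

# Explicitly sized operand: the result dtype no longer depends on the
# platform's default integer.
pinned = bool_arr * np.int64(1)
print(pinned.dtype)
```

A test that builds its expected value with a fixed int64 while the operation under test goes through the first path could disagree on such platforms, which would match the int64 vs int32 mismatch in the traceback above.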

ShaharNaveh pushed a commit to ShaharNaveh/pandas that referenced this issue Feb 1, 2020
@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Sep 21, 2020
@mroeschke mroeschke mentioned this issue May 26, 2021
8 tasks
@jreback jreback removed this from the Contributions Welcome milestone May 26, 2021
@jreback jreback added this to the 1.3 milestone May 26, 2021
5 participants