Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("ABBC"), "col2": list("ZZXY")}).astype("string")
conditions = [
    (df["col2"] == "Z") & (df["col1"] == "A"),
    (df["col2"] == "Z") & (df["col1"] == "B"),
    (df["col1"] == "B"),
]
print((df["col2"] == "Z").dtype)  # BooleanDtype
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
# TypeError: invalid entry 0 in condlist: should be boolean ndarray
Problem description
Once this DataFrame is cast to dtype "string", columns derived from it use pandas' own extension dtypes rather than NumPy dtypes. For instance, a boolean mask created by comparing one of these columns has dtype BooleanDtype, pandas' nullable boolean type.
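print((df["col2"] == "Z").dtype)  # BooleanDtype
However, NumPy does not seem to treat this mask as boolean: np.select requires every entry in condlist to be a boolean ndarray (as its error message says), and the BooleanDtype mask is rejected with the TypeError above.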
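To see where the mismatch comes from, the sketch below roughly mirrors the dtype check np.select performs: each condition is converted with np.asarray and must end up with NumPy's bool dtype. The helper `is_numpy_bool_mask` is a hypothetical name used only for illustration, and the dtype NumPy sees for the pandas mask may differ between pandas versions (with pandas 1.3.2, as reported here, it evidently is not NumPy bool), so no output is claimed for those lines.

```python
import numpy as np
import pandas as pd


def is_numpy_bool_mask(cond) -> bool:
    """Hypothetical helper that roughly mirrors np.select's condlist check.

    np.select converts each condition to an ndarray and raises TypeError
    unless the resulting dtype is NumPy bool.
    """
    return np.asarray(cond).dtype == np.bool_


df = pd.DataFrame({"col1": list("ABBC"), "col2": list("ZZXY")}).astype("string")
mask = df["col2"] == "Z"

print(mask.dtype)              # pandas dtype of the mask Series
print(np.asarray(mask).dtype)  # dtype NumPy sees; may vary by pandas version
print(is_numpy_bool_mask(mask))

# A plain NumPy bool array passes the check; an object array of booleans does not.
print(is_numpy_bool_mask(np.array([True, False])))                # True
print(is_numpy_bool_mask(np.array([True, False], dtype=object)))  # False
```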
If all of the conditions are first cast to bool, it works:
conditions = [i.astype(bool) for i in conditions]
df['color'] = np.select(conditions, choices, default='black')
Expected Output
I would expect the np.select function to work, even with BooleanDtype columns.
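Until np.select accepts these masks directly, the astype(bool) workaround can be wrapped in a small helper. This is a minimal sketch under my own assumptions, not a pandas or NumPy API: `to_bool_mask` and `select_masks` are hypothetical names, and they treat missing values (pd.NA) in a mask as False by calling Series.to_numpy(dtype=bool, na_value=False). A plain astype(bool), as far as I know, raises a ValueError when the mask contains NA.

```python
import numpy as np
import pandas as pd


def to_bool_mask(cond) -> np.ndarray:
    """Hypothetical helper: convert a (possibly nullable) mask to a NumPy bool array.

    Missing values (pd.NA) are treated as False.
    """
    return pd.Series(cond).to_numpy(dtype=bool, na_value=False)


def select_masks(conditions, choices, default):
    """Hypothetical wrapper: np.select after converting every condition."""
    bool_conditions = [to_bool_mask(c) for c in conditions]
    return np.select(bool_conditions, choices, default=default)


df = pd.DataFrame({"col1": list("ABBC"), "col2": list("ZZXY")}).astype("string")
conditions = [
    (df["col2"] == "Z") & (df["col1"] == "A"),
    (df["col2"] == "Z") & (df["col1"] == "B"),
    (df["col1"] == "B"),
]
choices = ["yellow", "blue", "purple"]
df["color"] = select_masks(conditions, choices, default="black")
print(df["color"].tolist())  # ['yellow', 'blue', 'purple', 'black']

# A mask with a missing value: the NA position becomes False.
with_na = pd.Series(["Z", None, "X"], dtype="string")
na_mask = with_na == "Z"
print(to_bool_mask(na_mask).tolist())  # [True, False, False]
```

Treating NA as False matches how a missing value is usually meant to fall through to the default choice; if NA should instead raise, drop the na_value argument.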
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 5f648bf
python : 3.8.11.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.3.2
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.0.1
setuptools : 52.0.0.post20210125
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.26.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None