BUG: Wrong result after comparison SparseArray with arrays #45284

Open
3 tasks done
bdrum opened this issue Jan 9, 2022 · 1 comment
Labels
Bug · Numeric Operations (Arithmetic, Comparison, and Logical operations) · Sparse (Sparse Data Type)

Comments

@bdrum
Contributor

bdrum commented Jan 9, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
s > [3, 3, 4, 1, 0, 0]
# [False, False, False, True, False, False]
# Fill: False
# IntIndex
# Indices: array([0, 1, 2, 3, 4, 5], dtype=int32)
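
The symptom in the output above is that the index lists every position even though most stored values equal the fill value. A minimal sketch for checking this on a given build follows. The helper name `count_redundant_stored` is hypothetical (not part of pandas), it assumes a non-NaN fill value such as `False`, and the right-hand operand is cast to float here as an assumption so that its dtype is compatible with the NaN fill value.

```python
import numpy as np
import pandas as pd


def count_redundant_stored(arr):
    """Count explicitly stored values that equal the fill value.

    Hypothetical helper; assumes the fill value is not NaN (e.g. False).
    """
    # sp_values holds only the explicitly stored entries of the SparseArray.
    return int(np.sum(arr.sp_values == arr.fill_value))


s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
# Assumption: float operand, so NaN is a valid fill value for it as well.
other = np.array([3, 3, 4, 1, 0, 0], dtype=float)
result = s > other

# Positions that are explicitly stored in the result.
print(result.sp_index.to_int_index().indices)
# A nonzero count means fill values are being stored explicitly.
print(count_redundant_stored(result))
```

The printed values depend on the pandas version, so no expected output is shown; a count of zero would indicate that only the non-fill positions are stored.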

Issue Description

This issue is a consequence of the broader problem described in #45126.

The other part of that problem (comparison with scalars) has already been fixed and merged: #44956, #45110.

I've split this issue out following the discussion in #45125.

In that PR I also added two tests and marked them as xfail in tests/extension/test_sparse.py:
TestComparisonOps::test_array
TestComparisonOps::test_sparse_array

Note that boolean masking of a SparseArray is currently based on indices, so a construction such as:

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
s[s > [3,3,4,1,0,0]]

returns the wrong result:

[1.0, 2.0, 3.0, 4.0, nan, nan]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3], dtype=int32)

instead of

[4.0]
Fill: nan
IntIndex
Indices: array([0], dtype=int32)
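
Until the sparse comparison is fixed, one possible workaround is to build the mask densely with NumPy and index with that. This is only a sketch, assuming the arrays are small enough that materializing a dense copy is acceptable; it is not the fix proposed in this issue, and the names `dense_mask` and `selected` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
other = np.array([3, 3, 4, 1, 0, 0])

# Compare on the dense values; NaN compares False against any number.
dense_mask = np.asarray(s) > other

# Index the SparseArray with a plain boolean ndarray instead of a sparse mask.
selected = s[dense_mask]
print(selected)
```

Because the mask is a regular boolean ndarray, the selection does not depend on the sparse index of the comparison result.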

Expected Behavior

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
s > [3,3,4,1,0,0]
# [False, False, False, True, False, False]
# Fill: False
# IntIndex
# Indices: array([3], dtype=int32)
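
For reference, the expected result can be reproduced independently of the sparse comparison code by comparing dense arrays and wrapping the outcome. This is a sketch of how an expected value might be constructed in a test, not the code used by the xfail tests mentioned above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, np.nan, np.nan])
other = np.array([3, 3, 4, 1, 0, 0])

# Dense reference comparison; positions holding NaN evaluate to False.
dense_result = values > other

# With fill_value=False, only the True positions are stored explicitly.
expected = pd.arrays.SparseArray(dense_result, fill_value=False)
print(expected.sp_index.to_int_index().indices)  # [3]
```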

Installed Versions

INSTALLED VERSIONS

commit : ccb25ab
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.60.1-microsoft-standard-WSL2
Version : #1 SMP Wed Aug 25 23:20:18 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.0.dev0+1613.gccb25ab1d2
numpy : 1.22.0
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.0.4
Cython : 0.29.26
pytest : 6.2.5
hypothesis : 6.34.2
sphinx : 4.3.2
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : None
gcsfs : 2021.11.0
matplotlib : 3.5.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.3
sqlalchemy : 1.4.29
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1
zstandard : None

bdrum added the Bug and Needs Triage labels Jan 9, 2022
lithomas1 added the Numeric Operations and Sparse labels and removed the Needs Triage label Jan 10, 2022
@bdrum
Contributor Author

bdrum commented Jan 20, 2022

take

bdrum added a commit to bdrum/pandas that referenced this issue Jan 22, 2022