Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Passing string as axis argument leads to incorrect behavior #5094

Open
3 tasks done
noloerino opened this issue Oct 5, 2022 · 1 comment
Open
3 tasks done

BUG: Passing string as axis argument leads to incorrect behavior #5094

noloerino opened this issue Oct 5, 2022 · 1 comment
Labels
bug 🦗 Something isn't working P2 Minor bugs or low-priority feature requests pandas concordance 🐼 Functionality that does not match pandas

Comments

@noloerino
Copy link
Collaborator

noloerino commented Oct 5, 2022

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas
import pandas
import numpy as np

def example(pd):
    df = pd.concat([pd.DataFrame({0: [1], 1: [3]}), pd.DataFrame({0: [2], 1: [4]})])
    df = df.rolling(window=2, axis="index")
    result = df.aggregate(np.sum)
    print(result)

print("Pandas:")
example(pandas)
# Pandas:
#     0   1
# 0 NaN NaN
# 0 3.0 7.0
print("Modin:")
example(modin.pandas)
# Modin:
#     0   1
# 0 NaN NaN
# 0 NaN NaN

Issue Description

Many query compiler, dataframe, and partition manager functions assume that the axis argument of a function will either be 0, 1, None, or modin.core.dataframe.base.dataframe.utils.Axis. However, the pandas API (and some of our own test cases) allow passing axis="index" or axis="columns", which has the potential to break some of our code.

The provided example is one such case: on this line in the partition manager, a comparison is made for axis == 0, but axis here is still the string "index". This can manifest itself in more insidious ways in other functions (for example, this line in the dataframe attempts to XOR axis even though it may be a string), though I'm not sure what the appropriate API call to hit that case would be.

Expected Behavior

The above code should match pandas. The axis argument should be normalized to an integer or Axis enum before it is used in comparisons in the query compiler/dataframe/partition manager.

Error Logs

N/A

Installed Versions

INSTALLED VERSIONS

commit : 7871c7b
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.15.0+171.g7871c7bc
ray : 1.13.0
dask : 2022.8.0
distributed : 2022.8.0
hdk : None

pandas dependencies

pandas : 1.5.0
numpy : 1.23.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 61.2.0
pip : 22.1.2
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : 0.8.2
fsspec : 2022.7.1
gcsfs : None
matplotlib : 3.5.3
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.8
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.7.1
scipy : 1.9.0
snappy : None
sqlalchemy : 1.4.40
tables : 3.7.0
tabulate : None
xarray : 2022.6.0
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : None

@noloerino noloerino added bug 🦗 Something isn't working Triage 🩹 Issues that need triage labels Oct 5, 2022
@noloerino
Copy link
Collaborator Author

I plan to address this together with #4909 by adding a check like if axis == "columns": axis = 1 at the start of dataframe map/apply/broadcast methods, but this might still leave residual bugs in other operators that take the axis argument. Feel free to comment if anyone thinks of a better solution, or if this should be fixed separately.

@noloerino noloerino added pandas concordance 🐼 Functionality that does not match pandas P2 Minor bugs or low-priority feature requests and removed Triage 🩹 Issues that need triage labels Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🦗 Something isn't working P2 Minor bugs or low-priority feature requests pandas concordance 🐼 Functionality that does not match pandas
Projects
None yet
Development

No branches or pull requests

1 participant