TST: add test for df.where() with category dtype #29454

Merged

ganevgv
Contributor

@ganevgv ganevgv commented Nov 7, 2019

@gfyoung added the Categorical (Categorical Data Type) and Testing (pandas testing functions or related to the test suite) labels Nov 7, 2019
@gfyoung
Member

gfyoung commented Nov 7, 2019

@ganevgv : Good job with the tests so far! #16979 is a two-parter, and since both parts have been fixed on master, they both need tests.

If you would like, you can add (in this PR) a second test to address the .astype error.
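For readers following along, a test for `df.where()` on category columns might look roughly like the sketch below. It is not the test from this PR: the frame, the mask, and the expected values are illustrative assumptions, and the public `pd.testing.assert_frame_equal` stands in for the internal `tm` helper.

```python
import numpy as np
import pandas as pd

# Illustrative data only (an assumption, not taken from the PR diff).
# The integer dtype is pinned so the categories are int64 on every platform.
df = pd.DataFrame(
    np.arange(2 * 3, dtype="int64").reshape(2, 3), columns=list("ABC")
)
df = df.astype("category")  # each column gets its own categories

mask = np.array([[True, False, False], [False, False, True]])

# Entries where the mask is False become NaN; the category dtype is kept.
result = df.where(mask)

expected = pd.DataFrame(
    {
        "A": pd.Categorical([0, np.nan], categories=[0, 3]),
        "B": pd.Categorical([np.nan, np.nan], categories=[1, 4]),
        "C": pd.Categorical([np.nan, 5], categories=[2, 5]),
    }
)

pd.testing.assert_frame_equal(result, expected)
```

The sketch checks that masked entries turn into missing values without the columns falling back to `object` or `float64`.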

@gfyoung gfyoung added this to the 1.0 milestone Nov 7, 2019
@ganevgv
Contributor Author

ganevgv commented Nov 8, 2019

@gfyoung, thanks for the guidance so far! After adding my second test, test_df_where_change_dtype(...), in this PR, there seem to be some inconsistencies across operating systems.

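import numpy as np
from pandas import DataFrame
import pandas._testing as tm  # internal test helpers; pandas.util.testing in older versions

# 2x3 integer frame; the integer dtype comes from np.arange's platform default
df = DataFrame(np.arange(2 * 3).reshape(2, 3), columns=list("ABC"))
mask = np.array([[True, False, True], [False, True, True]])

# entries where the mask is False are replaced with NaN
result = df.where(mask)
expected = DataFrame([[0, np.nan, 2], [np.nan, 4, 5]], columns=list("ABC"))

tm.assert_frame_equal(result, expected)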

After running the code above, on some operating systems (linux -- Python 3.6.7, win32 -- Python 3.6.7, win32 -- Python 3.7.5) applying the boolean mask changes the dtype of the last column [2, 5] from int64 to int32. I cannot reproduce this behaviour locally.
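One way to narrow this down is to print the dtypes on each side of the comparison. The sketch below assumes (this is a guess, not something confirmed in the thread) that the mismatch comes from `np.arange` using the platform's default integer type, which is 32-bit on some builds, while the expected frame built from Python ints is inferred as int64. It uses the public `pd.testing.assert_frame_equal` rather than `tm`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 3).reshape(2, 3), columns=list("ABC"))
mask = np.array([[True, False, True], [False, True, True]])

result = df.where(mask)
expected = pd.DataFrame([[0, np.nan, 2], [np.nan, 4, 5]], columns=list("ABC"))

# Platform default integer used by np.arange (assumed source of the mismatch).
print("np.arange default:", np.arange(1).dtype)

# Column C is never masked, so it keeps whatever integer dtype df started with.
print("result dtypes:\n", result.dtypes)
print("expected dtypes:\n", expected.dtypes)

# Comparing values only confirms the frames differ in dtype, not in content.
pd.testing.assert_frame_equal(result, expected, check_dtype=False)
```

If the printed dtypes for column C differ only on the failing platforms, the problem lies in how the test builds its input rather than in `where()` itself.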

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : 35e91f9
python : 3.6.5.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 0.26.0.dev0+734.g0de99558b.dirty
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 39.0.1
Cython : 0.29.14
pytest : 5.2.2
hypothesis : 4.42.6
sphinx : 2.2.1
blosc : 1.8.1
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : 0.3.5
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.6.1
xarray : 0.14.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2

What would be your suggestion? I'm happy to remove or xfail the test_df_where_change_dtype(...) test from this PR (so we can merge it) and raise an issue. I can also force the dtype but that's not ideal.
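For reference, "forcing the dtype" could look something like the sketch below. It pins the input to int64 and spells out the expected dtype per column, so the comparison no longer depends on the platform default integer. This is an illustration under that assumption, not code from the PR, and it uses the public `pd.testing.assert_frame_equal`.

```python
import numpy as np
import pandas as pd

# Pin the integer dtype instead of relying on np.arange's platform default.
df = pd.DataFrame(
    np.arange(2 * 3, dtype="int64").reshape(2, 3), columns=list("ABC")
)
mask = np.array([[True, False, True], [False, True, True]])

result = df.where(mask)

# Columns A and B receive a NaN and become float64; column C is never
# masked, so it keeps the int64 dtype it started with.
expected = pd.DataFrame(
    {
        "A": np.array([0, np.nan], dtype="float64"),
        "B": np.array([np.nan, 4], dtype="float64"),
        "C": np.array([2, 5], dtype="int64"),
    }
)

pd.testing.assert_frame_equal(result, expected)
```

The trade-off is that the test then says nothing about how `where()` behaves with the platform's default integer type, which may be why forcing the dtype is described as not ideal.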

@gfyoung
Member

gfyoung commented Nov 8, 2019

@ganevgv : Why don't you revert your last commit, and we can focus on it in a separate PR.

@ganevgv
Contributor Author

ganevgv commented Nov 8, 2019

@gfyoung thanks - removed the second test in this PR, will create a new issue specifying the problem

@gfyoung
Member

gfyoung commented Nov 8, 2019

Actually, can you comment in the original issue? I plan on keeping that one open.

@gfyoung gfyoung merged commit 6dbd2b1 into pandas-dev:master Nov 8, 2019
keechongtan added a commit to keechongtan/pandas that referenced this pull request Nov 11, 2019
…ndexing-1row-df

* upstream/master: (109 commits)
  stronger typing in libreduction (pandas-dev#29502)
  API: rename labels to codes (pandas-dev#29509)
  CLN: remove unnecessary type checks (pandas-dev#29517)
  implement _BaseGrouper (pandas-dev#29520)
  CLN: F-string formatting in pandas/_libs/*.pyx (pandas-dev#29527)
  Fixed more SS03 errors (pandas-dev#29540)
  consolidate dim checks (pandas-dev#29536)
  REF: separate out _get_cython_func_and_vals (pandas-dev#29537)
  remove unnecessary exception (pandas-dev#29538)
  TST:Add test to check single category col returns series with single row slice (pandas-dev#29521)
  Make color validation more forgiving (pandas-dev#29122)
  DOC: update bottleneck repo and documentation urls (pandas-dev#29516)
  TST: add test for df construction from dict with tuples (pandas-dev#29497)
  add test for pd.melt dtypes preservation (pandas-dev#29510)
  updated DataFrame.equals docstring (pandas-dev#29496)
  Resolved merge conflicts (pandas-dev#29506)
  DOC: Improved pandas/compact/__init__.py (pandas-dev#29507)
  DOC: Update performance comparison section of io docs (pandas-dev#28890)
  TST: add test for df.where() with category dtype (pandas-dev#29454)
  DOC: Fix docs on merging categoricals. (pandas-dev#28185)
  ...
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019