-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix categorical from codes nan 21767 #21775
Fix categorical from codes nan 21767 #21775
Conversation
pandas/core/arrays/categorical.py
Outdated
@@ -628,6 +628,8 @@ def from_codes(cls, codes, categories, ordered=False): | |||
categorical. If not given, the resulting categorical will be | |||
unordered. | |||
""" | |||
if isna(codes).any(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check the array type, rather than this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_integer_dtype
lgtm. ping on green. |
I would slightly prefer to warn for a version when people provide floats. The current changes breaks We can consider checking whether |
Codecov Report
@@ Coverage Diff @@
## master #21775 +/- ##
==========================================
- Coverage 92.07% 92.06% -0.01%
==========================================
Files 170 170
Lines 50688 50700 +12
==========================================
+ Hits 46671 46678 +7
- Misses 4017 4022 +5
Continue to review full report at Codecov.
|
pandas/core/arrays/categorical.py
Outdated
@@ -628,8 +629,11 @@ def from_codes(cls, codes, categories, ordered=False): | |||
categorical. If not given, the resulting categorical will be | |||
unordered. | |||
""" | |||
codes = np.asarray(codes) | |||
if not is_integer_dtype(codes): | |||
raise ValueError("codes need to be array-like integers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so can you check for float dtype here and issue a warning, something like
if is_float_dtype(codes):
icodes = codes.astype('i8')
if (icodes != codes).all():
raise ValueError("not integer-like float codes are invalid")
warn.warning("float codes will be disallowed in the future", FutureWarning)
codes = icodes
and tests for same.
Hello @miker985! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 31, 2018 at 14:49 Hours UTC |
be7b765
to
7b42fa4
Compare
pandas/core/arrays/categorical.py
Outdated
FutureWarning) | ||
else: | ||
err = True | ||
if err: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using err
, you can create the message say around line 635 like
msg = ......
and then just raise ValueError(msg)
where needed, much easier to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not is_integer_dtype(codes):
msg = "codes need to be array-like integers"
if is_float_dtype(codes):
icodes = codes.astype('i8')
if (icodes == codes).all():
err = False
codes = icodes
warn("float codes will be disallowed in the future",
FutureWarning)
else:
raise ValueError(msg)
else:
raise ValueError(msg)
is that what you want? That doesn't read easier to me, but I can change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the msg
way to the err
way as well.
# GH21767 | ||
codes = [1.0, 2.0, 0] | ||
categories = ['a', 'b', 'c'] | ||
with warnings.catch_warnings(record=True) as w: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use tm.assert_produces_warning
here.
you may also have to change some tests which inadvertantly are testing using float codes, so pls change those (or you can assert_produces_warning there too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change to assert_produces_warning
No tests display as having warnings when I run pytest pandas/tests/categorical/test_constructors.py
Please tell me what you want me to fix, because I see nothing obvious
doc/source/whatsnew/v0.24.0.txt
Outdated
- | ||
- | ||
- | ||
- Bug in :meth:`Categorical.from_codes` where `NaN` values in `codes` were silently converted to `0` (:issue:`21767`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use double-backticks around NaN & 0.
doc/source/whatsnew/v0.24.0.txt
Outdated
- | ||
- | ||
- | ||
- Bug in :meth:`Categorical.from_codes` where `NaN` values in `codes` were silently converted to `0` (:issue:`21767`) | ||
|
||
Datetimelike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a note in deprection section about from_codes accepting floats.
codes = [1.0, 2.0, 0] # integer, but in float dtype | ||
categories = ['a', 'b', 'c'] | ||
|
||
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to check the stacklevel here. You may need to adjust it in from_codes
so that the correct line of user-code is reported.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -240,7 +240,7 @@ Deprecations | |||
- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated the ``encoding`` argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`). | |||
- :meth:`MultiIndex.to_hierarchical` is deprecated and will be removed in a future version (:issue:`21613`) | |||
- :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`) | |||
- | |||
- :meth:`Categorical.from_codes` has deprecated providing float values for the ``codes`` argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference this PR number.
Note the future behavior (ValueError)
doc/source/whatsnew/v0.24.0.txt
Outdated
- | ||
- | ||
- | ||
- Bug in :meth:`Categorical.from_codes` where ``NaN`` values in `codes` were silently converted to ``0`` (:issue:`21767`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're also changing the behavior of .from_codes([1.1, 2.0])
. Make a note of that here.
Also note that both of these cases now raise a ValueError
.
pandas/core/arrays/categorical.py
Outdated
if (icodes == codes).all(): | ||
err = False | ||
codes = icodes | ||
warn("float codes will be disallowed in the future", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say that in the future this will raise a ValueError
.
pandas/core/arrays/categorical.py
Outdated
FutureWarning) | ||
else: | ||
err = True | ||
if err: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the msg
way to the err
way as well.
@miker985 can you merge in master and fix the conflict? |
5bcf6f6
to
990240e
Compare
@jreback It says you have requested changes but I think I have addressed them. Can you comment? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny comment
pls rebase in master and be sure no warning are generated by the current code (is why u r failing i think)
pls fix any occurrences of this (as u r already testing the warning)
pandas/core/arrays/categorical.py
Outdated
@@ -629,8 +631,21 @@ def from_codes(cls, codes, categories, ordered=False): | |||
categorical. If not given, the resulting categorical will be | |||
unordered. | |||
""" | |||
codes = np.asarray(codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u add a comment with the issue reference here
990240e
to
3d955d1
Compare
@jreback pinging again - I'm not clear if I've fixed the requested changes and the tool is broken, or if I've missed something (it shows changes requested, but then says the area you requested them is outdated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miker985 just a small addition to the test. ping no green.
categories = ['a', 'b', 'c'] | ||
|
||
with tm.assert_produces_warning(FutureWarning): | ||
Categorical.from_codes(codes, categories) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, can you assert the results of the float converted to integer here (as it just warns but doesn't raise).
@jreback ping |
thanks @miker985 nice patch! |
* master: (47 commits) Run tests in conda build [ci skip] (pandas-dev#22190) TST: Check DatetimeIndex.drop on DST boundary (pandas-dev#22165) CI: Fix Travis failures due to lint.sh on pandas/core/strings.py (pandas-dev#22184) Documentation: typo fixes in MultiIndex / Advanced Indexing (pandas-dev#22179) DOC: added .join to 'see also' in Series.str.cat (pandas-dev#22175) DOC: updated Series.str.contains see also section (pandas-dev#22176) 0.23.4 whatsnew (pandas-dev#22177) fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973) BUG: Fix get dummies unicode error (pandas-dev#22131) Fixed py36-only syntax [ci skip] (pandas-dev#22167) DEPR: pd.read_table (pandas-dev#21954) DEPR: Removing previously deprecated datetools module (pandas-dev#6581) (pandas-dev#19119) BUG: Matplotlib scatter datetime (pandas-dev#22039) CLN: Use public method to capture UTC offsets (pandas-dev#22164) implement tslibs/src to make tslibs self-contained (pandas-dev#22152) Fix categorical from codes nan 21767 (pandas-dev#21775) BUG: Better handling of invalid na_option argument for groupby.rank(pandas-dev#22124) (pandas-dev#22125) use memoryviews instead of ndarrays (pandas-dev#22147) Remove depr. warning in SeriesGroupBy.count (pandas-dev#22155) API: Default to_* methods to compression='infer' (pandas-dev#22011) ...
pandas.Categorical.from_codes
incorrectly converts NaN codes to 0. #21767git diff upstream/master -u -- "*.py" | flake8 --diff