-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix export .to_numpy() with nullable type #41196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
taytzehao
commented
Apr 28, 2021
- closes BUG: series.to_numpy does not work well with pd.Float64Dtype #40630
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
Under such a situation where a change causes cascading failed cases across multiple tests. Should I be modifying the other test cases? |
If not, what strategy should I adopt? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pr @taytzehao. Some comments, plus not 100% sure this is what should be done anyway.
The whole purpose of the new missing value NA
is that it means something different than np.nan
. Because of that, defaulting to object type makes sense to me because that is the common type between int or float and NA
. Also, casting ints to float is not lossless. The behavior of replacing NA
with np.nan
can still be achieved by specifying a na_value
.
|
||
# For numerical input, pd.na is replaced with np.nan | ||
if self._hasna is True: | ||
data[np.where(self._mask is True)] = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line looks suspect ... self._mask
is an ndarray, so won't comparing identity with True just always give False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what about the specified na_value?
# If there is NA and the data is of int type, a float | ||
# is being returned as int type cannot support np.nan. | ||
if is_integer_dtype(self) and self._hasna: | ||
data = self._data.astype(float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this ignore any specified dtype?
There are a bunch of test failures, so I think the approach should be looked at more carefully first. Running tests locally should also always be the first step to catch test failures before using CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, your approach doesn't seem to be working, you'll have to review. In general, what you'd want to do to fix a bug is:
- Implement a test, as you did, and run the original code and make sure the test fails
- Then change the code, and make sure the both the new test and all existing tests work
So, answering your question, you shouldn't modify the tests that are broken with your implementation (except in very rare cases where the tests were wrong, which is not the case).
@pytest.mark.parametrize( | ||
"data_content,data_type,expected_result", | ||
[ | ||
([1, 2, 3], pd.Float64Dtype(), np.array([1, 2, 3], dtype=np.float64)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not build the Series here instead? I think just two parameters data
and expected_results
is enough (I think expected
is descriptive enough and more common, but no big deal).
@@ -708,6 +708,7 @@ Conversion | |||
- Bug in :func:`factorize` where, when given an array with a numeric numpy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (:issue:`41132`) | |||
- Bug in :class:`DataFrame` construction with a dictionary containing an arraylike with ``ExtensionDtype`` and ``copy=True`` failing to make a copy (:issue:`38939`) | |||
- Bug in :meth:`qcut` raising error when taking ``Float64DType`` as input (:issue:`40730`) | |||
- Bug in :meth:`BaseMaskedArray.to_numpy` does not output ``numeric_dtype`` with ``numeric_dtype`` input (:issue:`40630`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this sentence a bit difficult to understand, can you rephrase please?