BUG: Fix export .to_numpy() with nullable type #41196


Closed
wants to merge 2 commits into from

Conversation

@taytzehao (Contributor Author)

In a situation where a change causes cascading test failures across multiple tests, should I be modifying the other test cases?

@taytzehao (Contributor Author)

If not, what strategy should I adopt?

@mzeitlin11 (Member) left a comment:

Thanks for the PR @taytzehao. Some comments, plus I'm not 100% sure this is what should be done anyway.

The whole purpose of the new missing value NA is that it means something different from np.nan. Because of that, defaulting to object dtype makes sense to me, because that is the common type between int or float and NA. Also, casting ints to float is not lossless. The behavior of replacing NA with np.nan can still be achieved by specifying na_value.
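For context, the behavior described above can be demonstrated directly with a nullable array (a minimal sketch; assumes a pandas version with nullable dtypes):

```python
import numpy as np
import pandas as pd

# Nullable integer array holding pd.NA
arr = pd.array([1, 2, None], dtype="Int64")

# With missing values present, the default conversion keeps pd.NA,
# so the result falls back to object dtype
assert arr.to_numpy().dtype == object

# Replacing NA with np.nan is still available, but opt-in
out = arr.to_numpy(dtype="float64", na_value=np.nan)
assert out.dtype == np.float64
assert np.isnan(out[2])
```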


# For numerical input, pd.na is replaced with np.nan
if self._hasna is True:
data[np.where(self._mask is True)] = np.nan

Inline comment (Member):

This line looks suspect ... self._mask is an ndarray, so won't comparing identity with True just always give False?
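The pitfall being pointed out here can be shown in isolation (a minimal sketch, independent of the PR code):

```python
import numpy as np

mask = np.array([False, True, False])

# `mask is True` is an identity comparison with the singleton True;
# an ndarray is never that object, so the check is always False.
assert (mask is True) is False

# np.where(False) selects no indices, so this assignment is a silent no-op
data = np.array([1.0, 2.0, 3.0])
data[np.where(mask is True)] = np.nan
assert not np.isnan(data).any()

# Indexing with the boolean mask directly does what was intended
data[mask] = np.nan
assert np.isnan(data[1])
```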

Inline comment (Member):

Also, what about the specified na_value?

# If there is NA and the data is of int type, a float
# is being returned as int type cannot support np.nan.
if is_integer_dtype(self) and self._hasna:
data = self._data.astype(float)

Inline comment (Member):

Doesn't this ignore any specified dtype?
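One way to honor a caller-specified dtype could look like the following sketch (names such as masked_to_numpy are illustrative assumptions, not the pandas implementation):

```python
import numpy as np

def masked_to_numpy(data, mask, dtype=None, na_value=np.nan):
    # Illustrative sketch only: `data` is the backing ndarray and `mask`
    # marks missing positions, mirroring the masked-array layout.
    if dtype is None:
        # Fall back to float only when the caller left the dtype
        # unspecified and there are missing values to represent
        dtype = np.float64 if mask.any() else data.dtype
    out = data.astype(dtype)
    if mask.any():
        out[mask] = na_value
    return out

# Missing values and no explicit dtype: upcast to float64
out = masked_to_numpy(np.array([1, 2, 3]), np.array([False, True, False]))
assert out.dtype == np.float64 and np.isnan(out[1])

# No missing values: the original integer dtype is preserved
out2 = masked_to_numpy(np.array([1, 2, 3]), np.zeros(3, dtype=bool))
assert out2.dtype.kind == "i"
```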

@mzeitlin11 (Member)

> In a situation where a change causes cascading test failures across multiple tests, should I be modifying the other test cases?

There are a bunch of test failures, so I think the approach should be looked at more carefully first. Running tests locally should also always be the first step, to catch test failures before relying on CI.

@datapythonista (Member) left a comment:

As mentioned, your approach doesn't seem to be working, so you'll have to review it. In general, what you'd want to do to fix a bug is:

  • Implement a test, as you did, run it against the original code, and make sure the test fails
  • Then change the code, and make sure that both the new test and all the existing tests pass

So, answering your question, you shouldn't modify the tests that are broken with your implementation (except in very rare cases where the tests were wrong, which is not the case).

@pytest.mark.parametrize(
"data_content,data_type,expected_result",
[
([1, 2, 3], pd.Float64Dtype(), np.array([1, 2, 3], dtype=np.float64)),

Inline comment (Member):

Why not build the Series here instead? I think just two parameters, data and expected_result, are enough (I think expected is descriptive enough and more common, but no big deal).
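The suggestion might look like this (the test name and cases are illustrative, not taken from the PR):

```python
import numpy as np
import pandas as pd
import pytest

@pytest.mark.parametrize(
    "data, expected",
    [
        (pd.Series([1, 2, 3], dtype="Float64"),
         np.array([1.0, 2.0, 3.0], dtype=np.float64)),
        (pd.Series([1, 2, 3], dtype="Int64"),
         np.array([1, 2, 3], dtype=np.int64)),
    ],
)
def test_to_numpy(data, expected):
    # The Series is built in the parameter list; only two parameters needed
    result = data.to_numpy(dtype=expected.dtype)
    np.testing.assert_array_equal(result, expected)
```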

@@ -708,6 +708,7 @@ Conversion
- Bug in :func:`factorize` where, when given an array with a numeric numpy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (:issue:`41132`)
- Bug in :class:`DataFrame` construction with a dictionary containing an arraylike with ``ExtensionDtype`` and ``copy=True`` failing to make a copy (:issue:`38939`)
- Bug in :meth:`qcut` raising error when taking ``Float64DType`` as input (:issue:`40730`)
- Bug in :meth:`BaseMaskedArray.to_numpy` does not output ``numeric_dtype`` with ``numeric_dtype`` input (:issue:`40630`)

Inline comment (Member):

I find this sentence a bit difficult to understand; can you rephrase it, please?

@datapythonista changed the title from "To numpy float int" to "BUG: Fix export .to_numpy() with nullable type" May 7, 2021
@datapythonista added the Bug and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels May 7, 2021
@taytzehao taytzehao marked this pull request as draft May 11, 2021 03:05
@taytzehao taytzehao closed this May 20, 2021

Successfully merging this pull request may close these issues.

BUG: series.to_numpy does not work well with pd.Float64Dtype
3 participants