BUG: Fix export .to_numpy() with nullable type #41196


Closed
wants to merge 2 commits into from

Conversation

@taytzehao (Contributor Author)

In a situation where a change causes cascading test failures across multiple tests, should I be modifying the other test cases?

@taytzehao (Contributor Author)

If not, what strategy should I adopt?

@mzeitlin11 (Member) left a comment:

Thanks for the PR @taytzehao. Some comments, plus I'm not 100% sure this is what should be done anyway.

The whole purpose of the new missing value NA is that it means something different from np.nan. Because of that, defaulting to object dtype makes sense to me, because that is the common type between int or float and NA. Also, casting ints to float is not lossless. The behavior of replacing NA with np.nan can still be achieved by specifying na_value.
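For context, the behavior described above can be demonstrated directly with a nullable array (a minimal sketch; assumes a pandas version with nullable dtypes):

```python
import numpy as np
import pandas as pd

# Nullable integer array holding pd.NA
arr = pd.array([1, 2, None], dtype="Int64")

# With missing values present, the default conversion keeps pd.NA,
# so the result falls back to object dtype
assert arr.to_numpy().dtype == object

# Replacing NA with np.nan is still available, but opt-in
out = arr.to_numpy(dtype="float64", na_value=np.nan)
assert out.dtype == np.float64
assert np.isnan(out[2])
```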


# For numerical input, pd.na is replaced with np.nan
if self._hasna is True:
data[np.where(self._mask is True)] = np.nan

Inline comment (Member):

This line looks suspect ... self._mask is an ndarray, so won't comparing identity with True just always give False?
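The pitfall being pointed out here can be shown in isolation (a minimal sketch, independent of the PR code):

```python
import numpy as np

mask = np.array([False, True, False])

# `mask is True` is an identity comparison with the singleton True;
# an ndarray is never that object, so the check is always False.
assert (mask is True) is False

# np.where(False) selects no indices, so this assignment is a silent no-op
data = np.array([1.0, 2.0, 3.0])
data[np.where(mask is True)] = np.nan
assert not np.isnan(data).any()

# Indexing with the boolean mask directly does what was intended
data[mask] = np.nan
assert np.isnan(data[1])
```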

Inline comment (Member):

Also, what about the specified na_value?

# If there is NA and the data is of int type, a float
# is being returned as int type cannot support np.nan.
if is_integer_dtype(self) and self._hasna:
data = self._data.astype(float)

Inline comment (Member):

Doesn't this ignore any specified dtype?
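One way to honor a caller-specified dtype could look like the following sketch (names such as masked_to_numpy are illustrative assumptions, not the pandas implementation):

```python
import numpy as np

def masked_to_numpy(data, mask, dtype=None, na_value=np.nan):
    # Illustrative sketch only: `data` is the backing ndarray and `mask`
    # marks missing positions, mirroring the masked-array layout.
    if dtype is None:
        # Fall back to float only when the caller left the dtype
        # unspecified and there are missing values to represent
        dtype = np.float64 if mask.any() else data.dtype
    out = data.astype(dtype)
    if mask.any():
        out[mask] = na_value
    return out

# Missing values and no explicit dtype: upcast to float64
out = masked_to_numpy(np.array([1, 2, 3]), np.array([False, True, False]))
assert out.dtype == np.float64 and np.isnan(out[1])

# No missing values: the original integer dtype is preserved
out2 = masked_to_numpy(np.array([1, 2, 3]), np.zeros(3, dtype=bool))
assert out2.dtype.kind == "i"
```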

@mzeitlin11 (Member)

> In a situation where a change causes cascading test failures across multiple tests, should I be modifying the other test cases?

There are a bunch of test failures, so I think the approach should be looked at more carefully first. Running tests locally should also always be the first step, to catch test failures before relying on CI.

@datapythonista (Member) left a comment:

As mentioned, your approach doesn't seem to be working, so you'll have to review it. In general, what you'd want to do to fix a bug is:

  • Implement a test, as you did, run it against the original code, and make sure the test fails
  • Then change the code, and make sure that both the new test and all the existing tests pass

So, answering your question, you shouldn't modify the tests that are broken with your implementation (except in very rare cases where the tests were wrong, which is not the case).

@pytest.mark.parametrize(
"data_content,data_type,expected_result",
[
([1, 2, 3], pd.Float64Dtype(), np.array([1, 2, 3], dtype=np.float64)),

Inline comment (Member):

Why not build the Series here instead? I think just two parameters, data and expected_result, are enough (I think expected is descriptive enough and more common, but no big deal).
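The suggestion might look like this (the test name and cases are illustrative, not taken from the PR):

```python
import numpy as np
import pandas as pd
import pytest

@pytest.mark.parametrize(
    "data, expected",
    [
        (pd.Series([1, 2, 3], dtype="Float64"),
         np.array([1.0, 2.0, 3.0], dtype=np.float64)),
        (pd.Series([1, 2, 3], dtype="Int64"),
         np.array([1, 2, 3], dtype=np.int64)),
    ],
)
def test_to_numpy(data, expected):
    # The Series is built in the parameter list; only two parameters needed
    result = data.to_numpy(dtype=expected.dtype)
    np.testing.assert_array_equal(result, expected)
```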

@@ -708,6 +708,7 @@ Conversion
- Bug in :func:`factorize` where, when given an array with a numeric numpy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (:issue:`41132`)
- Bug in :class:`DataFrame` construction with a dictionary containing an arraylike with ``ExtensionDtype`` and ``copy=True`` failing to make a copy (:issue:`38939`)
- Bug in :meth:`qcut` raising error when taking ``Float64DType`` as input (:issue:`40730`)
- Bug in :meth:`BaseMaskedArray.to_numpy` does not output ``numeric_dtype`` with ``numeric_dtype`` input (:issue:`40630`)

Inline comment (Member):

I find this sentence a bit difficult to understand; can you rephrase it, please?

@datapythonista changed the title from "To numpy float int" to "BUG: Fix export .to_numpy() with nullable type" May 7, 2021
@datapythonista added the Bug and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels May 7, 2021
@taytzehao taytzehao marked this pull request as draft May 11, 2021 03:05
@taytzehao taytzehao closed this May 20, 2021

Successfully merging this pull request may close these issues.

BUG: series.to_numpy does not work well with pd.Float64Dtype
3 participants