BUG: fix concat of Sparse with non-sparse dtypes #34338

Merged

Conversation

jorisvandenbossche (Member):

Closes #34336

This adds tests for the behaviour as it was on 0.25.3 / 1.0.3, and some changes to get back to that behaviour (although whether this behaviour is fully "sane", I am not sure).
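For reference, a minimal reproducer of the behaviour being restored, based on the linked regression report; the exact values are illustrative:

```python
import pandas as pd

sparse = pd.Series([0, 1, 2], dtype="Sparse[int]")
dense = pd.Series(["a", "b", "c"], dtype=object)

# On 0.25.3 / 1.0.3, concatenating a sparse column with an incompatible
# non-sparse dtype produced a plain object result; the regression made
# this come back as Sparse[object] instead.
result = pd.concat([sparse, dense], ignore_index=True)
print(result.dtype)  # expected (pre-regression) behaviour: object
```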

@jorisvandenbossche added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Sparse (Sparse Data Type) labels on May 23, 2020
@jorisvandenbossche added this to the 1.1 milestone on May 23, 2020
TomAugspurger (Contributor) left a comment:

Looks good.

I'm not totally sure about the expected behavior of concat([Sparse, Categorical]). I suspect it was never properly discussed or intentionally implemented. Happy to just match the 1.0.3 behavior for now, though.

pandas/core/dtypes/concat.py (resolved review thread)
Co-authored-by: Tom Augspurger <TomAugspurger@users.noreply.github.com>
jorisvandenbossche (Member, Author):

OK, will go forward with this PR as is (matching the previous behaviour) to unblock the other concat PRs, but will open a set of follow-up issues.

@@ -1063,7 +1063,8 @@ def astype(self, dtype=None, copy=True):
         """
         dtype = self.dtype.update_dtype(dtype)
         subtype = dtype._subtype_with_str
-        sp_values = astype_nansafe(self.sp_values, subtype, copy=copy)
+        # TODO copy=False is broken for astype_nansafe with int -> float
+        sp_values = astype_nansafe(self.sp_values, subtype, copy=True)
jorisvandenbossche (Member, Author):
Opened #34456 for this (and added link in the comment)
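For context, a rough standalone sketch of what the changed line does to the stored sparse values; `_cast_sp_values` is a hypothetical helper, not the actual pandas internals:

```python
import numpy as np

def _cast_sp_values(sp_values: np.ndarray, subtype: np.dtype) -> np.ndarray:
    # The real code calls astype_nansafe(self.sp_values, subtype, copy=copy);
    # this PR forces copy=True there because copy=False is broken for
    # int -> float casts (tracked in #34456).
    return sp_values.astype(subtype, copy=True)

print(_cast_sp_values(np.array([1, 2, 3]), np.dtype("float64")))  # [1. 2. 3.]
```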

if is_sparse(arr.dtype) and not is_sparse(dtype):
    # problem case: SparseArray.astype(dtype) doesn't follow the specified
    # dtype exactly, but converts this to Sparse[dtype] -> first manually
    # convert to dense array
jorisvandenbossche (Member, Author):
Opened #34457 for this
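A sketch of the workaround this refers to (illustrative, not the exact code in concat.py):

```python
import numpy as np
import pandas as pd

arr = pd.arrays.SparseArray([0, 1, 0, 2])

# arr.astype("object") would come back as Sparse[object] rather than a plain
# object array (the issue tracked in #34457), so densify explicitly first.
dense = np.asarray(arr)
result = dense.astype(object)
print(result.dtype)  # object
```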

jorisvandenbossche (Member, Author):

And opened #34459 for the general "what should concat(sparse, categorical) do?" question
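For illustration, the open question is roughly the following (values are illustrative; what dtype the result should have is exactly what #34459 is about):

```python
import pandas as pd

sparse = pd.Series([0, 1, 2], dtype="Sparse[int]")
cat = pd.Series(["a", "b", "c"], dtype="category")

# This PR makes the result match the 1.0.3 behaviour; whether that is the
# "right" combined dtype is deferred to the follow-up issue #34459.
result = pd.concat([sparse, cat], ignore_index=True)
print(result.dtype)
```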

@@ -20,7 +21,7 @@
 )
 from pandas.core.dtypes.generic import ABCCategoricalIndex, ABCRangeIndex, ABCSeries

-from pandas.core.arrays import ExtensionArray
+from pandas.core.arrays import ExtensionArray, SparseArray
jorisvandenbossche (Member, Author):

Yeah, I already pushed a fix. I don't really understand why, though, as SparseArray is included in the pandas.core.arrays __init__, just like ExtensionArray.

@jorisvandenbossche merged commit cc63484 into pandas-dev:master on May 29, 2020
@jorisvandenbossche deleted the concat-sparse-object branch on June 7, 2020 09:37
Successfully merging this pull request may close these issues:
REGR: concat of Sparse with incompatible dtype now gives Sparse[object] instead of object