BUG/PERF: Sparse get_dummies uses concat #24372
Hello @TomAugspurger! Thanks for submitting the PR.
@@ -1613,6 +1613,7 @@ Sparse
 - Bug in :meth:`SparseArary.unique` not returning the unique values (:issue:`19595`)
 - Bug in :meth:`SparseArray.nonzero` and :meth:`SparseDataFrame.dropna` returning shifted/incorrect results (:issue:`21172`)
 - Bug in :meth:`DataFrame.apply` where dtypes would lose sparseness (:issue:`23744`)
+- Bug in :func:`concat` when concatenating a list of :class:`Series` with all-sparse values changing the ``fill_value`` and converting to a dense Series (:issue:`24371`)
When the input to concat is a List[Series[Sparse]], we now return a DataFrame with sparse values. Previously this was a dense DataFrame (probably a bug), so it isn't API breaking.
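For illustration, a minimal sketch of the behavior described above, assuming a recent pandas where `pd.arrays.SparseArray` and `pd.SparseDtype` are available. The Series names and values are made up for the example, and `axis=1` is an assumption chosen to produce a DataFrame.

```python
import pandas as pd

# Two all-sparse Series sharing an explicit fill_value of 0 (illustrative data).
a = pd.Series(pd.arrays.SparseArray([0, 1, 0], fill_value=0), name="a")
b = pd.Series(pd.arrays.SparseArray([0, 0, 2], fill_value=0), name="b")

# Concatenate column-wise; the expectation is that the columns stay sparse
# and the fill_value is not changed.
result = pd.concat([a, b], axis=1)

all_sparse = all(isinstance(dt, pd.SparseDtype) for dt in result.dtypes)
fill_values = [dt.fill_value for dt in result.dtypes]
print(result.dtypes)
```

If `all_sparse` is false or a `fill_value` differs from the inputs, the result was densified or altered along the way, which is the symptom reported in #24371.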
Codecov Report
@@ Coverage Diff @@
## master #24372 +/- ##
===========================================
- Coverage 92.29% 42.97% -49.32%
===========================================
Files 162 162
Lines 51832 51836 +4
===========================================
- Hits 47839 22279 -25560
- Misses 3993 29557 +25564
Codecov Report
@@ Coverage Diff @@
## master #24372 +/- ##
==========================================
+ Coverage 92.29% 92.29% +<.01%
==========================================
Files 162 162
Lines 51832 51836 +4
==========================================
+ Hits 47839 47843 +4
Misses 3993 3993
n.b.
@@ -909,7 +910,15 @@ def _make_col_name(prefix, prefix_sep, level):
         index = None

     if sparse:
-        sparse_series = {}
+
+        if is_integer_dtype(dtype):
We already have a routine for this in pandas.core.dtypes.missing.
na_value_for_dtype, or something else? We need something a little different, since we want the 0 value for each dtype.
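To make the distinction concrete, here is a rough sketch of what "the 0 value for each dtype" could look like using NumPy alone. `zero_for_dtype` is a hypothetical helper written for this example; it is not a pandas function and not necessarily what the PR ends up doing.

```python
import numpy as np


def zero_for_dtype(dtype):
    """Return the scalar zero of `dtype` (hypothetical helper, not pandas API)."""
    # np.dtype(...).type is the scalar constructor, e.g. np.uint8 or np.bool_.
    return np.dtype(dtype).type(0)


# The zero differs by dtype: integer 0, float 0.0, boolean False.
zeros = {name: zero_for_dtype(name) for name in ("uint8", "int64", "float64", "bool")}
for name, value in zeros.items():
    print(name, repr(value))
```

This differs from an NA value for the dtype, which would be NaN for floats rather than 0.0.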
We have that too; let's try not to reinvent the wheel.
I didn't see a function like this in any of the dtypes modules.
Do you know the method name? I don't see one that does it.
On Thu, Dec 20, 2018 at 7:30 AM Jeff Reback wrote:
> In pandas/core/reshape/reshape.py:
> @@ -909,7 +910,15 @@ def _make_col_name(prefix, prefix_sep, level):
>          index = None
>      if sparse:
> -        sparse_series = {}
> +
> +        if is_integer_dtype(dtype):
>
> we have that too let’s try to not reinvent the wheel
Working around the DataFrame constructor perf issue in #24368
Fixes deprecation warnings in the ASV files so there's something to run.
Closes #24371
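A quick way to check the user-facing behavior, sketched under the assumption of a recent pandas with the `DataFrame.sparse` accessor. The input data and the `uint8` dtype are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative categorical data.
s = pd.Series(["a", "b", "a", "c"])

# Request sparse indicator columns with an integer dtype.
dummies = pd.get_dummies(s, sparse=True, dtype=np.uint8)

# Each column should carry a sparse dtype whose fill_value is the dtype's zero.
all_sparse = all(isinstance(dt, pd.SparseDtype) for dt in dummies.dtypes)
fill_values = [dt.fill_value for dt in dummies.dtypes]

# Densify only to inspect the values.
dense = dummies.sparse.to_dense()
print(dummies.dtypes)
print(dense)
```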