BUG/PERF: Sparse get_dummies uses concat #24372

TomAugspurger · 2018-12-20T12:43:51Z

Working around the DataFrame constructor perf issue in #24368

Fixes deprecation warnings in the ASV files so there's something to run.

Closes #24371

(cherry picked from commit f566b46)

(cherry picked from commit eb219ac)

* Preserve sparsity
* Preserve fill value
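For readers who want to see the behaviour this PR targets, here is a minimal sketch of calling `get_dummies` with `sparse=True`. The sample data and the `dtype=np.uint8` choice are assumptions made for illustration, and the sketch assumes a pandas version that exposes `pd.SparseDtype`. It does not reproduce the internal implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series(["a", "b", "a", "c"])

# sparse=True asks get_dummies to back each indicator column with sparse values.
dummies = pd.get_dummies(s, sparse=True, dtype=np.uint8)

# Collect what we want to inspect: is every column sparse, and what fills the gaps?
all_sparse = all(isinstance(dt, pd.SparseDtype) for dt in dummies.dtypes)
fill_values = [dt.fill_value for dt in dummies.dtypes]
```

Inspecting `dummies.dtypes` is a quick way to confirm that sparsity and the fill value survived.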

pep8speaks · 2018-12-20T12:43:54Z

Hello @TomAugspurger! Thanks for submitting the PR.

There are no PEP8 issues in the file asv_bench/benchmarks/join_merge.py !
There are no PEP8 issues in the file asv_bench/benchmarks/panel_ctor.py !
There are no PEP8 issues in the file asv_bench/benchmarks/reindex.py !
There are no PEP8 issues in the file asv_bench/benchmarks/timedelta.py !
There are no PEP8 issues in the file asv_bench/benchmarks/timestamp.py !
There are no PEP8 issues in the file pandas/core/dtypes/concat.py !
There are no PEP8 issues in the file pandas/core/reshape/reshape.py !

TomAugspurger · 2018-12-20T12:45:47Z

doc/source/whatsnew/v0.24.0.rst

 - Bug in :meth:`SparseArary.unique` not returning the unique values (:issue:`19595`)
 - Bug in :meth:`SparseArray.nonzero` and :meth:`SparseDataFrame.dropna` returning shifted/incorrect results (:issue:`21172`)
 - Bug in :meth:`DataFrame.apply` where dtypes would lose sparseness (:issue:`23744`)
+- Bug in :func:`concat` when concatenating a list of :class:`Series` with all-sparse values changing the ``fill_value`` and converting to a dense Series (:issue:`24371`)


When the input to concat is a List[Series[Sparse]], we now return a DataFrame with sparse values. Previously this was a dense DataFrame (probably a bug), so it isn't API breaking.
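A small sketch of the case described here: concatenating a list of all-sparse `Series` along `axis=1`. The data and column names are hypothetical, and `pd.arrays.SparseArray` assumes a pandas version that provides it under `pd.arrays`.

```python
import pandas as pd

# Two all-sparse Series sharing fill_value=0 (hypothetical data).
a = pd.Series(pd.arrays.SparseArray([0, 1, 0, 0], fill_value=0), name="a")
b = pd.Series(pd.arrays.SparseArray([0, 0, 2, 0], fill_value=0), name="b")

# Column-wise concatenation of the list of sparse Series.
result = pd.concat([a, b], axis=1)

# Inspect the column dtypes rather than assuming the outcome.
dtypes = list(result.dtypes)
```

With the fix, each entry in `dtypes` should be a sparse dtype that keeps the original `fill_value` instead of a dense dtype.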

codecov · 2018-12-20T13:21:51Z

Codecov Report

Merging #24372 into master will decrease coverage by 49.31%.
The diff coverage is 9.09%.

@@             Coverage Diff             @@
##           master   #24372       +/-   ##
===========================================
- Coverage   92.29%   42.97%   -49.32%     
===========================================
  Files         162      162               
  Lines       51832    51836        +4     
===========================================
- Hits        47839    22279    -25560     
- Misses       3993    29557    +25564

Flag	Coverage Δ
#multiple	`?`
#single	`42.97% <9.09%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/reshape/reshape.py	`13.31% <0%> (-86.24%)`	⬇️
pandas/core/dtypes/concat.py	`57.35% <50%> (-39.25%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-98.65%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.17%)`	⬇️
... and 122 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f6cf7d9...6a65cbc.

codecov · 2018-12-20T13:21:51Z

Codecov Report

Merging #24372 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24372      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51832    51836       +4     
==========================================
+ Hits        47839    47843       +4     
  Misses       3993     3993

Flag	Coverage Δ
#multiple	`90.7% <100%> (ø)`	⬆️
#single	`42.98% <9.09%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/dtypes/concat.py	`97.05% <100%> (+0.45%)`	⬆️
pandas/core/reshape/reshape.py	`99.56% <100%> (ø)`	⬆️
pandas/util/testing.py	`87.57% <0%> (-0.1%)`	⬇️


TomAugspurger · 2018-12-20T13:23:37Z

N.B. 6a65cbc includes an API-breaking change for SparseSeries.unstack: with this PR it returns a DataFrame of sparse values instead of a SparseDataFrame.

jreback · 2018-12-20T13:23:34Z

pandas/core/reshape/reshape.py

    if sparse:
-        sparse_series = {}
+
+        if is_integer_dtype(dtype):


we have a routine in pandas.core.dtypes.missing for this already

na_value_for_dtype, or something else? We need something a little different, since we want the 0 value for each dtype.

we have that too let’s try to not reinvent the wheel

I didn't see a function like this in any of the dtypes modules.
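To make the distinction in this thread concrete, here is a rough sketch of a "zero value per dtype" lookup, as opposed to an NA value per dtype. The helper name `zero_fill_value` is hypothetical and this is not the code in the PR; the diff only shows the `is_integer_dtype(dtype)` branch. It uses the public `pandas.api.types` checks.

```python
import numpy as np
from pandas.api.types import is_bool_dtype, is_float_dtype, is_integer_dtype


def zero_fill_value(dtype):
    """Hypothetical helper: return the 'zero' of a NumPy dtype, not its NA value."""
    dtype = np.dtype(dtype)
    if is_integer_dtype(dtype):
        return 0
    if is_float_dtype(dtype):
        return 0.0
    if is_bool_dtype(dtype):
        return False
    # Assumption: any other dtype falls back to the plain integer 0.
    return 0


# Example lookups for the dtypes discussed in the thread.
int_fill = zero_fill_value(np.uint8)
float_fill = zero_fill_value(np.float64)
bool_fill = zero_fill_value(bool)
```

An NA-oriented helper would return something like NaN for a float dtype, which is why it does not fit a dummy-indicator column where absent entries mean zero.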

TomAugspurger · 2018-12-20T13:34:33Z

Do you know the method name? I don't see one that does it.


TomAugspurger · 2018-12-21T15:05:10Z

I think this should be merged soon if possible. The CI failures are blocking #20796 and the explode PR.

I haven't made too much progress on fixing #24368 properly. Too many edge cases in our constructors.

TomAugspurger added 3 commits December 20, 2018 06:41

Fixed warnings in asv files

aa08a6d

(cherry picked from commit f566b46)

avoid series constructor

ae026b2

(cherry picked from commit eb219ac)

BUG: Fix concat(Series[sparse], axis=1)

b253674

* Preserve sparsity
* Preserve fill value

TomAugspurger added this to the 0.24.0 milestone Dec 20, 2018

TomAugspurger added the Performance (memory or execution speed performance) and Sparse (sparse data type) labels Dec 20, 2018

TomAugspurger mentioned this pull request Dec 20, 2018

ENH: Implemented lazy iteration #20796

Merged

4 tasks

SparseSeries unstack

6a65cbc

jreback requested changes Dec 20, 2018

TomAugspurger mentioned this pull request Dec 20, 2018

DEPR: Deprecate range-based PeriodIndex construction #24354

Merged

changhiskhan mentioned this pull request Dec 20, 2018

[ENH] Add DataFrame method to explode a list-like column (GH #16538) #24366

Closed

4 tasks

jreback approved these changes Dec 21, 2018

jreback merged commit 0bb3772 into pandas-dev:master Dec 21, 2018

TomAugspurger deleted the sparse-perf branch January 2, 2019 20:17

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG/PERF: Sparse get_dummies uses concat (pandas-dev#24372)

1a479f6

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG/PERF: Sparse get_dummies uses concat (pandas-dev#24372)

4c1c3a9
