BUG: preserve categorical & sparse types when grouping / pivot #27071
That test is not appropriate for checking the results of this change, as most of those ops don't work on ordered categoricals; I have covered the most common ones (first/last/min/max) above.
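As a rough illustration of the min/max case mentioned above, here is a minimal sketch of grouping an ordered categorical column. The frame, the column names `key` and `grade`, and the category labels are made up for this example and are not taken from the PR's test suite.

```python
import pandas as pd

# Hypothetical data: an ordered categorical column grouped by a plain key.
grade = pd.Categorical(
    ["mid", "low", "high", "mid"],
    categories=["low", "mid", "high"],
    ordered=True,
)
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "grade": grade})

grouped = df.groupby("key")["grade"]
smallest = grouped.min()  # uses the category order, not alphabetical order
largest = grouped.max()

# Inspect the result dtypes; the PR's goal is for these to stay categorical.
print(smallest.dtype, largest.dtype)
```

Checking the dtype of the result, not only its values, is what distinguishes a dtype-preserving reduction from one that silently falls back to object.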
Codecov Report
@@ Coverage Diff @@
## master #27071 +/- ##
==========================================
- Coverage 92.04% 90.66% -1.39%
==========================================
Files 180 180
Lines 50714 50727 +13
==========================================
- Hits 46680 45991 -689
- Misses 4034 4736 +702
Codecov Report
@@ Coverage Diff @@
## master #27071 +/- ##
===========================================
+ Coverage 41.96% 92.02% +50.06%
===========================================
Files 180 180
Lines 50707 50727 +20
===========================================
+ Hits 21277 46681 +25404
+ Misses 29430 4046 -25384
pandas/core/internals/blocks.py
try:
    result = self._holder._from_sequence(
        np.asarray(result).ravel(), dtype=dtype)
I'm a bit concerned by the asarray here. Is that just so we can do the .ravel?
Consider a silly example like
df.groupby('key').apply(lambda x: x.array)
Will that end up hitting this, and so calling asarray and converting to ndarray?
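To make the concern concrete, the following is a small sketch of what the `np.asarray(...).ravel()` round trip does to an extension array. It uses a standalone `Categorical` rather than the block internals from the diff, so it only approximates the code path under review; `_from_sequence` is the extension-array constructor that appears in the snippet.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"], ordered=True)

# np.asarray materializes the values as a plain ndarray;
# the categorical dtype (categories and ordering) is not carried along.
as_ndarray = np.asarray(cat)
flat = as_ndarray.ravel()

# Passing the original dtype back in rebuilds an equivalent extension array,
# at the cost of the intermediate ndarray conversion.
restored = type(cat)._from_sequence(flat, dtype=cat.dtype)
```

The round trip recovers the dtype here, but only because the dtype is passed explicitly; the intermediate ndarray carries no trace of it.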
does this make sense?
df.groupby('A').B.apply(lambda x: x.array)
(Pdb) p df
   A                         B
0  1 2000-01-01 18:00:00-06:00
1  1 2000-01-01 18:00:00-06:00
2  2                       NaT
3  2                       NaT
4  3 1999-12-31 18:00:00-06:00
5  3 1999-12-31 18:00:00-06:00
6  1 2000-01-01 18:00:00-06:00
7  4 2000-01-02 18:00:00-06:00
(Pdb) p result
A
1    [2000-01-01 18:00:00-06:00, 2000-01-01 18:00:0...
2                                           [NaT, NaT]
3    [1999-12-31 18:00:00-06:00, 1999-12-31 18:00:0...
4                          [2000-01-02 18:00:00-06:00]
Name: B, dtype: object
9b8f2b4 to 25e8d1b
Does this also preserve the dtypes under transpose?
lgtm
No, that's a more general issue.
@jorisvandenbossche any idea what this means?
result = ts.resample('3T').mean()
expected = Series([1, 4, 7],
                  index=pd.date_range('1/1/2000', periods=3, freq='3T'),
                  dtype='Int64')
@jreback @jorisvandenbossche why is this returning Int64 here? I would expect float64 or Float64.
E.g. if we do ts[-1] += 1 before the resample, the mean comes back as float64.
Yes, this should be Float64, because it is only by accident that the results are all integer-like.
This is one of the cases that I listed in #37494
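A short sketch of why the integer-like result is accidental. The definition of `ts` is not shown in the quoted test, so the series below (values 0 through 8 at one-minute spacing) is an assumption chosen to reproduce the expected means of 1, 4, and 7; `'3min'` is used as the current spelling of the `'3T'` alias.

```python
import pandas as pd

# Assumed input: nine consecutive integers, one per minute, nullable Int64.
index = pd.date_range("1/1/2000", periods=9, freq="min")
ts = pd.Series(range(9), index=index, dtype="Int64")

# Each 3-minute bin holds three consecutive integers, so every mean is whole.
means = ts.resample("3min").mean()
means_as_float = means.astype("float64")

# Changing a single value makes the last bin's mean fractional.
bumped = ts.copy()
bumped.iloc[-1] += 1
bumped_means = bumped.resample("3min").mean().astype("float64")

# The dtype of the mean can differ between pandas versions; inspect it directly.
print(means.dtype, bumped_means.iloc[-1])
```

Because the result dtype should not depend on the particular values, a mean over an integer column is more naturally a float dtype in both cases.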
closes #18502
replaces #26550