BUG: Group-by on an empty data object dtype loses the index name (cython aggregation is ok) #8093
The problem is that pandas, when reading a query, does not know anything about the table structure itself. It constructs the resulting frame only from the values the query returns, and if there are no values, it cannot determine the dtype.
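A minimal sketch of that inference problem (the column names here are illustrative): with no rows to inspect, every column falls back to object dtype, just as an empty query result would.

```python
import pandas as pd

# With no values to infer from, every column defaults to object dtype.
df = pd.DataFrame(columns=['day_of_week', 'revenue', 'expenses'])
print(df.dtypes)
# day_of_week    object
# revenue        object
# expenses       object
# dtype: object
```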
By the way, if I run this with pandas 0.14.1, I don't get an AttributeError.

I'm running this on pandas 0.13.1. Is it different for you?
Yes:

But I don't know which of the two is correct, as converting the object column to float also seems strange to me.
Any idea why those 'object' columns are 'float64' after the group-by? Where does the 'day_of_week' column go? This still isn't what I would expect/want from the group-by operation.
@jreback object columns that get converted to float, is that OK? You can keep the 'day_of_week' column by providing
@carterk yes, that is by definition: the index of a returned groupby uses the grouper.
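To illustrate that point (a small made-up frame): the grouper becomes the index of the result, and passing as_index=False keeps it as an ordinary column instead.

```python
import pandas as pd

df = pd.DataFrame({'day_of_week': ['Mon', 'Mon', 'Tue'],
                   'revenue': [1.0, 2.0, 3.0]})

# By definition, the grouper ends up as the index of the result.
print(df.groupby('day_of_week').sum().index.name)  # day_of_week

# as_index=False keeps the grouper as a regular column instead.
print(df.groupby('day_of_week', as_index=False).sum().columns.tolist())
# ['day_of_week', 'revenue']
```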
@jreback Yeah, I thought
@jreback there are two things I don't understand / that seem a bit strange:
1) the object columns appear to be coerced to float64, and
2) the name of the groupby column is lost.
Neither of those is true: the dtype of the result is determined by the input array; they are not coerced. The name of the groupby column is preserved. I suspect the input to the frame creation is not exactly right; save that and you will see.
What exactly do you mean by 'the input to the frame creation is not exactly right'?
I think the object dtype causes a Python aggregation (while the float uses a cython aggregation); somewhere along that path the name is getting lost. Call this a bug.
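A sketch of the two paths on an empty frame (the behaviour is version-dependent; the failing case is the one reported around 0.13/0.14):

```python
import pandas as pd

# float64 columns: the fast cython aggregation path is taken and the
# index name of the grouper survives.
flt = pd.DataFrame(columns=list('ABC'), dtype='float64')
print(flt.groupby('A').sum().index.name)  # 'A'

# object columns: falls back to a Python-level aggregation; on the
# affected versions the index name came back as None (the bug above).
obj = pd.DataFrame(columns=list('ABC'))
print(obj.groupby('A').sum().index.name)  # None on ~0.13/0.14, 'A' once fixed
```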
@TomAugspurger This seems already fixed.

In [1]: import pandas as pd

In [2]: print(pd.__version__)
0.21.0.dev+627.ge001500cb.dirty

In [3]: df = pd.DataFrame(columns=list('ABC'))
   ...: df.groupby('A').sum().index.name
   ...:
Out[3]: 'A'
Thanks, it'd be nice to ensure we have a regression test in place.
@TomAugspurger Okay. I'll do that.
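A regression test along these lines could look like the following sketch (the test name is made up; it just asserts the behaviour verified above):

```python
import pandas as pd

def test_groupby_empty_object_preserves_index_name():
    # GH#8093: grouping an empty, all-object frame must keep the
    # grouper's name on the resulting index.
    df = pd.DataFrame(columns=list('ABC'))
    result = df.groupby('A').sum()
    assert result.index.name == 'A'
```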
If pd.read_sql is used to load a data frame with the results of an SQL query that returns no results, the columns in the data frame will be of type 'object'. That type cannot be aggregated, so a subsequent group-by operation on that empty data frame will drop all the columns. So instead of 'profit' in the example below being an empty series, an AttributeError is thrown because the columns 'revenue' and 'expenses' cannot be found in the data frame.

Two things I can think of that could fix this:

1) Have pd.read_sql populate the data frame with empty columns of the correct type even if the SQL query returns no results. Then the group-by would not drop the columns, because they are of a type that can be aggregated.
2) Change groupby to not drop columns of types that cannot be aggregated: maybe a drop_non_agg flag. I think not dropping columns of types that cannot be aggregated should be the default behaviour; columns with data that cannot be aggregated can just be populated with null after a group-by.

I think 1) probably should be implemented, and 2) is kind of a design decision.
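Note the drop_non_agg flag is only a suggestion here, not an existing option. In the spirit of proposal 1), a user-side workaround is to cast the empty frame to the dtypes the table is known to have (names and dtypes below are illustrative):

```python
import pandas as pd

# Cast the empty result to the known schema so the numeric columns are
# float64 rather than object and survive the group-by aggregation.
df = pd.DataFrame(columns=['day_of_week', 'revenue', 'expenses'])
df = df.astype({'revenue': 'float64', 'expenses': 'float64'})

grouped = df.groupby('day_of_week').sum()
print(grouped.columns.tolist())  # ['revenue', 'expenses'], though empty
```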
You can run this code to reproduce the issue.
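The snippet itself is not included above; a reconstruction from the description (the table name and schema are taken from the prose and are otherwise guesses) would be:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (day_of_week TEXT, revenue REAL, expenses REAL)')

df = pd.read_sql('SELECT * FROM sales', conn)  # query returns no rows
print(df.dtypes)                               # every column is object

grouped = df.groupby('day_of_week').sum()
# On pandas 0.13/0.14 the object columns were dropped as non-aggregatable,
# so the next line raised AttributeError instead of giving an empty Series.
profit = grouped.revenue - grouped.expenses
```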