Closed
Description
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage-related question on StackOverflow.
Question about pandas
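```python
import pandas as pd
import numpy as np

# 10 rows of random integers in columns A-D
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list('ABCD'))
# Turn column A into labels such as "City_3"
df.A = df.agg('City_{0[A]}'.format, axis=1)
# Spread the index out with a random multiplier and a random offset
df.index = df.index * int(np.random.randint(6, 23) ** 0.5) + int(np.random.randint(2, 23))
```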
This is my df:
A | B | C | D |
---|---|---|---|
City_2 | 3 | 7 | 3 |
City_0 | 4 | 8 | 9 |
City_1 | 1 | 2 | 1 |
City_5 | 5 | 0 | 9 |
City_5 | 1 | 6 | 0 |
City_0 | 3 | 8 | 6 |
City_7 | 2 | 6 | 6 |
City_1 | 6 | 0 | 2 |
City_8 | 6 | 2 | 4 |
City_2 | 2 | 5 | 6 |
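Because the frame is built from random numbers, the values change on every run. A minimal sketch that rebuilds the frame from the table above deterministically follows; the index is an assumption (`3 * position + 9`, i.e. a multiplier of 3 and an offset of 9 in the setup code), chosen only because it reproduces the `df.index//5` values that appear in the results.

```python
import pandas as pd

# Rows copied from the table above
rows = [
    ("City_2", 3, 7, 3), ("City_0", 4, 8, 9), ("City_1", 1, 2, 1),
    ("City_5", 5, 0, 9), ("City_5", 1, 6, 0), ("City_0", 3, 8, 6),
    ("City_7", 2, 6, 6), ("City_1", 6, 0, 2), ("City_8", 6, 2, 4),
    ("City_2", 2, 5, 6),
]
# Assumed index 9, 12, ..., 36 (not the index from the original random run)
df = pd.DataFrame(rows, columns=list("ABCD"), index=[3 * i + 9 for i in range(10)])

# The first grouping key used in the question
print((df.index // 5).tolist())  # [1, 2, 3, 3, 4, 4, 5, 6, 6, 7]
```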
If I do

```python
df.groupby([df.index//5, df.A], as_index=True).mean()
```

or (equivalently, since `as_index=True` is the default)

```python
df.groupby([df.index//5, df.A]).mean()
```
Result:
| | A | B | C | D |
|---|---|---|---|---|
| 1 | City_2 | 3 | 7 | 3 |
| 2 | City_0 | 4 | 8 | 9 |
| 3 | City_1 | 1 | 2 | 1 |
| | City_5 | 5 | 0 | 9 |
| 4 | City_0 | 3 | 8 | 6 |
| | City_5 | 1 | 6 | 0 |
| 5 | City_7 | 2 | 6 | 6 |
| 6 | City_1 | 6 | 0 | 2 |
| | City_8 | 6 | 2 | 4 |
| 7 | City_2 | 2 | 5 | 6 |
If I then call `.reset_index()`, I get the following (the index level without a name becomes a column named `level_0`):
level_0 | A | B | C | D |
---|---|---|---|---|
1 | City_2 | 3 | 7 | 3 |
2 | City_0 | 4 | 8 | 9 |
3 | City_1 | 1 | 2 | 1 |
3 | City_5 | 5 | 0 | 9 |
4 | City_0 | 3 | 8 | 6 |
4 | City_5 | 1 | 6 | 0 |
5 | City_7 | 2 | 6 | 6 |
6 | City_1 | 6 | 0 | 2 |
6 | City_8 | 6 | 2 | 4 |
7 | City_2 | 2 | 5 | 6 |
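The `level_0` name appears because the first key, `df.index//5`, is an unnamed `Index`, so the first level of the resulting `MultiIndex` has no name and `reset_index()` falls back to a default label. A minimal sketch, using the table values with an assumed index (9, 12, ..., 36) and the column label `"A"` in place of `df.A`, shows this and one way to get a meaningful column name by naming the key before grouping; the name `bin` is hypothetical.

```python
import pandas as pd

rows = [
    ("City_2", 3, 7, 3), ("City_0", 4, 8, 9), ("City_1", 1, 2, 1),
    ("City_5", 5, 0, 9), ("City_5", 1, 6, 0), ("City_0", 3, 8, 6),
    ("City_7", 2, 6, 6), ("City_1", 6, 0, 2), ("City_8", 6, 2, 4),
    ("City_2", 2, 5, 6),
]
# Assumed index 9, 12, ..., 36; it reproduces the df.index//5 bins shown above
df = pd.DataFrame(rows, columns=list("ABCD"), index=[3 * i + 9 for i in range(10)])

res = df.groupby([df.index // 5, "A"]).mean()
print(list(res.index.names))               # [None, 'A']
print(res.reset_index().columns.tolist())  # ['level_0', 'A', 'B', 'C', 'D']

# Naming the key ("bin" is a hypothetical name) names both the level and the column
named = df.groupby([(df.index // 5).rename("bin"), "A"]).mean()
print(named.reset_index().columns.tolist())  # ['bin', 'A', 'B', 'C', 'D']
```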
This does not work the same way with `as_index=False`:

```python
df.groupby([df.index//5, df.A], as_index=False).mean()
```

gives me:
A | B | C | D |
---|---|---|---|
City_2 | 3 | 7 | 3 |
City_0 | 4 | 8 | 9 |
City_1 | 1 | 2 | 1 |
City_5 | 5 | 0 | 9 |
City_0 | 3 | 8 | 6 |
City_5 | 1 | 6 | 0 |
City_7 | 2 | 6 | 6 |
City_1 | 6 | 0 | 2 |
City_8 | 6 | 2 | 4 |
City_2 | 2 | 5 | 6 |
Here I lose the grouping information from `df.index//5`.
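One workaround, offered as a sketch rather than as documented pandas guidance, is to store the key as a real column before grouping, since column labels passed as keys are kept as columns under `as_index=False`. The frame reuses the table values with an assumed index (9, 12, ..., 36), and the column name `bin` is hypothetical.

```python
import pandas as pd

rows = [
    ("City_2", 3, 7, 3), ("City_0", 4, 8, 9), ("City_1", 1, 2, 1),
    ("City_5", 5, 0, 9), ("City_5", 1, 6, 0), ("City_0", 3, 8, 6),
    ("City_7", 2, 6, 6), ("City_1", 6, 0, 2), ("City_8", 6, 2, 4),
    ("City_2", 2, 5, 6),
]
# Assumed index 9, 12, ..., 36; it reproduces the df.index//5 bins shown above
df = pd.DataFrame(rows, columns=list("ABCD"), index=[3 * i + 9 for i in range(10)])

# Materialize the key as a column ("bin" is hypothetical), then group by labels
out = df.assign(bin=df.index // 5).groupby(["bin", "A"], as_index=False).mean()
print(out.columns.tolist())  # ['bin', 'A', 'B', 'C', 'D']
print(out["bin"].tolist())   # [1, 2, 3, 3, 4, 4, 5, 6, 6, 7]

# Same rows via the as_index=True path from the question, with the key named
alt = df.groupby([(df.index // 5).rename("bin"), "A"]).mean().reset_index()
print(out.to_dict("list") == alt.to_dict("list"))  # True
```

Both forms give the same rows; the assign-first form simply avoids depending on how `as_index=False` handles keys that are not column labels.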
My question: shouldn't `as_index=False` also return the `df.index//5` column? Or is `as_index` designed to work only this way?