I have a DataFrame in the following format:
df = pd.DataFrame({'foo': [1, 2, 2], 'bar': [True, False, False]})
I want to group this by foo and count the number of True values in the bar column. Counting True values can be done with the built-in sum function:
In [7]: bar = [True, False, True, False, False]
In [8]: sum(bar)
Out[8]: 2
In [9]: sum(df['bar'])
Out[9]: 1
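This works because Python's bool is a subclass of int, so each True adds 1 and each False adds 0 when summed. A small pure-Python check of that behaviour, using the same list as above:

```python
# bool is a subclass of int, so True == 1 and False == 0 in arithmetic
bar = [True, False, True, False, False]

assert issubclass(bool, int)

# sum() starts from the int 0, so the result is an int count, not a bool
count_true = sum(bar)
print(count_true)                 # 2
print(type(count_true).__name__)  # int
```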
To group by foo and count the True values per group:
In [16]: df.groupby('foo').aggregate(sum)
Out[16]:
       bar
foo
1     True
2    False
This output is wrong: the per-group sums come back as booleans (True/False) instead of counts. The expected output is:
     bar
foo
1      1
2      0
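For reference, the expected counts can be reproduced without pandas. A minimal plain-Python cross-check, assuming the same foo and bar values as the DataFrame above (the names counts, key and flag are illustrative only):

```python
from collections import defaultdict

# Same data as the DataFrame in the report, as plain lists
foo = [1, 2, 2]
bar = [True, False, False]

# Count True values per foo key; int(flag) keeps every count an integer
counts = defaultdict(int)
for key, flag in zip(foo, bar):
    counts[key] += int(flag)

print(dict(counts))  # {1: 1, 2: 0}
```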
It does work in the following case, where not all rows with foo == 2 are False:
In [18]: df = pd.DataFrame({'foo': [1, 2, 2, 2, 2], 'bar': [True, True, True, False, False]})
In [18]: df.groupby('foo').aggregate(sum)
Out[18]:
     bar
foo
1      1
2      2
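One possible workaround, sketched here as an assumption rather than an official fix, is to convert bar to integers before grouping, so the sums are integer counts regardless of how a given pandas version infers the result dtype. It uses the DataFrame from the first example:

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 2], 'bar': [True, False, False]})

# Workaround sketch: cast the boolean column to int *before* aggregating,
# so each group's sum is an integer count of True values
counts = df.assign(bar=df['bar'].astype(int)).groupby('foo').sum()
print(counts)
```

The explicit cast makes the intent (count, not logical result) visible in the code instead of depending on dtype inference.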
Here are my installed versions:
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.7.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.3
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None