BUG: pandas std broken, erratic behavior #11524

Qu-Bit · 2015-11-05T11:03:46Z

I seem to have encountered a bug when using
DataFrame.apply(np.std) or DataFrame.groupby(...).agg(np.std)

Tested on:

- Debian 7.9 (wheezy), pandas version: 0.16.2.dev
- Ubuntu 15.10, pandas version: 0.15.0

More version details for the tested snippets are given below.

test code:

import numpy as np
import pandas as pd
print("pandas version: ", pd.__version__)


# why is there a difference ?
s = pd.Series([1.11] * 10)
print(np.std(s))
#1.88486436615e-08
print(np.std(s.values))
#2.22044604925e-16

# why is it significantly != 0 ?
# why is there a difference ?
print(np.std(pd.Series([53426.7756333882,] * 50)))
#0.0011048543456
print(np.std(pd.Series([53426.7756333882,] * 123)))
#0.000704429402084

# doing that with data frames
df = pd.DataFrame([538512.198638,] * 123)
print(df.apply(np.std))
#0    0.030867
# dtype: float64
print(np.std(df))
#0    0.030867
# dtype: float64
print(np.std(df.values))
#2.32830643654e-10

With the following data set, std even returns NaN.
Oddly, this happens once the value has enough digits after the decimal point.

import numpy as np
import pandas as pd
print("pandas version: ", pd.__version__)

df = pd.DataFrame([538512.1986379109,]*126)
df['uid']=15
df.set_index('uid', inplace=True)

# why do I occasionally even get NaN for identical values?
# NaN appears if there are enough digits after the decimal point
print(df.groupby(level=0).agg([np.mean, np.std]))
#                 0    
#              mean std
#uid                   
#15   538512.198638 NaN
df['bla'] = 911.
print(df.groupby(level=0).agg([np.mean, np.std]))
#                 0      bla    
#              mean std mean std
#uid                            
#15   538512.198638 NaN  911   0
df['bla'] = 911.12351171571542312243214
print(df.groupby(level=0).agg([np.mean, np.std]))
#                 0      bla    
#              mean std mean std
#uid                            
#15   538512.198638 NaN  911 NaN

Bug hypotheses:
This looks like a problem in the details of how pandas applies functions.
Is the size of the container being mapped over calculated incorrectly? (E.g. the number of elements in the whole container is used instead of the length of the single column Series.)
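
A different explanation that would also fit the numbers above is floating-point cancellation rather than a wrong container size. The sketch below is an assumption, not pandas code: `naive_var` and `two_pass_var` are hypothetical helpers that compare the one-pass formula (sum of squares minus squared sum over n) with the two-pass formula on the constant column from the report. Whether the affected pandas versions actually used the one-pass form is not established in this thread.

```python
import numpy as np

def naive_var(x, ddof=0):
    # Hypothetical helper: one-pass formula. It subtracts two huge,
    # nearly equal numbers, so rounding error can dominate the result.
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    return (np.sum(x ** 2) - np.sum(x) ** 2 / n) / (n - ddof)

def two_pass_var(x, ddof=0):
    # Hypothetical helper: two-pass formula. It centers the data first,
    # so only small deviations from the mean are squared and summed.
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    mean = np.sum(x) / n
    return np.sum((x - mean) ** 2) / (n - ddof)

# Same constant value and length as the groupby example in the report.
x = np.full(126, 538512.1986379109)

# sqrt of a slightly negative variance yields NaN; silence the warning.
with np.errstate(invalid="ignore"):
    naive_std = np.sqrt(naive_var(x))
two_pass_std = np.sqrt(two_pass_var(x))

print("one-pass std:", naive_std)
print("two-pass std:", two_pass_std)
```

If the one-pass variance comes out slightly negative, its square root is NaN, which would match the NaN std in the groupby output. The sign depends on rounding, so no expected output is shown here.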

Version details (Ubuntu 15.10):

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_AT.UTF-8

pandas: 0.15.0
nose: 1.3.6
Cython: 0.23.3
numpy: 1.8.2
scipy: 0.14.1
statsmodels: 0.5.0
IPython: 2.3.0
sphinx: 1.2.3
patsy: 0.4.0
dateutil: 2.2
pytz: 2014.10
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: 2.3.0-b1
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
rpy2: 2.6.2
sqlalchemy: 1.0.8
pymysql: 0.6.2.None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)

Qu-Bit · 2015-11-05T11:35:20Z

Suggested labels:
#BUG, #Prio-high, #Numeric, #Internals

jreback · 2015-11-05T12:43:02Z

Duplicate of #10242, which was fixed in 0.17.0.

Qu-Bit changed the title ~~pandas std broken, erratic behavior~~ BUG: pandas std broken, erratic behavior Nov 5, 2015

jreback closed this as completed Nov 5, 2015

jreback added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Nov 5, 2015
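
For anyone who cannot upgrade to 0.17.0, one possible workaround follows from the report's own observation that np.std on `.values` behaves well: hand the raw NumPy array to np.std inside the aggregation. This is a minimal sketch, not an official recommendation. `std_via_numpy` is a hypothetical helper name, and the named column `value` stands in for the default column 0 of the original snippet.

```python
import numpy as np
import pandas as pd

# Same constant data as in the report, with a named column for clarity.
df = pd.DataFrame({"value": [538512.1986379109] * 126})
df["uid"] = 15
df = df.set_index("uid")

def std_via_numpy(col):
    # Hypothetical helper: pass the underlying ndarray so NumPy's own std
    # implementation runs; ddof=1 gives the sample standard deviation,
    # which is also the pandas default for std.
    return np.std(col.values, ddof=1)

result = df.groupby(level=0)["value"].agg(std_via_numpy)
print(result)
```

The helper is applied once per group, so it may be slower than the built-in aggregation on large frames.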
