BUG: pandas std broken, erratic behavior #11524

Closed
Qu-Bit opened this issue Nov 5, 2015 · 2 comments

Labels
Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

Qu-Bit commented Nov 5, 2015

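I seem to have encountered a bug when using
DataFrame.apply(np.std) or DataFrame.groupby(...).agg(np.std): the standard deviation of a column of identical values comes out noticeably nonzero, and sometimes NaN.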

tested on:

  • debian 7.9 (wheezy), pandas version: 0.16.2.dev
  • ubuntu 15.10, pandas version: 0.15.0
    • more version details for the tested snippets below

test code:

import numpy as np
import pandas as pd
print("pandas version: ", pd.__version__)


# why do the Series and the underlying ndarray give different results?
s = pd.Series([1.11] * 10)
print(np.std(s))
#1.88486436615e-08
print(np.std(s.values))
#2.22044604925e-16

# why is the result significantly != 0 for identical values?
# why does it change with the number of elements?
print(np.std(pd.Series([53426.7756333882,] * 50)))
#0.0011048543456
print(np.std(pd.Series([53426.7756333882,] * 123)))
#0.000704429402084

# doing that with data frames
df = pd.DataFrame([538512.198638,] * 123)
print(df.apply(np.std))
#0    0.030867
# dtype: float64
print(np.std(df))
#0    0.030867
# dtype: float64
print(np.std(df.values))
#2.32830643654e-10

With the following data set, std even returns NaN.
Oddly, this happens once there are enough digits after the decimal point.

import numpy as np
import pandas as pd
print("pandas version: ", pd.__version__)

df = pd.DataFrame([538512.1986379109,]*126)
df['uid']=15
df.set_index('uid', inplace=True)

# why do I occasionally even get NaN for identical values?
# NaN appears if there are enough decimal places after the decimal point
print(df.groupby(level=0).agg([np.mean, np.std]))
#                 0    
#              mean std
#uid                   
#15   538512.198638 NaN
df['bla'] = 911.
print(df.groupby(level=0).agg([np.mean, np.std]))
#                 0      bla    
#              mean std mean std
#uid                            
#15   538512.198638 NaN  911   0
df['bla'] = 911.12351171571542312243214
print(df.groupby(level=0).agg([np.mean, np.std]))
#                 0      bla    
#              mean std mean std
#uid                            
#15   538512.198638 NaN  911 NaN

bug hypotheses:
This looks like a problem in the details of how pandas applies functions.
Is the size of the container being mapped over calculated incorrectly (e.g. the total number of elements in the whole container is used instead of the length of the single-column Series)?
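Another possible explanation, offered here only as an assumption and not confirmed anywhere in this thread, is floating-point cancellation: if the variance is computed with the one-pass formula (sum of squares minus squared sum over n), two huge and nearly equal numbers are subtracted and rounding error dominates the result. The pure-Python sketch below compares that formula with a two-pass version on one of the values from the report. The function names are hypothetical and this is not pandas' actual code.

```python
import math

def std_one_pass(values, ddof=1):
    """One-pass formula: (sum(x^2) - sum(x)^2 / n) / (n - ddof)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    # abs() guards against a slightly negative difference caused by rounding;
    # without it, math.sqrt would raise on a negative input
    return math.sqrt(abs(total_sq - total * total / n) / (n - ddof))

def std_two_pass(values, ddof=1):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - ddof))

data = [538512.198638] * 123
print("one-pass:", std_one_pass(data))
print("two-pass:", std_two_pass(data))
```

The two-pass result stays within rounding distance of zero, while the one-pass result depends on how the rounding errors in the two large sums happen to fall, which would also fit the observation that the output changes with the number of elements and the number of decimal places.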

version details: (ubuntu 15.10)

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_AT.UTF-8

pandas: 0.15.0
nose: 1.3.6
Cython: 0.23.3
numpy: 1.8.2
scipy: 0.14.1
statsmodels: 0.5.0
IPython: 2.3.0
sphinx: 1.2.3
patsy: 0.4.0
dateutil: 2.2
pytz: 2014.10
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.4.2
openpyxl: 2.3.0-b1
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
rpy2: 2.6.2
sqlalchemy: 1.0.8
pymysql: 0.6.2.None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)

Qu-Bit changed the title from "pandas std broken, erratic behavior" to "BUG: pandas std broken, erratic behavior" Nov 5, 2015

Qu-Bit commented Nov 5, 2015

Suggested labels:
#BUG, #Prio-high, #Numeric, #Internals

jreback commented Nov 5, 2015

Duplicate of #10242, which was fixed in 0.17.0.
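For anyone who cannot upgrade to 0.17.0 or later, one workaround suggested by the numbers in the original report is to run the reduction on the underlying NumPy array, since np.std(s.values) stayed close to zero there. The sketch below is an illustration only, not something proposed by the maintainers. The column name "x" is made up, and ddof=1 is passed explicitly to match pandas' default.

```python
import numpy as np
import pandas as pd

# Series case: reduce over the ndarray instead of the Series
s = pd.Series([53426.7756333882] * 50)
series_std = np.std(s.values, ddof=1)
print(series_std)

# groupby case: same idea inside the aggregation function
df = pd.DataFrame({"x": [538512.1986379109] * 126, "uid": 15}).set_index("uid")
group_std = df.groupby(level=0)["x"].agg(lambda g: np.std(g.values, ddof=1))
print(group_std)
```

Because the lambda hides the function from pandas, the aggregation falls back to calling it once per group, which is slower than the built-in path on large data.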

jreback closed this as completed Nov 5, 2015
jreback added the "Numeric Operations (Arithmetic, Comparison, and Logical operations)" label Nov 5, 2015