Pandas: why pandas.Series.std() is quite different from numpy.std() #10489

infozyzhang · 2015-07-02T06:30:27Z

I have two code snippets, as follows.

numpy.std([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346])

0

pd.Series([766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346, 766897346]).std(ddof=0)

10.119288512538814

The two lists are identical, but the results are quite different. I think the pandas result must be wrong. I am using the latest version, 0.16.2, with Python 3.4.

May I ask why? Is it a bug?

shoyer · 2015-07-02T07:15:40Z

Closing this as a duplicate of #10242

Pandas uses a correct formula for the standard deviation. However, the formula we use is not as numerically stable as the formula used by numpy. Pull requests to fix this would definitely be welcome!
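To illustrate what "not as numerically stable" can mean here, the following is a minimal pure-Python sketch, not the actual pandas or numpy implementation. It assumes the unstable variant is a one-pass "sum of squares" formula and the stable variant is a two-pass formula that subtracts the mean first; the function names `naive_variance` and `two_pass_std` are hypothetical and chosen only for this example.

```python
import math


def naive_variance(xs, ddof=0):
    """One-pass formula: (sum(x^2) - sum(x)^2 / n) / (n - ddof).

    Algebraically correct, but for large values the two terms being
    subtracted are huge and nearly equal, so float64 rounding error
    can dominate the result (it may even come out negative).
    """
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        x = float(x)
        total += x
        total_sq += x * x  # about 5.9e17 here, beyond exact float64 integers
    return (total_sq - total * total / n) / (n - ddof)


def two_pass_std(xs, ddof=0):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = math.fsum(float(x) for x in xs) / n
    sq_dev = math.fsum((float(x) - mean) ** 2 for x in xs)
    return math.sqrt(sq_dev / (n - ddof))


data = [766897346] * 10

naive_var = naive_variance(data)  # not guaranteed to be exactly 0
stable_std = two_pass_std(data)   # deviations from the mean are all exactly 0

print("naive variance:", naive_var)
print("two-pass std:  ", stable_std)
```

With identical inputs, every deviation from the mean in the two-pass version is exactly zero, so it returns 0.0. The one-pass version instead subtracts two numbers near 5.9e18, where adjacent float64 values are 1024 apart, so a small rounding error can survive as a spurious non-zero variance. The reported pandas value squared is roughly 102.4, which would be consistent with an error of about 1024 divided by n = 10, though that reading is an inference and not something stated in this thread.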

shoyer closed this as completed Jul 2, 2015
