Regarding pd.DataFrame.std() function #12230
Comments
What do you mean? They are both biased estimators of standard deviation. |
Hi! Here's one of the links to the material in question: Both of the proposed estimators, the sample variance and the corrected sample Both the SAMPLE variance estimator and the unbiased SAMPLE Where am I wrong? 2016-02-05 2:56 GMT+10:00 Ka Wo Chen notifications@github.com:
|
In the docstring it says
So working as intended. If you want the |
Thanks for the lead, I see now.)) 2016-02-05 12:26 GMT+10:00 Tom Augspurger notifications@github.com:
|
@TomAugspurger Then the docstring is either wrong or misleading. If you treat the data as a sample from some distribution, then the docstring is wrong regarding bias. The unbiased estimator of the standard deviation has no closed form without further knowledge of the distribution. You can think of the data as the population, with each data point having the same probability, but then there should be no bias to speak of (and |
It looks like it calculates an adjusted value of std, i.e. the sum of squared deviations from the mean, (x_i - x_bar) ** 2, divided by n-1 instead of n, followed by a square root.
This is a reasonable measure when n <= 50, since it avoids underestimating the std. But when n > 50 it makes practically no difference which number is used in the denominator, n or n-1. I guess you need to take this into account.
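To make the n versus n-1 comparison concrete, here is a minimal sketch in plain Python. The helper names `std_with_ddof` and `relative_gap` and the sample data are made up for illustration and are not part of pandas; the `ddof` argument only mirrors the parameter of the same name on `DataFrame.std`, which defaults to 1.

```python
import math
import statistics


def std_with_ddof(values, ddof=1):
    """Standard deviation with divisor n - ddof (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


def relative_gap(n):
    """Relative difference between the n-1 and n results.

    The ratio of the two is sqrt(n / (n - 1)), so the gap depends
    only on the sample size, not on the data.
    """
    return math.sqrt(n / (n - 1)) - 1


# Illustrative data, not taken from the issue.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_std = std_with_ddof(data, ddof=1)      # divides by n - 1
population_std = std_with_ddof(data, ddof=0)  # divides by n

# Cross-check against the standard library.
print(sample_std, statistics.stdev(data))
print(population_std, statistics.pstdev(data))

# How much the choice of denominator matters for a few sample sizes.
for n in (5, 50, 500):
    print(n, relative_gap(n))
```

Under these assumptions the gap between the two results is about 1% at n = 50 and shrinks as n grows, so the choice of denominator matters less for large samples but never vanishes exactly.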
Many thanks for your excellent piece of work.
B.R., Andre Logunov, Russia
https://plus.google.com/u/1/