Why does pandas dataframe.var() method return unbiased variance by default?? #27202

y-vectorfield · 2019-07-03T08:02:36Z

Why does the pandas dataframe.var() method return the "unbiased variance" by default?
I think "variance" generally implies the "sample variance".
Of course, I know this method can return the "sample variance" if we pass the ddof=0 option.
However, this default is very confusing.
Why not add new methods sample_var() and unbiased_var(), or return the "sample variance" by default?
Other open-source data analysis tools, such as numpy and R, return the "sample variance" by default.
I think pandas is an outstanding library for data analysis with Python! That is why I would like to understand this confusing specification.

jorisvandenbossche · 2019-07-03T17:18:55Z

As far as I know, the var method in pandas does calculate the sample variance (its default is ddof=1; passing ddof=0 gives you the population variance). This is equivalent to the default behaviour of var in R (it does contrast with numpy, though, which uses ddof=0 by default).
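
The difference in defaults described above can be checked directly. The following is a minimal sketch, assuming pandas and numpy are installed; the sample values and variable names are made up for illustration and are not from the thread.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical): mean is 5, sum of squared deviations is 32.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s = pd.Series(values)
df = pd.DataFrame({"x": values})

# pandas divides by (n - 1) by default, i.e. ddof=1.
pandas_default = s.var()
# Passing ddof=0 divides by n instead.
pandas_ddof0 = s.var(ddof=0)
# DataFrame.var applies the same default column by column.
frame_default = df.var()["x"]

# numpy divides by n by default, i.e. ddof=0.
numpy_default = np.var(values)
# Passing ddof=1 makes numpy match the pandas default.
numpy_ddof1 = np.var(values, ddof=1)

print(pandas_default, pandas_ddof0, numpy_default, numpy_ddof1)
```

With these values, the ddof=1 results equal 32/7 and the ddof=0 results equal 4.0, so `s.var()` agrees with `np.var(values, ddof=1)` rather than with `np.var(values)`.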

y-vectorfield · 2019-07-03T21:12:16Z

@jorisvandenbossche
Thank you very much for your answer.
Do you mean that the var() method of pandas returns

[formula image not preserved]

by default?

As far as I know, this method returns

[formula image not preserved]

that is, the "unbiased variance", by default.
This means the unbiased estimator of the population variance.

jorisvandenbossche · 2019-07-06T14:37:11Z

> As far as I know, this method returns "unbiased variance" by default. This means the unbiased estimator of the population variance.

And this is the "sample variance" you are asking for (and which is the same as the default in R).
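
For reference, the two conventions under discussion can be written out as follows. The notation is an assumption made here for illustration (observations $x_1, \dots, x_n$ with mean $\bar{x}$) and is not taken from the thread.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2_{\mathrm{ddof}=1} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2,
\qquad
s^2_{\mathrm{ddof}=0} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2 .
\]
```

The ddof=1 form is the unbiased estimator of the population variance and is the default in pandas and R; the ddof=0 form is the default in numpy. Some texts apply the name "sample variance" to the ddof=0 form, which may be the source of the confusion in this thread.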

(closing as this is not something to fix in pandas, but feel free to ask for further clarification. Or indicate what can be improved in the docs to make this clearer)

y-vectorfield · 2019-07-13T05:42:06Z

@jorisvandenbossche
OK, I understand. Thank you.

jorisvandenbossche added the Usage Question label Jul 3, 2019

jorisvandenbossche closed this as completed Jul 6, 2019

jorisvandenbossche added this to the No action milestone Jul 6, 2019

ghost mentioned this issue Jul 27, 2019

Numerical errors in rolling std (0.23.4) #27593

Closed
