
Why does the pandas DataFrame.var() method return unbiased variance by default? #27202


Closed
y-vectorfield opened this issue Jul 3, 2019 · 4 comments

Comments

@y-vectorfield

Why does the pandas DataFrame.var() method return "unbiased variance" by default?
I think "variance" generally means "sample variance".
Of course, I know this method can return "sample variance" if we pass the ddof=0 option.
However, this option is very confusing.
Why not add new methods sample_var() and unbiased_var(), or return "sample variance" by default?
In other open-source data analysis tools, such as NumPy and R, the variance function returns "sample variance" by default.
I think pandas is an outstanding library for data analysis with Python! That is why I would like to understand the reasoning behind this confusing behavior.

@jorisvandenbossche
Member

jorisvandenbossche commented Jul 3, 2019

As far as I know, the var method in pandas does calculate the sample variance (it has a default of ddof=1; passing ddof=0 gives you the population variance). This is equivalent to the default behaviour of var in R (it does contrast with NumPy, though, which uses ddof=0 by default).
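To make the ddof convention concrete, here is a minimal pure-Python sketch. The helper `var` is hypothetical (it is not a pandas or NumPy function); it simply divides the sum of squared deviations by n - ddof, which is what the `ddof` argument means in both pandas and NumPy. The standard library's `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by n) serve as cross-checks.

```python
import math
import statistics


def var(values, ddof=1):
    """Hypothetical helper: sum of squared deviations divided by (n - ddof).

    ddof=1 mirrors the pandas default (sample variance, unbiased estimator);
    ddof=0 mirrors the NumPy default (population variance).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more observations than ddof")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


data = [1.0, 2.0, 3.0, 4.0]

sample_var = var(data)              # pandas-style default, ddof=1
population_var = var(data, ddof=0)  # NumPy-style default, ddof=0

# The standard library implements the same two conventions.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(population_var, statistics.pvariance(data))

print(sample_var)      # 1.6666666666666667  (5 / 3)
print(population_var)  # 1.25                (5 / 4)
```

In pandas terms, `df.var()` corresponds to the first value and `df.var(ddof=0)` to the second; in NumPy, `np.var(arr)` gives the second and `np.var(arr, ddof=1)` the first.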

@y-vectorfield
Author

y-vectorfield commented Jul 3, 2019

@jorisvandenbossche
Thank you very much for your answer.
Do you mean that the var() method of pandas returns
(1/n) Σᵢ (xᵢ − x̄)²
by default?

As far as I know, this method returns
(1/(n − 1)) Σᵢ (xᵢ − x̄)²,
the "unbiased variance", by default.
That is, the unbiased estimator of the population variance.
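The "unbiased estimator" property can be checked exactly on a toy population. The sketch below is a hypothetical illustration, not pandas code: it enumerates every ordered sample of size n drawn with replacement from a small population, averages the variance computed with ddof=1 and with ddof=0 using exact `fractions.Fraction` arithmetic, and compares each average with the population variance. For independent draws, the ddof=1 average equals the population variance, while the ddof=0 average is smaller by a factor of (n − 1)/n.

```python
from fractions import Fraction
from itertools import product


def var(values, ddof):
    """Sum of squared deviations divided by (n - ddof), in exact arithmetic."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def expected_sample_variance(population, n, ddof):
    """Average var(sample, ddof) over every size-n sample drawn with replacement.

    Each of the len(population) ** n ordered samples is equally likely, so this
    plain average is the exact expected value of the estimator.
    """
    samples = list(product(population, repeat=n))
    return sum(var(list(s), ddof) for s in samples) / len(samples)


population = [1, 2, 3]
true_variance = var(population, ddof=0)  # population variance

unbiased = expected_sample_variance(population, n=2, ddof=1)
biased = expected_sample_variance(population, n=2, ddof=0)

print(true_variance)  # 2/3
print(unbiased)       # 2/3  (ddof=1 matches the population variance)
print(biased)         # 1/3  (ddof=0 is low by the factor (n - 1) / n = 1/2)
```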

@jorisvandenbossche
Member

> As far as I know, this method returns "unbiased variance" by default. That is, the unbiased estimator of the population variance.

And this is the "sample variance" you are asking for (it is also the default in R).

(Closing, as this is not something to fix in pandas, but feel free to ask for further clarification, or to point out what could be improved in the docs to make this clearer.)

@jorisvandenbossche jorisvandenbossche added this to the No action milestone Jul 6, 2019
@y-vectorfield
Author

@jorisvandenbossche
OK, I understand. Thank you.
