Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
from pandas.core import nanops
a_with_nan = np.array([1+2j, 2+3j, 3+4j, np.nan + 0j], dtype=np.complex128)
s = pd.Series(a_with_nan)
var = s.var(ddof=1)
print(var) # 15.5
a_without_nan = np.array([1+2j, 2+3j, 3+4j], dtype=np.complex128)
s = pd.Series(a_without_nan)
var = s.var(ddof=1)
print(var) # 15.5
# Reference result from numpy
print(a_without_nan.var(ddof=1)) # 2.0
# My manual calculation: 2.0
Issue Description
pandas.Series.var seems to produce incorrect results for complex arrays. I cross-checked the output of pandas.Series.var against my own manual calculation and against NumPy, and pandas.Series.var disagrees with both. Looking through the pandas issue tracker, I found that this is related to issue #61645, which was fixed in PR #61646. However, that fix seems to only correct the sign of the output so that it is positive, rather than addressing the root cause. I think the root cause lies in this line, where the computation of avg forces the values to be cast to float64. Indeed, numpy raises the following warning:
ComplexWarning: Casting complex values to real discards the imaginary part
return umr_sum(a, axis, dtype, out, keepdims, initial, where)
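The suspected mechanism can be sketched with NumPy alone. This is a minimal illustration, not the pandas implementation: the variable names are made up for the example, and values.real stands in for the cast to float64 that discards the imaginary part. It assumes the variance is the sum of squared absolute deviations from the mean, divided by (count - ddof).

```python
import numpy as np

values = np.array([1 + 2j, 2 + 3j, 3 + 4j], dtype=np.complex128)
ddof = 1
count = values.size


def sum_sq_abs_dev(x, center):
    """Sum of squared absolute deviations |x - center|**2."""
    d = x - center
    return (d.real ** 2 + d.imag ** 2).sum()


# Mean with the imaginary part discarded (assumption: this mimics
# what a cast of the complex sum to float64 would do).
real_only_mean = values.real.sum() / count
var_real_only_mean = sum_sq_abs_dev(values, real_only_mean) / (count - ddof)

# Mean kept as a complex number.
complex_mean = values.sum() / count
var_complex_mean = sum_sq_abs_dev(values, complex_mean) / (count - ddof)

print(real_only_mean, var_real_only_mean)  # 2.0 15.5
print(complex_mean, var_complex_mean)      # (2+3j) 2.0
print(values.var(ddof=ddof))               # NumPy reference
```

With the real-only mean the result matches the 15.5 reported by pandas, and with the complex mean it matches the 2.0 from NumPy, which is consistent with the imaginary part of the mean being dropped.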
Expected Behavior
pandas.Series.var should return 2.0 for [1+2j, 2+3j, 3+4j] with ddof=1, matching both my manual calculation and NumPy. It currently returns 15.5.
Installed Versions
pandas: 3.0.0.dev0+2421.ge79f1565e6
numpy: 2.2.3
I also tested with pandas 2.3.2.