BUG: .corr() values higher than 1 #35135


Closed
3 tasks done
PanPip opened this issue Jul 6, 2020 · 6 comments

Comments

@PanPip

PanPip commented Jul 6, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

Sorry for including an external dataset. I couldn't reproduce this bug with a smaller one.

import pandas as pd

url = 'https://raw.githubusercontent.com/MislavSag/trademl/master/trademl/modeling/random_forest/X_TEST.csv'
df = pd.read_csv(url)
df = df.loc[:, ['RSI30', 'CMO30']]

df.corr() > 1

Problem description

When applying .corr() to the given dataset, the reported Pearson correlation is slightly greater than 1 (by 6.661338e-16). I'd assume it should be at most 1.

[screenshot of the .corr() output]

Expected Output

The expected result would be correlation values <= 1.

[screenshot of the expected output]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.0.post20200616
Cython : None
pytest : None
hypothesis : None
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.49.1

@PanPip PanPip added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 6, 2020
@gimseng
Contributor

gimseng commented Jul 10, 2020

From a subset of your data:

import pandas as pd

df = pd.DataFrame({
    'A': {0: 35.22366795733074, 1: 34.74626605356115},
    'B': {0: -29.55266408533853, 1: -30.507467892877692},
})
df.corr() - 1

One gets:


|   | A | B |
|---|---|---|
| A | 0.000000e+00 | 2.220446e-16 |
| B | 2.220446e-16 | 0.000000e+00 |
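For reference, a self-contained sketch of this reproduction. With only two points the Pearson correlation is mathematically exactly ±1, so any excess over 1 is pure floating-point rounding; comparing with NumPy's np.corrcoef (which, as I understand it, clips its result into [-1, 1]) shows the same data stays in range there:

```python
import numpy as np
import pandas as pd

# Two-row frame from the comment above; with two points the
# Pearson correlation is mathematically exactly +1 here.
df = pd.DataFrame({
    "A": [35.22366795733074, 34.74626605356115],
    "B": [-29.55266408533853, -30.507467892877692],
})

c = df.corr()
# The off-diagonal entry can land about one ULP above 1 due to
# floating-point rounding (machine epsilon is ~2.22e-16).
print(c.iloc[0, 1] - 1.0)

# np.corrcoef clips its output into [-1, 1], so the same data
# does not exceed 1 there.
print(np.corrcoef(df["A"], df["B"])[0, 1])
```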

@jreback
Contributor

jreback commented Jul 10, 2020

these are numerical precision issues

@gimseng
Contributor

gimseng commented Jul 15, 2020

@jreback In that case, should we round the output of .corr() to an appropriate number of digits?

@rhshadrach
Member

rhshadrach commented Aug 20, 2020

@gimseng Another potential solution, other than rounding, would be to use .clip(lower=-1.0, upper=1.0). However, regardless of what you do, be aware that numerical issues will always exist to some extent*, even when the values lie inside [-1.0, 1.0], and the amount of numerical inaccuracy is dependent on the inputs.

*though pandas' formula for Pearson correlation is the numerically stable version.
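A minimal sketch of that clipping approach, using the two-row frame from the earlier comment (DataFrame.clip leaves entries already inside the range untouched):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [35.22366795733074, 34.74626605356115],
    "B": [-29.55266408533853, -30.507467892877692],
})

# Force the correlation matrix back into the mathematically
# valid range [-1, 1]; in-range entries are unchanged.
corr = df.corr().clip(lower=-1.0, upper=1.0)
assert (corr.abs() <= 1.0).all().all()
```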

@bashtage
Contributor

> the amount of numerical inaccuracy is dependent on the inputs.

It also depends on both the CPU and the operating system. Using the most stable version doesn't mean that calculations are not subject to numerical precision limits.

@rhshadrach rhshadrach added Usage Question and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 22, 2020
@rhshadrach
Member

Closing, as this is not an issue with pandas but with numerical computation in general.
