
Int64 Series input to numpy.polyfit() fails #32989

Open
MaozGelbart opened this issue Mar 24, 2020 · 4 comments
Labels
Bug Compat pandas objects compatability with Numpy or Python functions NA - MaskedArrays Related to pd.NA and nullable extension arrays ufuncs __array_ufunc__ and __array_function__

Comments

@MaozGelbart

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

x = pd.Series([1,2,3]).astype('Int64')
y = pd.Series([4,4,6])

np.polyfit(x,y,1)

Problem description

The code raises an error from within NumPy:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in polyfit
  File "C:\Anaconda3\lib\site-packages\numpy\lib\polynomial.py", line 609, in polyfit
    rcond = len(x)*finfo(x.dtype).eps
  File "C:\Anaconda3\lib\site-packages\numpy\core\getlimits.py", line 381, in __new__
    raise ValueError("data type %r not inexact" % (dtype))
ValueError: data type <class 'numpy.object_'> not inexact

However, the same Series with dtype=int64 does not raise. This happens with both pandas 0.24.0 and pandas 1.0.3.

Since this dtype is documented as experimental, I opened this as a pandas issue. I'm not sure whether it is related to #29738.
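A minimal sketch of why the traceback ends in `finfo`, assuming (as the error message indicates) that the Int64 Series reached `polyfit` as an object-dtype array. It uses a plain NumPy object array as a stand-in, so pandas is not involved; newer pandas versions may convert Int64 data to NumPy differently.

```python
import numpy as np

# Stand-in for the object-dtype array polyfit received in the reported setup.
x_obj = np.array([1, 2, 3], dtype=object)

# polyfit converts its inputs roughly as np.asarray(x) + 0.0;
# adding a float to an object array keeps the object dtype.
x_plus = x_obj + 0.0
print(x_plus.dtype)  # object

# The default rcond uses finfo(x.dtype).eps, and finfo only accepts
# inexact (floating or complex) dtypes, hence the ValueError above.
finfo_failed = False
try:
    np.finfo(x_plus.dtype)
except ValueError as exc:
    finfo_failed = True
    print("ValueError:", exc)
```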

Expected Output

array([1.        , 2.66666667])
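Until this is fixed, one possible workaround (a sketch, not an official recommendation) is to convert the nullable Series to a plain float64 array yourself with `Series.to_numpy(dtype=..., na_value=...)` before calling `polyfit`:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3]).astype("Int64")
y = pd.Series([4, 4, 6])

# Convert the nullable-integer Series to a float64 ndarray first;
# any missing values (pd.NA) would become np.nan.
x_float = x.to_numpy(dtype="float64", na_value=np.nan)

coeffs = np.polyfit(x_float, y, 1)
print(coeffs)  # approximately [1.0, 2.66666667]
```

If the Series contains missing values, the converted array will contain NaN, which `polyfit` does not skip on its own.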

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : None
pytest : 4.5.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.0
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.5.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

xref mwaskom/seaborn#1971
cc @mojones who opened the linked seaborn issue

@TomAugspurger
Contributor

I think fixing this would mean implementing __array_function__ on IntegerArray, which will require some care (#26380).
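For readers unfamiliar with the protocol, here is a toy illustration of how `__array_function__` (NEP 18) lets a container intercept a call such as `np.polyfit`. `ToyMaskedInt` is a hypothetical class, not pandas' `IntegerArray`, and turning masked values into NaN is just one possible policy; deciding such behaviour for every NumPy function is the part that needs care, and this sketch ignores it.

```python
import numpy as np


class ToyMaskedInt:
    """Hypothetical masked-integer container (NOT pandas' IntegerArray)."""

    def __init__(self, values, mask):
        self._data = np.asarray(values, dtype=np.int64)
        self._mask = np.asarray(mask, dtype=bool)

    def to_float(self):
        # Masked entries become NaN in a float64 copy.
        out = self._data.astype(np.float64)
        out[self._mask] = np.nan
        return out

    def __array__(self, dtype=None, copy=None):
        # Mimics the problematic conversion: an object-dtype array.
        out = self._data.astype(object)
        out[self._mask] = None
        return out if dtype is None else out.astype(dtype)

    def __array_function__(self, func, types, args, kwargs):
        # Only handle calls involving our type and plain ndarrays.
        if not all(issubclass(t, (ToyMaskedInt, np.ndarray)) for t in types):
            return NotImplemented

        def convert(a):
            return a.to_float() if isinstance(a, ToyMaskedInt) else a

        # Replace our objects with float arrays, then call the NumPy function.
        new_args = tuple(convert(a) for a in args)
        new_kwargs = {k: convert(v) for k, v in kwargs.items()}
        return func(*new_args, **new_kwargs)


x = ToyMaskedInt([1, 2, 3], [False, False, False])
coeffs = np.polyfit(x, np.array([4, 4, 6]), 1)
print(coeffs)
```

Without `__array_function__`, `np.polyfit` would fall back to `__array__` and hit the same object-dtype error as in the report.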

TomAugspurger added the ExtensionArray (Extending pandas with custom dtypes or arrays.) label Mar 26, 2020
@TomAugspurger
Contributor

@MaozGelbart do you know what polyfit does with missing data (NaNs)?

@MaozGelbart
Author

@TomAugspurger With one point set to NaN, it raises a LinAlgError from within the least-squares solver (this happens whether the NaN is in the dependent or the independent vector):

>>> np.polyfit([1,2,3,4,5,6],[4,np.nan,6,7,8,9],1)

Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in polyfit
  File "C:\Anaconda3\lib\site-packages\numpy\lib\polynomial.py", line 631, in polyfit
    c, resids, rank, s = lstsq(lhs, rhs, rcond)
  File "<__array_function__ internals>", line 6, in lstsq
  File "C:\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 2259, in lstsq
    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
  File "C:\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 109, in _raise_linalgerror_lstsq
    raise LinAlgError("SVD did not converge in Linear Least Squares")
numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares
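Since `polyfit` itself does not skip NaNs, a caller can drop incomplete pairs before fitting. A minimal sketch; `polyfit_dropna` is a hypothetical helper name, and silently discarding points is a modelling choice, not something NumPy or pandas do for you:

```python
import numpy as np


def polyfit_dropna(x, y, deg):
    """Hypothetical helper: fit only the (x, y) pairs where both are finite."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Keep pairs where both coordinates are finite (drops NaN and inf).
    keep = np.isfinite(x) & np.isfinite(y)
    return np.polyfit(x[keep], y[keep], deg)


# Same data as the failing call above; the remaining points lie on y = x + 3.
coeffs = polyfit_dropna([1, 2, 3, 4, 5, 6], [4, np.nan, 6, 7, 8, 9], 1)
print(coeffs)  # approximately [1.0, 3.0]
```

For an Int64 Series containing pd.NA, converting it first with `to_numpy(dtype="float64", na_value=np.nan)` turns missing values into NaN so the mask removes them.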

@TomAugspurger
Contributor

TomAugspurger commented Mar 27, 2020 via email

mroeschke added the Bug and Compat (pandas objects compatability with Numpy or Python functions) labels Jul 30, 2021
jbrockmendel added the ufuncs (__array_ufunc__ and __array_function__) label Jul 27, 2023
mroeschke added the NA - MaskedArrays (Related to pd.NA and nullable extension arrays) label and removed the ExtensionArray (Extending pandas with custom dtypes or arrays.) label Aug 25, 2024
Development

No branches or pull requests

4 participants