.unique() inconsistency between Series and Index objects #14437

Closed
datajanko opened this issue Oct 17, 2016 · 3 comments
Comments

@datajanko
Contributor

A small, complete example of the issue

import pandas as pd

ix = pd.Index([0, 0, 1, 2])
sr = pd.Series([0, 0, 1, 2])

ix.unique()
Int64Index([0, 1, 2], dtype='int64')

sr.unique()
array([0, 1, 2], dtype=int64)

Expected Output

I'd expect unique() to return either the original data type or always an array.
I'd prefer the original data type, since arrays do not provide a to_series method but indexes do.
That is much more convenient when working with pipelines.
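As a rough sketch of the pipeline point above (assuming pandas 0.19 or later, where Index.unique() returns an Index), the Index result can be chained into to_series() directly, while the Series result has to be wrapped by hand. The variable names from_index and from_series are only illustrative:

```python
import pandas as pd

ix = pd.Index([0, 0, 1, 2])
sr = pd.Series([0, 0, 1, 2])

# Index.unique() returns an Index, so to_series() can be chained directly.
from_index = ix.unique().to_series()

# Series.unique() returns an array, so it has to be wrapped in a Series
# by hand before further Series methods can be chained.
from_series = pd.Series(sr.unique())

print(from_index.tolist())
print(from_series.tolist())
```

Both results hold the same unique values; only the amount of wrapping differs.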

Output of pd.show_versions()

INSTALLED VERSIONS
------------------

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.0
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

@jorisvandenbossche
Member

jorisvandenbossche commented Oct 17, 2016

This has changed in 0.19 after some discussion. See the notice in the whatsnew here: http://pandas.pydata.org/pandas-docs/version/0.19.0/whatsnew.html#index-unique-consistently-returns-index and the discussion here: #13395

Basically, before 0.19 Index.unique() returned either an index or an array depending on the data type. Therefore we chose to always return an Index for consistency.
The problem with the return type of Series.unique() is that a Series has a default index that is meaningless for the unique values. Therefore we chose to stick to the return type of array in this case.

Those choices indeed cause an inconsistency between Index and Series.
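A minimal sketch of the behaviour described above, assuming pandas 0.19 or later and integer data as in the original example. It also shows Series.drop_duplicates(), which is not mentioned in this thread but is one way to get unique values back as a Series, keeping the index labels of the first occurrences:

```python
import numpy as np
import pandas as pd

ix = pd.Index([0, 0, 1, 2])
sr = pd.Series([0, 0, 1, 2])

index_result = ix.unique()   # stays an Index
series_result = sr.unique()  # a NumPy array for int64 data

print(isinstance(index_result, pd.Index))
print(isinstance(series_result, np.ndarray))

# drop_duplicates() keeps the Series type and the original labels
# of the first occurrence of each value.
deduped = sr.drop_duplicates()
print(deduped.index.tolist())
```

The labels kept by drop_duplicates() are positions in the original Series, which illustrates why a default index carries no meaning for the unique values themselves.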

jreback closed this as completed on Oct 17, 2016
jreback added this to the No action milestone on Oct 17, 2016
@datajanko
Contributor Author

Oh, I overlooked this in the release notes and didn't find it in the GitHub issues. Shame on me. Thanks for the clarification.

@jorisvandenbossche
Member

@Jan-Ko No problem!


3 participants