.values on ExtensionArray-backed containers

Discussed briefly on the call today, but we should go through things formally.

What should the return type of `Series[extension_array].values` and `Index[extension_array].values` be? I believe the two options are:

1. Return the ExtensionArray backing it (e.g. like what Categorical does)
2. Return an ndarray with some information loss / performance cost
   - e.g. like Series[datetimeTZ].values -> datetime64ns at UTC
   - e.g. Series[period].values -> ndarray[Period objects]
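
To make the two options concrete, here is a minimal sketch using a categorical Series, where `.values` already returns the backing array (option 1) and `np.asarray` stands in for what option 2 would return. The example data is made up for illustration and is not from the discussion.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the issue): an ordered categorical.
cat = pd.Categorical(["low", "high", "low"], categories=["low", "high"], ordered=True)
ser = pd.Series(cat)

# Option 1: return the backing ExtensionArray; the dtype metadata is preserved.
backing = ser.values
print(type(backing).__name__, backing.ordered)

# Option 2: return an ndarray; the categories and ordering are no longer attached.
lossy = np.asarray(ser)
print(type(lossy).__name__, lossy.tolist())
```

The ndarray still holds the same elements, but anything that lived on the dtype (here the category list and its ordering) has to be recovered separately.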

## Current State

Not sure how much weight we should put on the current behavior, but for reference:

type        | Series.values           | Index.values
----------- | ----------------------- | ------------
datetime    | datetime64ns            | datetime64ns
datetime-tz | datetime64ns(UTC&naive) | datetime64ns(UTC&naive)
categorical | Categorical             | Categorical
period      | NA                      | ndarray[Period objects]
interval    | NA                      | ndarray[Interval objects]

<details>

```python
In [5]: pd.Series(pd.date_range('2017', periods=1)).values
Out[5]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [6]: pd.Series(pd.date_range('2017', periods=1, tz='US/Eastern')).values
Out[6]: array(['2017-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

In [7]: pd.Series(pd.Categorical([1])).values
Out[7]:
[1]
Categories (1, int64): [1]

In [8]: pd.Series(pd.SparseArray([1])).values
Out[8]:
[1]
Fill: 0
IntIndex
Indices: array([0], dtype=int32)

In [9]: pd.date_range('2017', periods=1).values
Out[9]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [10]: pd.date_range('2017', periods=1, tz='US/Central').values
Out[10]: array(['2017-01-01T06:00:00.000000000'], dtype='datetime64[ns]')

In [11]: pd.period_range('2017', periods=1, freq='D').values
Out[11]: array([Period('2017-01-01', 'D')], dtype=object)

In [12]: pd.interval_range(start=0, periods=1).values
Out[12]: array([Interval(0, 1, closed='right')], dtype=object)

In [13]: pd.CategoricalIndex([1]).values
Out[13]:
[1]
Categories (1, int64): [1]
```

</details>

If we decide to have the return values be ExtensionArrays, we'll need to discuss
to what extent they're part of the public API.

Regardless of the choice for `.values`, we'll probably want to support the other
use case (maybe just by documenting "call `np.asarray` on it"). Internally, we
have `._values` ("best" array, ndarray or EA) and `._ndarray_values` (always an
ndarray).
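
As a rough sketch of the "call `np.asarray` on it" route: the helper below is hypothetical (its name `as_ndarray` is not a pandas API) and only shows that user code can get an ndarray whichever type `.values` ends up returning.

```python
import numpy as np
import pandas as pd


def as_ndarray(container):
    """Hypothetical helper: always return a NumPy array.

    Works whether `container.values` is already an ndarray or an
    ExtensionArray, because np.asarray accepts both.
    """
    return np.asarray(container.values)


# `.values` is a Categorical here, so np.asarray does the conversion.
from_categorical = as_ndarray(pd.Series(pd.Categorical([1, 2, 1])))

# Period data converts to an object-dtype ndarray of Period scalars.
from_period = as_ndarray(pd.period_range("2017", periods=2, freq="D"))

print(type(from_categorical).__name__, from_categorical.tolist())
print(type(from_period).__name__, from_period.dtype)
```

The cost is the one named in option 2: for period data the result is an object array, so the conversion is neither cheap nor lossless with respect to the dtype.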


cc @jreback @jorisvandenbossche @jschendel @jbrockmendel @shoyer @chris-b1 

.values on ExtensionArray-backed containers #19954
