Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate NDFrame.as_matrix #18458

Merged
merged 3 commits into from
Nov 26, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.22.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ Deprecations
~~~~~~~~~~~~

- ``Series.from_array`` and ``SparseSeries.from_array`` are deprecated. Use the normal constructor ``Series(..)`` and ``SparseSeries(..)`` instead (:issue:`18213`).
-
- ``DataFrame.as_matrix`` is deprecated. Use ``DataFrame.values`` instead (:issue:`18458`).
-

.. _whatsnew_0220.prior_deprecations:
Expand Down
17 changes: 11 additions & 6 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -3735,6 +3735,9 @@ def _get_bool_data(self):

def as_matrix(self, columns=None):
"""
DEPRECATED: as_matrix will be removed in a future version.
Use :meth:`DataFrame.values` instead.

Convert the frame to its Numpy-array representation.

Parameters
Expand Down Expand Up @@ -3770,10 +3773,11 @@ def as_matrix(self, columns=None):
--------
pandas.DataFrame.values
"""
warnings.warn("Method .as_matrix will be removed in a future version. "
"Use .values instead.", FutureWarning, stacklevel=2)
self._consolidate_inplace()
if self._AXIS_REVERSED:
return self._data.as_matrix(columns).T
return self._data.as_matrix(columns)
return self._data.as_array(transpose=self._AXIS_REVERSED,
items=columns)

@property
def values(self):
Expand All @@ -3791,7 +3795,8 @@ def values(self):
int32. By numpy.find_common_type convention, mixing int64 and uint64
will result in a flot64 dtype.
"""
return self.as_matrix()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just return .values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the def of .values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, then we should push this entirely down to the BlockManager, so pass the axis as well. The BM can then do the transpose, we don't like to do things in user code like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should actually deprecate .values as well, in favor of .to_numpy() which is the future spelling anyhow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already almost entirely pushed down, since it is calling directly BlockManager.as_matrix (but OK, the transpose could be done inside BlockManager.as_matrix). The consolidate_inplace needs to happens in NDFrame I think.

Deprecating .values is a no-go IMO, but let's deprecate that in another issue if you want

Copy link
Contributor Author

@topper-123 topper-123 Nov 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, then we should push this entirely down to the BlockManager, so pass the axis as well. The BM can then do the transpose, we don't like to do things in user code like this.

I'm not too familiar with the BlockManager, except accessing it with ._data. How do I pass the axis to it, and how do I transpose in a BlockManager? E.g. it doesn't have a .transpose method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at _take in pandas/core/generic.py, you just pass the get_block_manager_axis to it.

then add axis= to def as_matrix in pandas/core/internals and do the transpose there.

The reason for pushing this to internals is to avoid all of the wrapping layers (e.g. frame/series, etc.) to know about internal details like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as_matrix doesn't have a axis parameter. Do you mean adding a new axis parameter there and transposing inside, if axis=0 (where I assume from my limited knowledge that BlockManager.axes[0] is always the same as dataFrame.axes[1], correct?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jreback, I've tried quite a bit today and I can't push this down while getting the tests to pass.

Could you look at my latest commit (not passing ATM) and give some advice?

self._consolidate_inplace()
return self._data.as_array(transpose=self._AXIS_REVERSED)

@property
def _values(self):
Expand All @@ -3801,11 +3806,11 @@ def _values(self):
@property
def _get_values(self):
# compat
return self.as_matrix()
return self.values

def get_values(self):
"""same as values (but handles sparseness conversions)"""
return self.as_matrix()
return self.values

def get_dtype_counts(self):
"""Return the counts of dtypes in this object."""
Expand Down
27 changes: 22 additions & 5 deletions pandas/core/internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -3484,7 +3484,7 @@ def replace_list(self, src_list, dest_list, inplace=False, regex=False,
mgr = self

# figure out our mask a-priori to avoid repeated replacements
values = self.as_matrix()
values = self.as_array()

def comp(s):
if isna(s):
Expand Down Expand Up @@ -3670,19 +3670,36 @@ def copy(self, deep=True, mgr=None):
return self.apply('copy', axes=new_axes, deep=deep,
do_integrity_check=False)

def as_matrix(self, items=None):
def as_array(self, transpose=False, items=None):
"""Convert the blockmanager data into an numpy array.

Parameters
----------
transpose : boolean, default False
If True, transpose the return array
items : list of strings or None
Names of block items that will be included in the returned
array. ``None`` means that all block items will be used

Returns
-------
arr : ndarray
"""
if len(self.blocks) == 0:
return np.empty(self.shape, dtype=float)
arr = np.empty(self.shape, dtype=float)
return arr.transpose() if transpose else arr

if items is not None:
mgr = self.reindex_axis(items, axis=0)
else:
mgr = self

if self._is_single_block or not self.is_mixed_type:
return mgr.blocks[0].get_values()
arr = mgr.blocks[0].get_values()
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to transpose instead. should be straightforward from here.

Copy link
Contributor Author

@topper-123 topper-123 Nov 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, transpose is of course much better than axis.

The issue was actually in the if len(self.blocks) == 0: block, as the empty array also must be transposed.

Everything is green now locally and I've pushed that upstream.

return mgr._interleave()
arr = mgr._interleave()

return arr.transpose() if transpose else arr

def _interleave(self):
"""
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/panel.py
Original file line number Diff line number Diff line change
Expand Up @@ -464,7 +464,7 @@ def to_excel(self, path, na_rep='', engine=None, **kwargs):

def as_matrix(self):
self._consolidate_inplace()
return self._data.as_matrix()
return self._data.as_array()

# ----------------------------------------------------------------------
# Getting and setting elements
Expand Down
33 changes: 20 additions & 13 deletions pandas/tests/frame/test_api.py
Original file line number Diff line number Diff line change
Expand Up @@ -243,31 +243,31 @@ def test_itertuples(self):
def test_len(self):
assert len(self.frame) == len(self.frame.index)

def test_as_matrix(self):
def test_values(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave one test (or now add a test_as_matrix_deprecated that copies eg the first case of this test) that uses as_matrix and that asserts it raises a warning and is the same as .values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I've added in below test_repr_with_mi_nat

frame = self.frame
mat = frame.as_matrix()
arr = frame.values

frameCols = frame.columns
for i, row in enumerate(mat):
frame_cols = frame.columns
for i, row in enumerate(arr):
for j, value in enumerate(row):
col = frameCols[j]
col = frame_cols[j]
if np.isnan(value):
assert np.isnan(frame[col][i])
else:
assert value == frame[col][i]

# mixed type
mat = self.mixed_frame.as_matrix(['foo', 'A'])
assert mat[0, 0] == 'bar'
arr = self.mixed_frame[['foo', 'A']].values
assert arr[0, 0] == 'bar'

df = self.klass({'real': [1, 2, 3], 'complex': [1j, 2j, 3j]})
mat = df.as_matrix()
assert mat[0, 0] == 1j
arr = df.values
assert arr[0, 0] == 1j

# single block corner case
mat = self.frame.as_matrix(['A', 'B'])
arr = self.frame[['A', 'B']].values
expected = self.frame.reindex(columns=['A', 'B']).values
assert_almost_equal(mat, expected)
assert_almost_equal(arr, expected)

def test_transpose(self):
frame = self.frame
Expand Down Expand Up @@ -311,8 +311,8 @@ def test_class_axis(self):
DataFrame.index # no exception!
DataFrame.columns # no exception!

def test_more_asMatrix(self):
values = self.mixed_frame.as_matrix()
def test_more_values(self):
values = self.mixed_frame.values
assert values.shape[1] == len(self.mixed_frame.columns)

def test_repr_with_mi_nat(self):
Expand Down Expand Up @@ -369,6 +369,13 @@ def test_values(self):
self.frame.values[:, 0] = 5.
assert (self.frame.values[:, 0] == 5).all()

def test_as_matrix_deprecated(self):
# GH18458
with tm.assert_produces_warning(FutureWarning):
result = self.frame.as_matrix(columns=self.frame.columns.tolist())
expected = self.frame.values
tm.assert_numpy_array_equal(result, expected)

def test_deepcopy(self):
cp = deepcopy(self.frame)
series = cp['A']
Expand Down
32 changes: 16 additions & 16 deletions pandas/tests/frame/test_block_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,10 +67,10 @@ def test_consolidate_inplace(self):
for letter in range(ord('A'), ord('Z')):
self.frame[chr(letter)] = chr(letter)

def test_as_matrix_consolidate(self):
def test_values_consolidate(self):
self.frame['E'] = 7.
assert not self.frame._data.is_consolidated()
_ = self.frame.as_matrix() # noqa
_ = self.frame.values # noqa
assert self.frame._data.is_consolidated()

def test_modify_values(self):
Expand All @@ -91,50 +91,50 @@ def test_boolean_set_uncons(self):
self.frame[self.frame > 1] = 2
assert_almost_equal(expected, self.frame.values)

def test_as_matrix_numeric_cols(self):
def test_values_numeric_cols(self):
self.frame['foo'] = 'bar'

values = self.frame.as_matrix(['A', 'B', 'C', 'D'])
values = self.frame[['A', 'B', 'C', 'D']].values
assert values.dtype == np.float64

def test_as_matrix_lcd(self):
def test_values_lcd(self):

# mixed lcd
values = self.mixed_float.as_matrix(['A', 'B', 'C', 'D'])
values = self.mixed_float[['A', 'B', 'C', 'D']].values
assert values.dtype == np.float64

values = self.mixed_float.as_matrix(['A', 'B', 'C'])
values = self.mixed_float[['A', 'B', 'C']].values
assert values.dtype == np.float32

values = self.mixed_float.as_matrix(['C'])
values = self.mixed_float[['C']].values
assert values.dtype == np.float16

# GH 10364
# B uint64 forces float because there are other signed int types
values = self.mixed_int.as_matrix(['A', 'B', 'C', 'D'])
values = self.mixed_int[['A', 'B', 'C', 'D']].values
assert values.dtype == np.float64

values = self.mixed_int.as_matrix(['A', 'D'])
values = self.mixed_int[['A', 'D']].values
assert values.dtype == np.int64

# B uint64 forces float because there are other signed int types
values = self.mixed_int.as_matrix(['A', 'B', 'C'])
values = self.mixed_int[['A', 'B', 'C']].values
assert values.dtype == np.float64

# as B and C are both unsigned, no forcing to float is needed
values = self.mixed_int.as_matrix(['B', 'C'])
values = self.mixed_int[['B', 'C']].values
assert values.dtype == np.uint64

values = self.mixed_int.as_matrix(['A', 'C'])
values = self.mixed_int[['A', 'C']].values
assert values.dtype == np.int32

values = self.mixed_int.as_matrix(['C', 'D'])
values = self.mixed_int[['C', 'D']].values
assert values.dtype == np.int64

values = self.mixed_int.as_matrix(['A'])
values = self.mixed_int[['A']].values
assert values.dtype == np.int32

values = self.mixed_int.as_matrix(['C'])
values = self.mixed_int[['C']].values
assert values.dtype == np.uint8

def test_constructor_with_convert(self):
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/frame/test_nonunique_indexes.py
Original file line number Diff line number Diff line change
Expand Up @@ -439,7 +439,7 @@ def test_columns_with_dups(self):
xp.columns = ['A', 'A', 'B']
assert_frame_equal(rs, xp)

def test_as_matrix_duplicates(self):
def test_values_duplicates(self):
df = DataFrame([[1, 2, 'a', 'b'],
[1, 2, 'a', 'b']],
columns=['one', 'one', 'two', 'two'])
Expand Down
Loading