
Commit a6ee127

committed
Merge pull request #10726 from jreback/sorted
API/WIP: .sorted
2 parents d406273 + 13d2d71 commit a6ee127

23 files changed, +792 / -486 lines changed

doc/source/api.rst

Lines changed: 3 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -434,9 +434,8 @@ Reshaping, sorting
434434
:toctree: generated/
435435

436436
Series.argsort
437-
Series.order
438437
Series.reorder_levels
439-
Series.sort
438+
Series.sort_values
440439
Series.sort_index
441440
Series.sortlevel
442441
Series.swaplevel
@@ -908,7 +907,7 @@ Reshaping, sorting, transposing
908907

909908
DataFrame.pivot
910909
DataFrame.reorder_levels
911-
DataFrame.sort
910+
DataFrame.sort_values
912911
DataFrame.sort_index
913912
DataFrame.sortlevel
914913
DataFrame.nlargest
@@ -1293,7 +1292,6 @@ Modifying and Computations
12931292
Index.insert
12941293
Index.min
12951294
Index.max
1296-
Index.order
12971295
Index.reindex
12981296
Index.repeat
12991297
Index.take
@@ -1319,8 +1317,7 @@ Sorting
13191317
:toctree: generated/
13201318

13211319
Index.argsort
1322-
Index.order
1323-
Index.sort
1320+
Index.sort_values
13241321

13251322
Time-specific operations
13261323
~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/basics.rst

Lines changed: 31 additions & 14 deletions
@@ -1440,39 +1440,56 @@ description.
14401440

14411441
.. _basics.sorting:
14421442

1443-
Sorting by index and value
1444-
--------------------------
1443+
Sorting
1444+
-------
1445+
1446+
.. warning::
1447+
1448+
The sorting API is substantially changed in 0.17.0, see :ref:`here <whatsnew_0170.api_breaking.sorting>` for these changes.
1449+
In particular, all sorting methods now return a new object by default, and **DO NOT** operate in-place (except by passing ``inplace=True``).
14451450

14461451
There are two obvious kinds of sorting that you may be interested in: sorting
1447-
by label and sorting by actual values. The primary method for sorting axis
1448-
labels (indexes) across data structures is the :meth:`~DataFrame.sort_index` method.
1452+
by label and sorting by actual values.
1453+
1454+
By Index
1455+
~~~~~~~~
1456+
1457+
The primary methods for sorting axis
1458+
labels (indexes) are the ``Series.sort_index()`` and ``DataFrame.sort_index()`` methods.
14491459

14501460
.. ipython:: python
14511461
14521462
unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
14531463
columns=['three', 'two', 'one'])
1464+
1465+
# DataFrame
14541466
unsorted_df.sort_index()
14551467
unsorted_df.sort_index(ascending=False)
14561468
unsorted_df.sort_index(axis=1)
14571469
1458-
:meth:`DataFrame.sort_index` can accept an optional ``by`` argument for ``axis=0``
1470+
# Series
1471+
unsorted_df['three'].sort_index()
1472+
1473+
By Values
1474+
~~~~~~~~~
1475+
1476+
The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` methods are the entry points for **value** sorting (that is, sorting by the values in a column or row).
1477+
:meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
14591478
which will use an arbitrary vector or a column name of the DataFrame to
14601479
determine the sort order:
14611480

14621481
.. ipython:: python
14631482
14641483
df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})
1465-
df1.sort_index(by='two')
1484+
df1.sort_values(by='two')
14661485
14671486
The ``by`` argument can take a list of column names, e.g.:
14681487

14691488
.. ipython:: python
14701489
14711490
df1[['one', 'two', 'three']].sort_index(by=['one','two'])
14721491
1473-
Series has the method :meth:`~Series.order` (analogous to `R's order function
1474-
<http://stat.ethz.ch/R-manual/R-patched/library/base/html/order.html>`__) which
1475-
sorts by value, with special treatment of NA values via the ``na_position``
1492+
These methods have special treatment of NA values via the ``na_position``
14761493
argument:
14771494

14781495
.. ipython:: python
@@ -1481,11 +1498,11 @@ argument:
14811498
s.order()
14821499
s.order(na_position='first')
14831500
1484-
.. note::
14851501
1486-
:meth:`Series.sort` sorts a Series by value in-place. This is to provide
1487-
compatibility with NumPy methods which expect the ``ndarray.sort``
1488-
behavior. :meth:`Series.order` returns a copy of the sorted data.
1502+
.. _basics.searchsorted:
1503+
1504+
searchsorted
1505+
~~~~~~~~~~~~
14891506

14901507
Series has the :meth:`~Series.searchsorted` method, which works similar to
14911508
:meth:`numpy.ndarray.searchsorted`.
@@ -1515,7 +1532,7 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.
15151532
15161533
s = pd.Series(np.random.permutation(10))
15171534
s
1518-
s.order()
1535+
s.sort_values()
15191536
s.nsmallest(3)
15201537
s.nlargest(3)
15211538
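The split between label sorting and value sorting described in this file can be sketched as follows. This is a minimal illustration written against the post-change API (`sort_index`, `sort_values`, `na_position`); the names `frame` and `s` and the sample data are assumptions for the example and do not come from the commit.

```python
import numpy as np
import pandas as pd

# Illustrative data; `frame` and `s` are hypothetical names.
frame = pd.DataFrame(
    {"one": [2, 1, 1, 1], "two": [1, 3, 2, 4], "three": [5, 4, 3, 2]},
    index=["d", "a", "c", "b"],
)

by_label = frame.sort_index()                    # sort rows by index label
by_value = frame.sort_values(by=["one", "two"])  # sort rows by column values

s = pd.Series([3.0, np.nan, 1.0, 2.0])
nan_last = s.sort_values()                       # NaN goes last by default
nan_first = s.sort_values(na_position="first")   # NaN goes first on request

# Both calls return new objects; `frame` and `s` keep their original order.
print(by_label.index.tolist())
print(by_value.index.tolist())
print(nan_first.index.tolist())
```

Neither call mutates its input, which matches the warning added at the top of the Sorting section.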

doc/source/whatsnew/v0.17.0.txt

Lines changed: 61 additions & 1 deletion
@@ -14,6 +14,7 @@ users upgrade to this version.
1414
Highlights include:
1515

1616
- Release the Global Interpreter Lock (GIL) on some cython operations, see :ref:`here <whatsnew_0170.gil>`
17+
- The sorting API has been revamped to remove some long-time inconsistencies, see :ref:`here <whatsnew_0170.api_breaking.sorting>`
1718
- The default for ``to_datetime`` will now be to ``raise`` when presented with unparseable formats,
1819
previously this would return the original input, see :ref:`here <whatsnew_0170.api_breaking.to_datetime>`
1920
- The default for ``dropna`` in ``HDFStore`` has changed to ``False``, to store by default all rows even
@@ -207,6 +208,65 @@ Other enhancements
207208
Backwards incompatible API changes
208209
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
209210

211+
.. _whatsnew_0170.api_breaking.sorting:
212+
213+
Changes to sorting API
214+
^^^^^^^^^^^^^^^^^^^^^^
215+
216+
The sorting API has had some longtime inconsistencies (:issue:`9816`, :issue:`8239`).
217+
218+
Here is a summary of the API **prior** to 0.17.0:
219+
220+
- ``Series.sort`` is **INPLACE** while ``DataFrame.sort`` returns a new object.
221+
- ``Series.order`` returned a new object
222+
- It was possible to use ``Series/DataFrame.sort_index`` to sort by **values** by passing the ``by`` keyword.
223+
- ``Series/DataFrame.sortlevel`` worked only on a ``MultiIndex`` for sorting by index.
224+
225+
To address these issues, we have revamped the API:
226+
227+
- We have introduced a new method, :meth:`DataFrame.sort_values`, which is the merger of ``DataFrame.sort()``, ``Series.sort()``,
228+
and ``Series.order``, to handle sorting of **values**.
229+
- The existing method ``Series.sort()`` has been deprecated and will be removed in a
230+
future version of pandas.
231+
- The ``by`` argument of ``DataFrame.sort_index()`` has been deprecated and will be removed in a future version of pandas.
232+
- The methods ``DataFrame.sort()`` and ``Series.order()`` are no longer recommended and will carry a deprecation warning
233+
in the docstring.
234+
- The existing method ``.sort_index()`` will gain the ``level`` keyword to enable level sorting.
235+
236+
We now have two distinct and non-overlapping methods of sorting. A ``*`` marks items that
237+
will show a ``FutureWarning``.
238+
239+
To sort by the **values**:
240+
241+
================================= ====================================
242+
Previous                          Replacement
243+
================================= ====================================
244+
\*``Series.order()``              ``Series.sort_values()``
245+
\*``Series.sort()``               ``Series.sort_values(inplace=True)``
246+
\*``DataFrame.sort(columns=...)`` ``DataFrame.sort_values(by=...)``
247+
================================= ====================================
248+
249+
To sort by the **index**:
250+
251+
================================== ====================================
252+
Previous                           Equivalent
253+
================================== ====================================
254+
``Series.sort_index()``            ``Series.sort_index()``
255+
``Series.sortlevel(level=...)``    ``Series.sort_index(level=...)``
256+
``DataFrame.sort_index()``         ``DataFrame.sort_index()``
257+
``DataFrame.sortlevel(level=...)`` ``DataFrame.sort_index(level=...)``
258+
\*``DataFrame.sort()``             ``DataFrame.sort_index()``
259+
================================== ====================================
260+
261+
We have also deprecated and changed similar methods in two Series-like classes, ``Index`` and ``Categorical``.
262+
263+
================================== ====================================
264+
Previous                           Replacement
265+
================================== ====================================
266+
\*``Index.order()``                ``Index.sort_values()``
267+
\*``Categorical.order()``          ``Categorical.sort_values()``
268+
================================== ====================================
269+
210270
.. _whatsnew_0170.api_breaking.to_datetime:
211271

212272
Changes to to_datetime and to_timedelta
@@ -591,7 +651,7 @@ Removal of prior version deprecations/changes
591651
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
592652

593653
- Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)
594-
654+
- Removal of ``na_last`` parameters from ``Series.order()`` and ``Series.sort()``, in favor of ``na_position``, xref (:issue:`5231`)
595655

596656
.. _whatsnew_0170.performance:
597657
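The replacements listed in the whatsnew tables above can be sketched as a short migration example. It is an illustration only: the variable names and sample data are assumptions, and only the post-change calls (`sort_values`, `sort_values(inplace=True)`, `sort_index(level=...)`) are exercised, since the deprecated methods may not exist in the pandas version at hand.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["x", "y", "z"])

# Replacement for the old Series.order(): returns a new, sorted Series.
ordered = s.sort_values()

# Replacement for the old in-place Series.sort().
s_inplace = s.copy()
result = s_inplace.sort_values(inplace=True)  # returns None, mutates s_inplace

# Replacement for sortlevel(level=...): sort_index with the level keyword.
mi = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["outer", "inner"]
)
nested = pd.Series([10, 20, 30, 40], index=mi)
by_inner = nested.sort_index(level="inner")

print(ordered.tolist())
print(s_inplace.tolist())
print(by_inner.index.get_level_values("inner").tolist())
```

Because `inplace=True` returns `None`, assigning its result (as done with `result` here) is only useful to show that nothing comes back.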

pandas/core/algorithms.py

Lines changed: 2 additions & 4 deletions
@@ -265,9 +265,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
265265
result.index = bins[:-1]
266266

267267
if sort:
268-
result.sort()
269-
if not ascending:
270-
result = result[::-1]
268+
result = result.sort_values(ascending=ascending)
271269

272270
if normalize:
273271
result = result / float(values.size)
@@ -500,7 +498,7 @@ def select_n_slow(dropped, n, take_last, method):
500498
reverse_it = take_last or method == 'nlargest'
501499
ascending = method == 'nsmallest'
502500
slc = np.s_[::-1] if reverse_it else np.s_[:]
503-
return dropped[slc].order(ascending=ascending).head(n)
501+
return dropped[slc].sort_values(ascending=ascending).head(n)
504502

505503

506504
_select_methods = {'nsmallest': nsmallest, 'nlargest': nlargest}
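The `value_counts` hunk above replaces a sort-then-reverse pattern with a single `sort_values(ascending=ascending)` call. A minimal sketch of that equivalence follows. Two assumptions apply: `counts` is made-up data with distinct values, and the old in-place `result.sort()` is approximated with `sort_values()` because `Series.sort` may not exist in the pandas version at hand.

```python
import pandas as pd

# Hypothetical counts with distinct values, so tie ordering is not a factor.
counts = pd.Series([2, 5, 1, 3], index=["a", "b", "c", "d"])

# Old pattern from the removed lines: sort ascending, then reverse.
old_style = counts.sort_values()[::-1]

# New pattern from the added line: one call driven by the ascending flag.
new_style = counts.sort_values(ascending=False)

print(new_style.index.tolist())
print(old_style.equals(new_style))
```

With tied values the two patterns may order the tied labels differently, so the equality shown here should not be assumed in general.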

pandas/core/categorical.py

Lines changed: 37 additions & 6 deletions
@@ -1083,7 +1083,7 @@ def argsort(self, ascending=True, **kwargs):
10831083
result = result[::-1]
10841084
return result
10851085

1086-
def order(self, inplace=False, ascending=True, na_position='last'):
1086+
def sort_values(self, inplace=False, ascending=True, na_position='last'):
10871087
""" Sorts the Category by category value returning a new Categorical by default.
10881088
10891089
Only ordered Categoricals can be sorted!
@@ -1092,10 +1092,10 @@ def order(self, inplace=False, ascending=True, na_position='last'):
10921092
10931093
Parameters
10941094
----------
1095-
ascending : boolean, default True
1096-
Sort ascending. Passing False sorts descending
10971095
inplace : boolean, default False
10981096
Do operation in place.
1097+
ascending : boolean, default True
1098+
Sort ascending. Passing False sorts descending
10991099
na_position : {'first', 'last'} (optional, default='last')
11001100
'first' puts NaNs at the beginning
11011101
'last' puts NaNs at the end
@@ -1139,6 +1139,37 @@ def order(self, inplace=False, ascending=True, na_position='last'):
11391139
return Categorical(values=codes,categories=self.categories, ordered=self.ordered,
11401140
fastpath=True)
11411141

1142+
def order(self, inplace=False, ascending=True, na_position='last'):
1143+
"""
1144+
DEPRECATED: use :meth:`Categorical.sort_values`
1145+
1146+
Sorts the Category by category value returning a new Categorical by default.
1147+
1148+
Only ordered Categoricals can be sorted!
1149+
1150+
Categorical.sort is the equivalent but sorts the Categorical inplace.
1151+
1152+
Parameters
1153+
----------
1154+
inplace : boolean, default False
1155+
Do operation in place.
1156+
ascending : boolean, default True
1157+
Sort ascending. Passing False sorts descending
1158+
na_position : {'first', 'last'} (optional, default='last')
1159+
'first' puts NaNs at the beginning
1160+
'last' puts NaNs at the end
1161+
1162+
Returns
1163+
-------
1164+
y : Category or None
1165+
1166+
See Also
1167+
--------
1168+
Category.sort
1169+
"""
1170+
warn("order is deprecated, use sort_values(...)",
1171+
FutureWarning, stacklevel=2)
1172+
return self.sort_values(inplace=inplace, ascending=ascending, na_position=na_position)
11421173

11431174
def sort(self, inplace=True, ascending=True, na_position='last'):
11441175
""" Sorts the Category inplace by category value.
@@ -1163,10 +1194,10 @@ def sort(self, inplace=True, ascending=True, na_position='last'):
11631194
11641195
See Also
11651196
--------
1166-
Category.order
1197+
Category.sort_values
11671198
"""
1168-
return self.order(inplace=inplace, ascending=ascending,
1169-
na_position=na_position)
1199+
return self.sort_values(inplace=inplace, ascending=ascending,
1200+
na_position=na_position)
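The renamed `Categorical.sort_values` keeps the `ascending` and `na_position` parameters documented in the docstring above. The sketch below shows how they interact on an ordered Categorical; the data and variable names are assumptions for illustration, and the calls use keyword arguments only.

```python
import pandas as pd

# Ordered categorical whose category order (c < b < a) differs from
# alphabetical order, plus one missing value.
cat = pd.Categorical(
    ["b", "a", None, "c"], categories=["c", "b", "a"], ordered=True
)

sorted_last = cat.sort_values()                      # missing value last
sorted_first = cat.sort_values(na_position="first")  # missing value first
sorted_desc = cat.sort_values(ascending=False)       # reverse category order

print(sorted_last.tolist())
print(sorted_first.tolist())
print(sorted_desc.tolist())
```

Sorting follows the declared category order rather than the lexical order of the labels, and `cat` itself is unchanged because `inplace` defaults to `False`.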
11701201

11711202
def ravel(self, order='C'):
11721203
""" Return a flattened (numpy) array.

pandas/core/common.py

Lines changed: 3 additions & 0 deletions
@@ -2155,6 +2155,9 @@ def _mut_exclusive(**kwargs):
21552155
return val2
21562156

21572157

2158+
def _not_none(*args):
2159+
return (arg for arg in args if arg is not None)
2160+
21582161
def _any_none(*args):
21592162
for arg in args:
21602163
if arg is None:
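The `_not_none` helper added in this hunk returns a generator over the arguments that are not `None`. A standalone sketch of the same idea is shown below; `not_none` is a hypothetical public copy written for illustration, not an import from pandas.

```python
def not_none(*args):
    """Return a generator over the arguments that are not None."""
    return (arg for arg in args if arg is not None)


# Falsy values such as 0 are kept; only None is filtered out.
kept = list(not_none(1, None, "a", 0, None))
print(kept)

# A generator can be consumed only once, so materialize it if it is reused.
gen = not_none(None, 2)
first_pass = list(gen)
second_pass = list(gen)
```

The filter uses an identity check against `None`, so it differs from a truthiness filter such as `filter(None, args)`.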
