Rename DataFrame.arrange as DataFrame.sort #777

Merged
merged 6 commits into from
Dec 19, 2023
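
Seen from the caller's side, the rename in this pull request can be sketched roughly as follows. This is a minimal illustration, not part of the PR itself: the `DF` alias, the variable names, and the `Mix.install` version requirement are assumptions, while the calls and the sorted values of column `a` are taken from the doctests in the diff.

```elixir
# Assumption: an Explorer release that already includes this rename.
Mix.install([{:explorer, "~> 0.8"}])

# sort_by is a macro, so the module must be required before use.
require Explorer.DataFrame, as: DF

df = DF.new(a: ["b", "c", "a"], b: [1, 2, 3])

# Before this PR (deprecated by it):
#   DF.arrange(df, desc: a)
#   DF.arrange_with(df, &[desc: &1["a"]])

# After this PR: query-based macro version.
by_query = DF.sort_by(df, desc: a)

# After this PR: callback version operating on a lazy dataframe.
by_callback = DF.sort_with(df, &[desc: &1["a"]])

IO.inspect(Explorer.Series.to_list(by_query["a"]))
IO.inspect(Explorer.Series.to_list(by_callback["a"]))
```

Both calls accept the same `nils:` and `stable:` options as before; only the function names change.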
2 changes: 1 addition & 1 deletion lib/explorer/backend/data_frame.ex
@@ -162,7 +162,7 @@ defmodule Explorer.Backend.DataFrame do
@callback mask(df, mask :: series) :: df
@callback filter_with(df, out_df :: df(), lazy_series()) :: df
@callback mutate_with(df, out_df :: df(), mutations :: [{column_name(), lazy_series()}]) :: df
@callback arrange_with(
@callback sort_with(
df,
out_df :: df(),
directions :: [{:asc | :desc, lazy_series()}],
62 changes: 37 additions & 25 deletions lib/explorer/data_frame.ex
@@ -61,7 +61,7 @@ defmodule Explorer.DataFrame do
- `select/2` for picking columns and `discard/2` to discard them
- `filter/2` for picking rows based on predicates
- `mutate/2` for adding or replacing columns that are functions of existing columns
- `arrange/2` for changing the ordering of rows
- `sort_by/2` for changing the ordering of rows
- `distinct/2` for picking unique rows
- `summarise/2` for reducing multiple rows down to a single summary
- `pivot_longer/3` and `pivot_wider/4` for massaging dataframes into longer or
@@ -3041,13 +3041,13 @@ defmodule Explorer.DataFrame do
defp append_unless_present([], name), do: [name]

@doc """
Arranges/sorts rows by columns using `Explorer.Query`.
Sorts rows by columns using `Explorer.Query`.

> #### Notice {: .notice}
>
> This is a macro. You must `require Explorer.DataFrame` before using it.

See `arrange_with/2` for a callback version of this function without
See `sort_with/2` for a callback version of this function without
`Explorer.Query`.

## Options
@@ -3072,7 +3072,7 @@ defmodule Explorer.DataFrame do
A single column name will sort ascending by that column:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, a)
iex> Explorer.DataFrame.sort_by(df, a)
#Explorer.DataFrame<
Polars[3 x 2]
a string ["a", "b", "c"]
@@ -3082,7 +3082,7 @@ defmodule Explorer.DataFrame do
You can also sort descending:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange(df, desc: a)
iex> Explorer.DataFrame.sort_by(df, desc: a)
#Explorer.DataFrame<
Polars[3 x 2]
a string ["c", "b", "a"]
@@ -3092,7 +3092,7 @@ defmodule Explorer.DataFrame do
You can specify how `nil`s are sorted:

iex> df = Explorer.DataFrame.new(a: ["b", "c", nil, "a"])
iex> Explorer.DataFrame.arrange(df, [desc: a], nils: :first)
iex> Explorer.DataFrame.sort_by(df, [desc: a], nils: :first)
#Explorer.DataFrame<
Polars[4 x 1]
a string [nil, "c", "b", "a"]
@@ -3101,7 +3101,7 @@ defmodule Explorer.DataFrame do
Sorting by more than one column sorts them in the order they are entered:

iex> df = Explorer.Datasets.fossil_fuels()
iex> Explorer.DataFrame.arrange(df, asc: total, desc: country)
iex> Explorer.DataFrame.sort_by(df, asc: total, desc: country)
#Explorer.DataFrame<
Polars[1094 x 10]
year integer [2010, 2010, 2011, 2011, 2012, ...]
@@ -3118,18 +3118,18 @@ defmodule Explorer.DataFrame do

## Grouped examples

When used in a grouped dataframe, arrange is going to sort each group individually and
then return the entire dataframe with the existing groups. If one of the arrange columns
When used in a grouped dataframe, sort_by is going to sort each group individually and
then return the entire dataframe with the existing groups. If one of the sort_by columns
is also a group, the sorting for that column is not going to work. It is necessary to
first summarise the desired column and then arrange it.
first summarise the desired column and then sort_by it.

Here is an example using the Iris dataset. We group by species and then we try to sort
the dataframe by species and sepal width, but only "sepal width" is taken into account
because "species" is a group.

iex> df = Explorer.Datasets.iris()
iex> grouped = Explorer.DataFrame.group_by(df, "species")
iex> Explorer.DataFrame.arrange(grouped, desc: species, asc: sepal_width)
iex> Explorer.DataFrame.sort_by(grouped, desc: species, asc: sepal_width)
#Explorer.DataFrame<
Polars[150 x 5]
Groups: ["species"]
@@ -3141,25 +3141,33 @@ defmodule Explorer.DataFrame do
>
"""
@doc type: :single
defmacro arrange(df, query, opts \\ []) do
defmacro sort_by(df, query, opts \\ []) do
quote do
require Explorer.Query

Explorer.DataFrame.arrange_with(
Explorer.DataFrame.sort_with(
unquote(df),
Explorer.Query.query(unquote(query)),
unquote(opts)
)
end
end

@deprecated "Use sort_by/3 instead"
@doc type: :single
defmacro arrange(df, query, opts \\ []) do
quote do
Explorer.DataFrame.sort_by(unquote(df), unquote(query), unquote(opts))
end
end

@doc """
Arranges/sorts rows by columns using a callback function.
Sorts rows by columns using a callback function.

The callback receives a lazy dataframe which stores
operations instead of values for efficient sorting.

This is a callback version of `arrange/2`.
This is a callback version of `sort_by/2`.

## Options

@@ -3183,7 +3191,7 @@ defmodule Explorer.DataFrame do
A single column name will sort ascending by that column:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange_with(df, &(&1["a"]))
iex> Explorer.DataFrame.sort_with(df, &(&1["a"]))
#Explorer.DataFrame<
Polars[3 x 2]
a string ["a", "b", "c"]
@@ -3193,7 +3201,7 @@ defmodule Explorer.DataFrame do
You can also sort descending:

iex> df = Explorer.DataFrame.new(a: ["b", "c", "a"], b: [1, 2, 3])
iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"]])
iex> Explorer.DataFrame.sort_with(df, &[desc: &1["a"]])
#Explorer.DataFrame<
Polars[3 x 2]
a string ["c", "b", "a"]
@@ -3203,7 +3211,7 @@ defmodule Explorer.DataFrame do
You can specify how `nil`s are sorted:

iex> df = Explorer.DataFrame.new(a: ["b", "c", nil, "a"])
iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"]], nils: :first)
iex> Explorer.DataFrame.sort_with(df, &[desc: &1["a"]], nils: :first)
#Explorer.DataFrame<
Polars[4 x 1]
a string [nil, "c", "b", "a"]
@@ -3212,7 +3220,7 @@ defmodule Explorer.DataFrame do
Sorting by more than one column sorts them in the order they are entered:

iex> df = Explorer.DataFrame.new(a: [3, 1, 3], b: [2, 1, 3])
iex> Explorer.DataFrame.arrange_with(df, &[desc: &1["a"], asc: &1["b"]])
iex> Explorer.DataFrame.sort_with(df, &[desc: &1["a"], asc: &1["b"]])
#Explorer.DataFrame<
Polars[3 x 2]
a integer [3, 3, 1]
@@ -3223,7 +3231,7 @@ defmodule Explorer.DataFrame do

iex> df = Explorer.Datasets.iris()
iex> grouped = Explorer.DataFrame.group_by(df, "species")
iex> Explorer.DataFrame.arrange_with(grouped, &[desc: &1["species"], asc: &1["sepal_width"]])
iex> Explorer.DataFrame.sort_with(grouped, &[desc: &1["species"], asc: &1["sepal_width"]])
#Explorer.DataFrame<
Polars[150 x 5]
Groups: ["species"]
@@ -3235,13 +3243,13 @@ defmodule Explorer.DataFrame do
>
"""
@doc type: :single
@spec arrange_with(
@spec sort_with(
df :: DataFrame.t(),
(Explorer.Backend.LazyFrame.t() ->
Series.lazy_t() | [Series.lazy_t()] | [{:asc | :desc, Series.lazy_t()}]),
opts :: [nils: :first | :last, stable: boolean()]
) :: DataFrame.t()
def arrange_with(%DataFrame{} = df, fun, opts \\ []) when is_function(fun, 1) do
def sort_with(%DataFrame{} = df, fun, opts \\ []) when is_function(fun, 1) do
[_descending? | opts] = Shared.validate_sort_options!(opts)

ldf = Explorer.Backend.LazyFrame.new(df)
@@ -3265,12 +3273,16 @@ defmodule Explorer.DataFrame do
{:asc, lazy_series}

other ->
raise "not a valid lazy series or arrange instruction: #{inspect(other)}"
raise "not a valid lazy series or sort_by instruction: #{inspect(other)}"
end)

Shared.apply_impl(df, :arrange_with, [df, dir_and_lazy_series_pairs] ++ opts)
Shared.apply_impl(df, :sort_with, [df, dir_and_lazy_series_pairs] ++ opts)
end

@deprecated "Use sort_with/3 instead"
@doc type: :single
def arrange_with(df, fun, opts \\ []), do: sort_with(df, fun, opts)

@doc """
Takes distinct rows by a selection of columns.

@@ -5645,7 +5657,7 @@ defmodule Explorer.DataFrame do
df
|> group_by(columns)
|> summarise_with(&[counts: Series.count(&1[col])])
|> arrange_with(&[desc: &1[:counts]])
|> sort_with(&[desc: &1[:counts]])
end

def frequencies(_df, []), do: raise(ArgumentError, "columns cannot be empty")
6 changes: 3 additions & 3 deletions lib/explorer/polars_backend/data_frame.ex
@@ -643,7 +643,7 @@ defmodule Explorer.PolarsBackend.DataFrame do
end

@impl true
def arrange_with(
def sort_with(
%DataFrame{} = df,
out_df,
column_pairs,
@@ -659,7 +659,7 @@ defmodule Explorer.PolarsBackend.DataFrame do
|> Enum.map(fn {dir, %{args: [col]}} -> {dir == :desc, col} end)
|> Enum.unzip()

Shared.apply_dataframe(df, out_df, :df_arrange, [
Shared.apply_dataframe(df, out_df, :df_sort_by, [
column_names,
directions,
maintain_order?,
@@ -673,7 +673,7 @@ defmodule Explorer.PolarsBackend.DataFrame do
|> Enum.map(fn {dir, lazy_series} -> {dir == :desc, to_expr(lazy_series)} end)
|> Enum.unzip()

Shared.apply_dataframe(df, out_df, :df_arrange_with, [
Shared.apply_dataframe(df, out_df, :df_sort_with, [
expressions,
directions,
maintain_order?,
8 changes: 4 additions & 4 deletions lib/explorer/polars_backend/lazy_frame.ex
@@ -354,7 +354,7 @@ defmodule Explorer.PolarsBackend.LazyFrame do
end

@impl true
def arrange_with(
def sort_with(
%DF{groups: []} = df,
out_df,
column_pairs,
@@ -369,7 +369,7 @@ defmodule Explorer.PolarsBackend.LazyFrame do
|> Enum.map(fn {direction, lazy_series} -> {direction == :desc, to_expr(lazy_series)} end)
|> Enum.unzip()

Shared.apply_dataframe(df, out_df, :lf_arrange_with, [
Shared.apply_dataframe(df, out_df, :lf_sort_with, [
expressions,
directions,
maintain_order?,
@@ -378,8 +378,8 @@ defmodule Explorer.PolarsBackend.LazyFrame do
end

@impl true
def arrange_with(_df, _out_df, _directions, _maintain_order?, _multithreaded?, _nulls_last?) do
raise "arrange_with/2 with groups is not supported yet for lazy frames"
def sort_with(_df, _out_df, _directions, _maintain_order?, _multithreaded?, _nulls_last?) do
raise "sort_with/2 with groups is not supported yet for lazy frames"
end

@impl true
6 changes: 3 additions & 3 deletions lib/explorer/polars_backend/native.ex
@@ -59,10 +59,10 @@ defmodule Explorer.PolarsBackend.Native do

def df_from_arrow_stream_pointer(_stream_ptr), do: err()

def df_arrange(_df, _by, _reverse, _maintain_order?, _multithreaded?, _nulls_last?, _groups),
def df_sort_by(_df, _by, _reverse, _maintain_order?, _multithreaded?, _nulls_last?, _groups),
do: err()

def df_arrange_with(
def df_sort_with(
_df,
_expressions,
_directions,
@@ -246,7 +246,7 @@ defmodule Explorer.PolarsBackend.Native do

def lf_filter_with(_df, _expression), do: err()

def lf_arrange_with(
def lf_sort_with(
_df,
_expressions,
_directions,
4 changes: 2 additions & 2 deletions lib/explorer/query.ex
@@ -43,7 +43,7 @@ defmodule Explorer.Query do

Queries are supported in the following operations:

* `Explorer.DataFrame.arrange/2`
* `Explorer.DataFrame.sort_by/2`
* `Explorer.DataFrame.filter/2`
* `Explorer.DataFrame.mutate/2`
* `Explorer.DataFrame.summarise/2`
@@ -218,7 +218,7 @@ defmodule Explorer.Query do
petal_width_mean f64 [0.2439999999999999, 1.3259999999999998, 2.026]
>

`arrange` expects a list of columns to sort by, while for-comprehensions
`sort_by` expects a list of columns to sort by, while for-comprehensions
in `filter` generate a list of conditions, which are joined using `and`.
For example, to filter all entries that have both sepal and petal length above
average, using a filter on the column name, one could write:
4 changes: 2 additions & 2 deletions lib/explorer/series.ex
@@ -1694,7 +1694,7 @@ defmodule Explorer.Series do
require Explorer.DataFrame

Explorer.DataFrame.new(_: unquote(series))
|> Explorer.DataFrame.arrange([{unquote(direction), unquote(query)}], unquote(opts))
|> Explorer.DataFrame.sort_by([{unquote(direction), unquote(query)}], unquote(opts))
|> Explorer.DataFrame.pull(:_)
end
end
@@ -1752,7 +1752,7 @@ defmodule Explorer.Series do
{direction, opts} = Keyword.pop(opts, :direction, :asc)

Explorer.DataFrame.new(series: series)
|> Explorer.DataFrame.arrange_with(&[{direction, fun.(&1[:series])}], opts)
|> Explorer.DataFrame.sort_with(&[{direction, fun.(&1[:series])}], opts)
|> Explorer.DataFrame.pull(:series)
end

4 changes: 2 additions & 2 deletions native/explorer/src/dataframe.rs
@@ -323,7 +323,7 @@ fn arrow_to_explorer_error(error: impl std::fmt::Debug) -> ExplorerError {
}

#[rustler::nif(schedule = "DirtyCpu")]
pub fn df_arrange(
pub fn df_sort_by(
df: ExDataFrame,
by_columns: Vec<String>,
reverse: Vec<bool>,
@@ -354,7 +354,7 @@ pub fn df_arrange(
}

#[rustler::nif(schedule = "DirtyCpu")]
pub fn df_arrange_with(
pub fn df_sort_with(
data: ExDataFrame,
expressions: Vec<ExExpr>,
directions: Vec<bool>,
2 changes: 1 addition & 1 deletion native/explorer/src/lazyframe.rs
@@ -105,7 +105,7 @@ pub fn lf_filter_with(data: ExLazyFrame, ex_expr: ExExpr) -> Result<ExLazyFrame,
}

#[rustler::nif]
pub fn lf_arrange_with(
pub fn lf_sort_with(
data: ExLazyFrame,
expressions: Vec<ExExpr>,
directions: Vec<bool>,
6 changes: 3 additions & 3 deletions native/explorer/src/lib.rs
@@ -75,8 +75,8 @@ rustler::init!(
"Elixir.Explorer.PolarsBackend.Native",
[
df_from_arrow_stream_pointer,
df_arrange,
df_arrange_with,
df_sort_by,
df_sort_with,
df_concat_columns,
df_concat_rows,
df_describe,
@@ -294,7 +294,7 @@ rustler::init!(
lf_from_parquet_cloud,
lf_from_ndjson,
lf_filter_with,
lf_arrange_with,
lf_sort_with,
lf_distinct,
lf_mutate_with,
lf_summarise_with,