Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional _ macros in Series #730

Closed
billylanchantin opened this issue Nov 8, 2023 · 8 comments
Closed

Additional _ macros in Series #730

billylanchantin opened this issue Nov 8, 2023 · 8 comments

Comments

@billylanchantin
Copy link
Contributor

billylanchantin commented Nov 8, 2023

Overview

With #728 merged, we now have a strategy for adding macros to Series. The aim is closer parity with DataFrames.

This issue follows up from #726. We need to decide:

  • which DataFrame macros we want Series to also support and
  • how to name them.

Current Macros

DataFrames support four macros/callbacks:

  • arrange/arrange_with
  • filter/filter_with
  • mutate/mutate_with
  • summarise/summarise_with

We've already done filter. We probably don't want to support summarise. That leaves arrange and mutate.

Name options

DF Parity option Rename option
DF.arrange{_with} S.arrange{_with} S.sort{_with}
DF.mutate{_with} S.mutate{_with} S.map{_with}

Considerations

  1. There already exists an S.sort function, and it allows an additional nils: :first | :last option.
  2. There exists an S.transform function which is similar to DF.mutate, but it's for Elixir functions instead of Series functions.
@cigrainger
Copy link
Member

I'm not a huge fan of the idea of mutate for series. Maybe it's just the R in me. mutate carries the create/delete aspects of mutating a dataframe to me and it's been around forever in R. I think I could get past it though for continuity.

We should just bring the nils opt to DF.sort for parity, no? Or is it not available in the expression?

@billylanchantin
Copy link
Contributor Author

I'm leaning toward map as well.

Like you say, it doesn't carry any baggage over from R. And the parallels to Enum make it approachable for Elixir devs. Heck, we could probably document DF.mutate in terms of S.map and newcomers might even pick it up quicker.

We should just bring the nils opt to DF.sort for parity, no? Or is it not available in the expression?

(DF.arrange?) Yes I believe it's available. Though I'm still getting used to Rust TBH.

df_arrange in dataframe.rs calls sort. That function doesn't allow a nulls options. But the function sort_with_options does. I think the only consideration is that sort calls sort_in_place while sort_with_options does not. That makes me think there's additional memory allocation. But both sort and sort_with_options eventually call sort_impl to do the actual work:

  • sort -> sort_in_place -> sort_impl
  • sort_with_options -> sort_impl

Since sort_impl is public, I'm sure we can wrangle a memory-efficient version that allows a nulls option.

We should still decide on names though. I think I'm fine with S.sort. It's the Enum-parallel option, which means DF.arrange/S.sort will match the DF.mutate/S.map pair.

@josevalim
Copy link
Member

I think we can just call sort_with_options. sort does call sort_in_place but it clones before. sort_with_options clones as well.

Regarding the name, we can definitely call it sort, but we cannot have the current sort and the sort macro coexisting on the same API (the second argument is ambiguous, it is either a keyword or a query, but we cannot resolve this ambiguity at the macro level), so we would need to rename one of them.

Anyway, I think the first step is to add :nils to arrange. Btw, this may also be a good opportunity to default to non-stable sorting (which is much more efficient).

@billylanchantin
Copy link
Contributor Author

Wait a second, I'm second guessing. Does S.sort need to be a macro? I'm realizing that there's not much to put in there: there's only one "column" to choose from.

Unless the idea is that S.sort acts like Enum.sort_by. E.g.

[2, 4, 6]
|> S.from_list()
|> S.sort(remainder(_, 3))
|> S.to_list()
# [6, 4, 2]

Btw, this may also be a good opportunity to default to non-stable sorting (which is much more efficient).

Makes sense, might as well do that while we're in there.

@josevalim
Copy link
Member

The whole underscore bit relies on it being a macro, right?

@billylanchantin
Copy link
Contributor Author

billylanchantin commented Nov 9, 2023

Yep my bad, ignore me! I was looking at the docs for DF.arrange and all the examples were like:

df = DF.new(a: [3, 2, 1])
DF.arrange(df, a)

I didn't realize you could do any expression, e.g.:

DF.arrange(df, 1 - a)

So DF.arrange is like Enum.sort_by, not Enum.sort.

@billylanchantin
Copy link
Contributor Author

I think this issue is now closed :)

Are there any other macros I'm overlooking?

@cigrainger
Copy link
Member

I don't think so! Great work :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants