Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename DataFrame.arrange as DataFrame.sort #777

Merged
merged 6 commits into from
Dec 19, 2023
Merged

Conversation

billylanchantin
Copy link
Contributor

@billylanchantin billylanchantin commented Dec 16, 2023

Renames:

  • DataFrame.arrange -> DataFrame.sort_by
  • DataFrame.arrange_with -> DataFrame.sort_with

Closes:

@billylanchantin
Copy link
Contributor Author

billylanchantin commented Dec 16, 2023

Naming

There's a naming issue I hadn't considered. So far I've named the DataFrame functions to match their Series counterparts:

Type Series DataFrame
Function Series.sort N/A
Expression Series.sort_by DataFrame.sort_by
Callback Series.sort_with DataFrame.sort_with

But this is how I think we should name them instead:

Type Series DataFrame
Function Series.sort_values N/A
Expression Series.sort DataFrame.sort
Callback Series.sort_with DataFrame.sort_with

That change would be breaking because Series.sort would change meaning. (We discussed this here.)

However, I hadn't considered "five main verbs" angle. As it stands, this is how that section of "Exploring Explorer" reads:

In Explorer, like in dplyr, we have five main verbs to work with dataframes:

  • select
  • filter
  • mutate
  • sort_by
  • summarise

Now we could just change that bullet to "sort" and be done with it. But each of the other four follows the pattern:

  • Explorer.DataFrame.verb (expression)
  • Explorer.DataFrame.verb_with (callback)

Since we're pre-1.0, I think it's worth breaking Series.sort to remain consistent.

As a bonus, there'd be an opportunity for a DataFrame analog of Series.sort_values. Maybe something like:

DataFrame.sort_columns(asc: "column_name")

WDYT? /cc @josevalim @philss @cigrainger

@philss
Copy link
Member

philss commented Dec 18, 2023

@billylanchantin I'm more inclined to the version we have in this PR - the sort / sort_by / sort_with. But I'm not against the proposed new names! And I think it's fine to break the API at this point.

@jcmkk3
Copy link

jcmkk3 commented Dec 18, 2023

I'm just a dataframe geek that likes to keep an eye on this project, but I don't find sort_by to be a problem. There is also group_by, which is another verb that provides context to subsequent operations. I'm glad to see the change from arrange. I'm a big fan of the tidyverse, but don't view some of the names to be optimal. They were restricted by wanting to avoid explicit namespacing and not wanting to conflict with built-in functions.

@billylanchantin
Copy link
Contributor Author

Thanks @philss! I'm gonna wait for @cigrainger to weigh in too just in case. He was who originally brought up changing the arrange choice.

@cigrainger
Copy link
Member

Hey! Sorry for weighing in late. I don't think sort_by is a problem. It's also 100% fine to break stuff at this point in time, so don't let that bound you in. We should focus on getting it right. Thanks @jcmkk3 I agree totally. And I think the _by suffix makes sense in a dataframe.

Copy link
Member

@cigrainger cigrainger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

@billylanchantin
Copy link
Contributor Author

Thanks all! If no one objects to _by, then I don't either.

Also sorry for the snub, @jcmkk3! I never saw your input. Turns out we commented within 60s of each other 😅 I think group_by is a great parallel to draw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants