Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Series.row_index/1 #862

Merged
merged 5 commits into from
Feb 20, 2024
Merged

Series.row_index/1 #862

merged 5 commits into from
Feb 20, 2024

Conversation

iurimateus
Copy link
Contributor

@iurimateus iurimateus commented Feb 18, 2024

Interesting. So what if we:

  1. Add a Series.row_index()
  2. Make it raise for the default backend saying it is only supported in dataframes
  3. Implement it properly for expressions

Would that work? Perhaps we try that as a separate PR so we can compare them side by side?

Originally posted by @josevalim in #833 (comment)

@josevalim is this what you had in mind? I think it's the first /0 function in Series; I'm also not sure on the implementation.

Should we rename len() to size() for consistency?

cc @cigrainger

@josevalim
Copy link
Member

Thank you @iurimateus. Quick question: do we need to use expr_len? couldn't we use size(...) and expect the series as argument for row_index?

@billylanchantin
Copy link
Contributor

@iurimateus I think expr_len is already present, it's just named expr_size:

https://github.com/elixir-explorer/explorer/blob/main/native/explorer/src/expressions.rs#L576-L580

@iurimateus
Copy link
Contributor Author

iurimateus commented Feb 18, 2024

For the record, these changes were before we had size.

@iurimateus I think expr_len is already present, it's just named expr_size:

Thanks for pointing it out, I was following py-polars implementation of using dsl::len(), but that doesn't seem necessary (and not really compatible with Explorer's API, on which it doesn't expose 'expressions').

Thank you @iurimateus. Quick question: do we need to use expr_len? couldn't we use size(...) and expect the series as argument for row_index?

That works, I've pushed a new commit.

Thoughts on int_range?

@josevalim
Copy link
Member

I would prefer to postpone int_range for now. So let's remove it from the Elixir side but we can still keep it on the Rust side.

We also need to list row_index as a callback here:

@callback categories(s) :: s

And then provide an implementation for regular series (which will basically return a list of u32).

lib/explorer/series.ex Outdated Show resolved Hide resolved
@@ -6024,6 +6024,14 @@ defmodule Explorer.Series do
def member?(%Series{dtype: dtype}, _value),
do: dtype_error("member?/2", dtype, [{:list, :_}])

def row_index(%Series{} = series) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add docs here. :)

def row_index(series) do
Explorer.DataFrame.new(series: series)
|> Explorer.DataFrame.mutate_with(&[row_index: Series.row_index(&1[:series])])
|> Explorer.DataFrame.pull(:row_index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should expose this function in Rust instead? https://docs.rs/polars/latest/polars/datatypes/struct.UInt32Type.html#method.new 🤔 would it have any performance difference?

Copy link
Contributor Author

@iurimateus iurimateus Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And perhaps new(lazy: true), but I'm not sure about allocations...)

Thanks for the reference, I've pushed a commit exposing it in Rust.
I'll rebase after review.

@josevalim
Copy link
Member

Awesome job. I have pushed one last comment and I think we are good to go!

@iurimateus iurimateus changed the title DataFrame.with_row_count/2 (impl using int_range/len) Series.row_index/1 (impl using int_range/len) Feb 19, 2024
@iurimateus iurimateus changed the title Series.row_index/1 (impl using int_range/len) Series.row_index/1 Feb 19, 2024
lib/explorer/series.ex Outdated Show resolved Hide resolved
lib/explorer/series.ex Outdated Show resolved Hide resolved
@josevalim josevalim marked this pull request as ready for review February 20, 2024 07:32
@josevalim josevalim merged commit 9e75178 into elixir-explorer:main Feb 20, 2024
4 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants