Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add replace/3 expressions #719

Merged
merged 2 commits into from
Oct 14, 2023

Conversation

DeemoONeill
Copy link
Contributor

This is a naive initial implementation this might not work with more complex expressions.

The api for Polars replace is a funny one, it takes expressions not strings, this is so they can do things like passing columns as arguments

however if i make the argument type an expression it fails due to a missing decodor in rustler::nif

If i make it an ExExpr then i get an argument error.

Any ideas on this?

links to #718

might not work with expressions
@josevalim
Copy link
Member

Hi @DeemoONeill! You don't have to worry about the other arguments right now because Explorer.Series requires both arguments to be strings. If we want to support series as arguments, then we would change lazy_series.ex to cast all of replace/3 arguments to expressions, and then you could match on ExExpr as it would be guaranteed you get all expressions.

You are welcome to try that in another PR if you want, but this one looks great to me. ❤️

@josevalim josevalim merged commit a0631ce into elixir-explorer:main Oct 14, 2023
4 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@DeemoONeill
Copy link
Contributor Author

@josevalim do the expressions expect the same arguments as their series counterparts? for example if i changed the series Replace types, would that affect the expressions too?

I've not quite worked out how we know what types the expressions recieve

@josevalim
Copy link
Member

Series are represented like this:

%Series{dtype: ..., data: %backend{}}

By default, the backend is a Polars series, but inside mutate, the backend is Explorer.Backend.LazySeries.

In other words, when you call Explorer.Series.replace/3 inside mutate, you always know if the arguments are a string or a series and, if a series, their dtype. We then proxy the call to the backend.

So it goes like this:

  1. You call Explorer.Series.replace(series, "foo", "bar")

  2. Which calls series.data.__struct__.replace(series, "foo", "bar")

  3. Since data.__struct__ is Explorer.Backend.LazySeries, you call Explorer.Backend.LazySeries.replace(series, "foo", "bar"), which then makes sure all of the arguments are converted to expressions/literals before you call Rust

Now, more than one of the replace arguments can be a series, but that introduces some complexities. For example, should their dtypes match? What happens if they come from different backends? That's the most annoying part, you can look at how other functions, such as arithmetic operators and select deal with this. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants