Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Series.length/1 and Series.member?/2 #746

Merged
merged 3 commits into from
Nov 28, 2023

Conversation

costaraphael
Copy link
Contributor

It's me again! 💜

This PR is adding Series.length/1 (to calculate the length of list elements) and Series.member?/2 (to check membership in list elements). There are a couple of things to consider:

  • Function names. I'm not sure that's the pattern you want to follow (it seems like it is with join/2), but I also thought of something like list_lenghts/1 and list_member?/2. Let me know if you have any thoughts.
  • join/2 was not being implemented for LazySeries, so I added a test for it in DataFrame.mutate/2 and implemented it as well.
  • I moved the conversion of struct to expression literals closer to the structs (by directly implementing the Literal trait. This felt cleaner/easier to re-use, but I'm more than happy to undo it if you prefer the original way.
  • For the eager implementation of member?/2, I followed the Python approach (wrap the series into a frame and re-use the lazy implementation). Let me know if you have any thoughts on this.
  • To make member?/2 work with all our allowed Elixir types, I introduced an ExValidValue enum in the Rust code, which essentially dynamically decodes the Erlang terms into a single type, allowing us to have a single function definition in the Rust code. I think this could be re-used for functions like fill_missing/2 that currently use a double-dispatch approach.
  • member?/2 is not working properly for :datetime. It doesn't error out, but it returns false when it shouldn't (I've left a skipped test with a comment to demonstrate this). My gut feeling is that it is a precision issue, but I ran out of time to continue investigating tonight. Please let me know if you have any ideas 😅

At the same time that it feels like this PR is doing too much, the changes do seem connected, so I've decided to keep it as is instead of splitting it preemptively. Please let me know if you want me to split it up.

@josevalim
Copy link
Member

It all looks good to me, awesome job. We just need to tackle the issue with member? and datetimes.

Also, we should probably make it so left in right supports literal in series[list], but it can be made after this is merged. :)

@costaraphael
Copy link
Contributor Author

@josevalim I figured what was the problem with the datetimes not working! It was indeed a precision issue.

By default, datetimes are mapped with precision :microsecond, while the literal conversion provided out of the box for chrono::naive::NaiveDateTime may use nanoseconds or microseconds, depending on the year of the datetime. So in that specific test case, it was comparing microseconds with nanoseconds.

I figured out a way to solve this by introducing a lit_with_matching_precision function to the ExValidValue struct, which checks the target data type and casts precisions accordingly if necessary.

This worked wonders, I can even mix-and-match duration precisions, and as long as the absolute values are the same it just works (I've even updated the duration test to showcase that).

The only caveat is that for this to work nicely for both eager and lazy cases, I started passing the data type information from Elixir to Rust (not really necessary in the eager case, but I thought it would be worth it to keep things consistent).

@cigrainger
Copy link
Member

cigrainger commented Nov 28, 2023

This is awesome! Yeah... re: naming I do think there's a lot of potential to get confused with Series.length/1 and Series.member?/2. I think list_{fun} makes sense here but I'm open to being convinced otherwise. I would definitely get confused between Series.length/1 and Series.size/1. And to a lesser extent Series.member?/2 w/ Series.in/2.

@josevalim
Copy link
Member

josevalim commented Nov 28, 2023

@cigrainger what if we call it lengths, partially borrowing from Polars but still keeping Elixir's convention of calling it "length"? I am fine with member? as is.

@cigrainger
Copy link
Member

Oh I like lengths. Sold. I'm fine with member? as well.

@costaraphael
Copy link
Contributor Author

Aye, pushing a commit with the rename now! 😄

@josevalim josevalim merged commit acd6756 into elixir-explorer:main Nov 28, 2023
4 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants