Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change :float to {:f, 64} and add {:f, 32} dtype #739

Merged
merged 9 commits into from
Nov 24, 2023
Merged

Conversation

philss
Copy link
Member

@philss philss commented Nov 23, 2023

The idea is to support more dtypes in the way they are represented in Arrow/Polars.

In the near future, we won't need to "normalise" numeric columns anymore, and by consequence, Explorer is going to use less memory if needed.

This is related to #471

@philss
Copy link
Member Author

philss commented Nov 23, 2023

I feel that we may want to have a "shortcut" version for declaring the dtypes, like in Nx.
We could accept either :float for {:f, 64} or :f32 and :f64.

Also, the inspection of the series and dataframes could be different, and follow Nx as well - just f32 or f64 -, or slightly larger, like float64.

WDYT @josevalim @cigrainger @billylanchantin ?

@josevalim
Copy link
Member

I think we should definitely still support :float for backwards compatibility. We can add f32/f64 later, it should be a quick addition. Also, please add tests that to_iovec with f32 works. Other than that, this is great, ship it!

@josevalim
Copy link
Member

I think we should definitely still support :float for backwards compatibility.

Suggestion: rename valid_dtype? to normalize_dtype (which returns nil if no dtype) and rename validate_dtype to normalize_dtype, which returns the normalize dtype. Now change all call sites of normalize_dtype and normalize_dtype! to consider its return value (which is the normalized dtype). Then it should be easy to add :float, but I think we can do that in the next PR. So let's just add the test for to_iovec with f32 to this one and the normalize stuff can come in a separate one!

@cigrainger
Copy link
Member

Yep I agree -- let's keep the 'simple' names like :float and give them defaults as they are today. Otherwise this is fantastic. Re: inspection, I prefer to follow Nx as closely as possible.

The idea is to support more dtypes in the way they are represented
in Arrow/Polars.

In the near future, we won't need to "normalise" numeric columns anymore,
and by consequence, Explorer is going to use less memory if needed.

This is related to #471
This is going to make easier to adopt "aliases" for dtypes.
Use the `"f64"` string instead of `float[64]`.
@philss philss marked this pull request as ready for review November 24, 2023 05:44
@philss
Copy link
Member Author

philss commented Nov 24, 2023

@josevalim @cigrainger I made changes applying your suggestions. Please take a look.

The only thing I didn't change was to add the aliases :f32 and :f64 for the float dtypes. I'm planning to do that in another PR tomorrow, with docs as well.

Copy link
Member

@cigrainger cigrainger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped one Q but otherwise this looks fantastic. I'm excited to support more explicit datatypes, esp w/ zero-copy conversion to tensors.

README.md Outdated
@@ -13,7 +13,7 @@ data exploration to Elixir.

Explorer high-level features are:

- Simply typed series: `:binary`, `:boolean`, `:category`, `:date`, `:datetime`, `:duration`, `:float`, `:integer`, `:string`, and `:time`.
- Simply typed series: `:binary`, `:boolean`, `:category`, `:date`, `:datetime`, `:duration`, `{:f, 32}`, `{:f, 64}`, `:integer`, `:string`, and `:time`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn't do something like:

Suggested change
- Simply typed series: `:binary`, `:boolean`, `:category`, `:date`, `:datetime`, `:duration`, `{:f, 32}`, `{:f, 64}`, `:integer`, `:string`, and `:time`.
- Simply typed series: `:binary`, `:boolean`, `:category`, `:date`, `:datetime`, `:duration`, `:float` (`{:f, 32}` or `{:f, 64}`), `:integer`, `:string`, and `:time`.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! I added more mentions of it in other places as well: 3605cfc

@philss philss merged commit c1a7203 into main Nov 24, 2023
4 checks passed
@philss philss deleted the ps-expand-float-dtypes branch November 24, 2023 16:29
philss added a commit that referenced this pull request Nov 27, 2023
It adds `:f32` and `:f64` as shortcuts/aliases for the float dtypes.

This is related to #739
philss added a commit that referenced this pull request Nov 28, 2023
* Add shortcut atoms for float dtypes

It adds `:f32` and `:f64` as shortcuts/aliases for the float dtypes.

This is related to #739

* Update series.ex

---------

Co-authored-by: José Valim <jose.valim@dashbit.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants