Convert BuiltInWindowFunction::Ntile to a user defined window function #12694

Closed
Tracked by #8709
jcsherin opened this issue Oct 1, 2024 · 4 comments · Fixed by #13040
Labels
enhancement New feature or request

Comments

@jcsherin
Contributor

jcsherin commented Oct 1, 2024

Is your feature request related to a problem or challenge?

Part of #8709

There is now no difference between "built in" / "prepackaged" scalar and aggregate functions in DataFusion. However, there are still some "built in" window functions; see the current source for BuiltInWindowFunction for the up-to-date list of what remains.

The problem with having two different kinds of window functions is:

  1. Some features that the built-in window functions rely on may not be available to User Defined Window Functions
  2. Users cannot easily choose which window functions to include, or override their behavior if they need something different

Describe the solution you'd like

I would like to remove the "built in" version of this function (Ntile) and convert it to a user defined window function.

Describe alternatives you've considered

At a high level the process is:

  1. Add a new WindowUDFImpl in the functions-window crate
  2. Port the code from the relevant existing implementation of the built in function in datafusion/physical-expr/src/window
  3. Remove the BuiltInWindowFunction variant and then get everything to compile (the compiler will show you where the existing implementations are)
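
For anyone picking this up, the behavior to preserve during the port is the bucket assignment itself. Below is a minimal, std-only sketch of that arithmetic. It is not the DataFusion code: the name `ntile_buckets` is hypothetical, and the formula is an assumption based on the usual SQL NTILE semantics (rows split as evenly as possible into `n` buckets, numbered from 1). Please check it against the existing implementation and its tests before relying on it.

```rust
/// Hypothetical helper (not part of DataFusion): returns the 1-based
/// bucket number for each of `num_rows` ordered rows in a partition,
/// when the partition is split into `n` buckets.
///
/// Assumes `n >= 1`; rejecting `n == 0` is left to the caller.
fn ntile_buckets(num_rows: u64, n: u64) -> Vec<u64> {
    // With fewer rows than buckets, each row gets its own bucket.
    let n = n.min(num_rows);
    (0..num_rows)
        // Integer division maps row index i to a bucket in 0..n;
        // adding 1 makes the bucket numbers start at 1.
        .map(|i| i * n / num_rows + 1)
        .collect()
}
```

For example, `ntile_buckets(5, 2)` puts the first three rows in bucket 1 and the last two in bucket 2. An empty partition yields an empty vector, because the loop body never runs and no division takes place.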

Additional context

Here are some good examples:

@jcsherin jcsherin added the enhancement New feature or request label Oct 1, 2024
@jcsherin
Contributor Author

jcsherin commented Oct 1, 2024

This is a good first issue.

@hailelagi
Contributor

take

@jonathanc-n
Contributor

@hailelagi Are you working on this? I can take it up if not

@hailelagi
Contributor

hailelagi commented Oct 12, 2024

@jonathanc-n please feel free to work on this in parallel; if you can/want to complete it urgently, that works just as well. See also: https://datafusion.apache.org/contributor-guide/index.html#open-contribution-and-assigning-tickets

If someone is already working on an issue that you want or need but hasn’t been able to finish it yet, you should feel free to work on it as well.

With that said, I believe there are a few other built-ins in need of conversion to a UDF that don't have an issue yet, but I could be wrong. cc: @jcsherin

@hailelagi hailelagi removed their assignment Oct 23, 2024
Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this issue Oct 28, 2024
Michael-J-Ward added a commit to apache/datafusion-python that referenced this issue Nov 10, 2024
* patch datafusion deps

* migrate from deprecated RuntimeEnv::new to RuntimeEnv::try_new

Ref: apache/datafusion#12566

* remove Arc from create_udf call

Ref: apache/datafusion#12489

* doc typo

* migrate to new UnnestOptions API

Ref: https://github.com/apache/datafusion/pull/12836/files

* update API for logical expr Limit

Ref: apache/datafusion#12836

* remove logical expr CrossJoin

It was removed upstream.

Ref: apache/datafusion#13076

* update PyWindowUDF

Ref: apache/datafusion#12803

* migrate window functions lead and lag to udwf

Ref: apache/datafusion#12802

* migrate window functions rank, dense_rank, and percent_rank to udwf

Ref: apache/datafusion#12648

* convert window function cume_dist to udwf

Ref: apache/datafusion#12695

* convert window function ntile to udwf

Ref: apache/datafusion#12694

* clean up functions_window invocation

* Only one column was being passed to udwf

* Update to DF 43.0.0

* Update tests to look for string_view type

* String view is now the default type for strings

* Making a variety of adjustments in wrappers and unit tests to account for the switch from string to string_view as default

* Resolve errors in doc building

---------

Co-authored-by: Tim Saucer <timsaucer@gmail.com>