Skip to content

Conversation

findepi
Copy link
Member

@findepi findepi commented Aug 13, 2025

Follows similar change for WindowUDFImpl and AggregateUDFImpl, i.e.
the 8494a39 and b8bf7c5 commits.

Previously, the ScalarUDFImpl trait contained equals and
hash_value methods with contracts following the Eq and Hash
traits. However, the existence of default implementations of these
methods made it error-prone, with many functions (scalar, aggregate,
window) missing to customize the equals even though they ought to.
There is no fix to this that's not an API breaking change, so a breaking
change is warranted.

Removing the default implementations would be enough of a solution, but
at the cost of a lot of boilerplate needed in implementations.

Instead, this removes the methods from the trait, and reuses DynEq,
DynHash traits used previously only for physical expressions. This
allows for functions to provide their implementations using no more than
#[derive(PartialEq, Eq, Hash)] in a typical case.

@findepi findepi force-pushed the findepi/udf-perfect-eq branch from 6037ccb to bff9ef1 Compare August 13, 2025 08:18
@github-actions github-actions bot added documentation Improvements or additions to documentation sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate proto Related to proto crate functions Changes to functions implementation ffi Changes to the ffi crate spark labels Aug 13, 2025
Follows similar change for `WindowUDFImpl` and `AggregateUDFImpl`, i.e.
the 8494a39 and
b8bf7c5 commits.

Previously, the `ScalarUDFImpl` trait contained `equals` and
`hash_value` methods with contracts following the `Eq` and `Hash`
traits.  However, the existence of default implementations of these
methods made it error-prone, with many functions (scalar, aggregate,
window) missing to customize the equals even though they ought to.
There is no fix to this that's not an API breaking change, so a breaking
change is warranted.

Removing the default implementations would be enough of a solution, but
at the cost of a lot of boilerplate needed in implementations.

Instead, this removes the methods from the trait, and reuses `DynEq`,
`DynHash` traits used previously only for physical expressions. This
allows for functions to provide their implementations using no more than
`#[derive(PartialEq, Eq, Hash)]` in a typical case.
@findepi findepi force-pushed the findepi/udf-perfect-eq branch from bff9ef1 to e7a9934 Compare August 13, 2025 08:39

- search for `\#\[derive\(Debug\)\](\n *(pub )?struct \w+ \{\n *signature\: Signature\,\n *\})`,
- replace with `#[derive(Debug, PartialEq, Eq, Hash)]$1`,
- review all the changes and make sure only function structs were changed.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the only non-mechanical change in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for reference, due to how DF functions are written, i used a different regex

\#\[derive\(Debug\)\](\n *(pub(\((super|crate)\))? )?struct \w+ \{\n( *name: String,\n)? *signature\: Signature\,\n( *aliases: Vec<String>,\n)? *\})
  • still not perfect, eg some functions derive Clone (just because they can) and the pattern doesn't cater for this
  • it's unlikely that downstream projects have fields such as name and aliases in their functions, so I didn't cater for these in the upgrade guide.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loogs good to me -- thanks @findepi and @timsaucer

@findepi findepi merged commit 1961fe7 into apache:main Aug 13, 2025
28 checks passed
@findepi findepi deleted the findepi/udf-perfect-eq branch August 13, 2025 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate documentation Improvements or additions to documentation ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate spark sql SQL Planner

Projects

None yet

3 participants