Skip to content

Conversation

Blizzara
Copy link
Contributor

@Blizzara Blizzara commented Mar 6, 2025

Which issue does this PR close?

See #14123 (comment)

Also (main) part of #14652 and closes #13515

Rationale for this change

People should use invoke_with_args instead. Calling invoke_batch for any DF-builtin function will fail at runtime with something like Function X does not implement invoke but called

We should also consider just removing the old functions (invoke_batch, invoke, invoke_no_args), to clear up the trait.

What changes are included in this PR?

Mark invoke_batch as deprecated. It was already called so in the docstring, but the compiler doesn't read those.

Are these changes tested?

Existing tests

Are there any user-facing changes?

New deprecation

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Mar 6, 2025
@Blizzara Blizzara mentioned this pull request Mar 6, 2025
26 tasks
@github-actions github-actions bot added the functions Changes to functions implementation label Mar 6, 2025
number_rows: usize,
) -> Result<ColumnarValue> {
#[allow(deprecated)]
self.inner.invoke_batch(args, number_rows)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this invoke_batch with invoke_with_args? So that the allow(deprecated) can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, since for invoke_with_args we'd need the return type which isn't provided here. Also I don't see that as very important change anyways, this is also invoke_batch and deprecated so passing directly through makes sense, and both these invoke_batchs can be removed at the same time.

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice clean up! I think this pr could be merged after fix clippy issue, thanks @Blizzara 👍

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Mar 7, 2025
// All benches are single batch run with 8192 rows
let character_length = datafusion_functions::unicode::character_length();

let return_type = DataType::Utf8;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look up what the correct return type should be, but there is nothing validating that I got it correct nor that it won't get broken in the future, e.g. if some of these functions start returning stringview instead. But given these are benches, I guess that's fine? If the type is used and wrong, presumably it'll complain somewhere, if it's not used and it's wrong it's not the end of the world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this change is fine. Thank you

@Blizzara
Copy link
Contributor Author

Blizzara commented Mar 7, 2025

Nice clean up! I think this pr could be merged after fix clippy issue, thanks @Blizzara 👍

Thanks, clippy pointed out a whole bunch of invoke_batch usage, so I ended up cleaning them all 😅

@alamb alamb added the api change Changes the API exposed to users of the crate label Mar 7, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @Blizzara 🙏

We should also consider just removing the old functions (invoke_batch, invoke, invoke_no_args), to clear up the trait.

Yeah, I have been thinking that too -- maybe just remove them and even though it would be a breaking API change it would make things much less confusing 🤔

// All benches are single batch run with 8192 rows
let character_length = datafusion_functions::unicode::character_length();

let return_type = DataType::Utf8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this change is fine. Thank you

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I agree with @alamb; it would be better to remove invole_batch because I was confused when I first reviewed this PR.

@Blizzara
Copy link
Contributor Author

Blizzara commented Mar 8, 2025

Sure, I think let’s merge this in first, then I can followup with the removals

@Weijun-H Weijun-H merged commit f47ea73 into apache:main Mar 8, 2025
24 checks passed
@Weijun-H
Copy link
Member

Weijun-H commented Mar 8, 2025

Thanks @Blizzara and @alamb 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate core Core DataFusion crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deprecate ScalarUDFImpl::invoke_batch and move everything to ScalarUDFImpl::invoke_with_args

4 participants