Skip to content

Conversation

gabotechs
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Fixing empty count() aggregation functions provided through Substrait

What changes are included in this PR?

Adding back this check https://github.com/apache/datafusion/pull/14824/files#diff-474e53672159d74dae38992a914a74eba81b8350ebe161f11d755f06414ed7b4 and testing it

Are these changes tested?

yes, by a new test and by extending previously existing tests

Are there any user-facing changes?

yes, users will be able to feed empty count() functions through the Substrait interface


let args = from_substrait_func_args(consumer, &f.arguments, input_schema).await?;

// deal with situation that count(*) got no arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just explain in the comment why we need count() to have arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty much just rolled back whatever was removed in #14824, I would like some input first from @jayzhan211 about why was this removed and if it might be breaking something else that the tests are not catching

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove it because I don't why we need to convert count() to count(1), is there any reason we need count(1) in subtrait?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is reason, it would be nice to document the reason besides the conversion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is that DataFusion does not support aggregate functions with no arguments:

pub fn build(self) -> Result<AggregateFunctionExpr> {
let Self {
fun,
args,
alias,
human_display,
schema,
ordering_req,
ignore_nulls,
is_distinct,
is_reversed,
} = self;
if args.is_empty() {
return internal_err!("args should not be empty");
}

I imagine that this is the reason why it was there before.

If you do a similar operation and analyze DataFusion's logical plan (link), a similar Int64(1) argument is injected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment explaining that

@gabotechs gabotechs force-pushed the fix-empty-count-substrait branch from cc45d06 to 18602af Compare March 21, 2025 10:11
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


let args = from_substrait_func_args(consumer, &f.arguments, input_schema).await?;

// Datafusion does not support aggregate functions with no arguments, so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jayzhan211 jayzhan211 merged commit 1c38aff into apache:main Mar 22, 2025
27 checks passed
@jayzhan211
Copy link
Contributor

Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Empty aggregation functions coming from substrait

4 participants