count_all() aggregations cannot be aliased #16795

@BlakeOrth

Describe the bug

Attempting to apply an alias to a count_all() aggregation results in an Internal Error with the following message:

called `Result::unwrap()` on an `Err` value: Internal("Invalid aggregate expression 'Alias(Alias { expr: AggregateFunction(AggregateFunction { func: AggregateUDF { inner: Count { name: \"count\", signature: Signature { type_signature: OneOf([VariadicAny, Nullary]), volatility: Immutable } } }, params: AggregateFunctionParams { args: [Literal(Int64(1), None)], distinct: false, filter: None, order_by: None, null_treatment: None } }), relation: None, name: \"count(*)\", metadata: None })'")

Presumably this is because count_all() already aliases its result as count(*), so the user-supplied alias ends up wrapping an existing alias. A current workaround appears to be using count(lit(1)) in place of count_all().
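The error text is consistent with that guess: the expression being rejected is still an Alias named count(*) wrapping the AggregateFunction, which suggests that only the outer, user-supplied alias was peeled off before the aggregate check. The sketch below is a toy model of that situation, not DataFusion's code. The `Expr` enum, `count_all_model`, `count_lit_model`, and `plan_aggregate` are all hypothetical names invented for illustration, and the single-layer alias stripping is an assumption about the planner's behavior.

```rust
// Toy model only: these types and functions are hypothetical, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// An aggregate call such as count(1).
    Aggregate { func: String },
    /// A named wrapper around another expression.
    Alias { expr: Box<Expr>, name: String },
}

impl Expr {
    /// Wraps `self` in a new alias layer.
    fn alias(self, name: &str) -> Expr {
        Expr::Alias { expr: Box::new(self), name: name.to_string() }
    }
}

/// Models a count_all()-style helper that returns an already-aliased aggregate.
fn count_all_model() -> Expr {
    Expr::Aggregate { func: "count".to_string() }.alias("count(*)")
}

/// Models a bare count(lit(1))-style call, which carries no built-in alias.
fn count_lit_model() -> Expr {
    Expr::Aggregate { func: "count".to_string() }
}

/// Assumed planner step: strip at most ONE alias layer, then require an aggregate.
fn plan_aggregate(expr: &Expr) -> Result<String, String> {
    let (name, inner) = match expr {
        Expr::Alias { expr: wrapped, name } => (Some(name.clone()), &**wrapped),
        other => (None, other),
    };
    match inner {
        // The output column takes the alias if present, else the function name.
        Expr::Aggregate { func } => Ok(name.unwrap_or_else(|| func.clone())),
        other => Err(format!("Invalid aggregate expression '{other:?}'")),
    }
}

fn main() {
    // The helper's own alias is the single layer, so planning succeeds.
    assert_eq!(plan_aggregate(&count_all_model()), Ok("count(*)".to_string()));

    // A second alias leaves an inner Alias behind, which the check rejects.
    let err = plan_aggregate(&count_all_model().alias("TOTAL_COUNT")).unwrap_err();
    assert!(err.starts_with("Invalid aggregate expression 'Alias"));

    // A bare aggregate with one user alias has only one layer to strip.
    assert_eq!(
        plan_aggregate(&count_lit_model().alias("TOTAL_COUNT")),
        Ok("TOTAL_COUNT".to_string())
    );
}
```

If the real planner behaves like this model, it would also explain why count(lit(1)).alias(...) avoids the problem: there is only one alias layer to remove. A fix along these lines could either strip nested aliases while keeping the outermost name, or have count_all() avoid pre-aliasing; which approach suits DataFusion is left to the maintainers.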

To Reproduce

Reproduced with DataFusion 48.0.1; originally found in DataFusion 47.0.0.

#[cfg(test)]
mod tests {
    use arrow::{
        array::StringArray,
        datatypes::{DataType, Field, Schema},
        record_batch::RecordBatch,
    };
    use datafusion::{functions_aggregate::count::count_all, prelude::SessionContext};
    use std::sync::Arc;

    #[tokio::test]
    async fn alias_count_all() {
        // Create a simple RecordBatch with a single row
        let schema = Schema::new(vec![Field::new("id", DataType::Utf8, false)]);
        let id_array = StringArray::from(vec!["test_id_1"]);
        let record_batch =
            RecordBatch::try_new(Arc::new(schema), vec![Arc::new(id_array)]).unwrap();

        // Create DataFusion context and DataFrame from the record batch
        let ctx = SessionContext::new();
        let df = ctx.read_batch(record_batch).unwrap();

        // Add count_all() aggregation and alias it as "TOTAL_COUNT"
        let df_with_count = df
            .aggregate(vec![], vec![count_all().alias("TOTAL_COUNT")])
            .unwrap();

        // Verify the column exists and is aliased as expected
        let results = df_with_count.collect().await.unwrap();
        assert_eq!(results.len(), 1);
        let batch = &results[0];
        let schema = batch.schema();
        let field = schema.field(0);
        assert_eq!(
            field.name(),
            "TOTAL_COUNT",
            "Column should be aliased to TOTAL_COUNT"
        );
    }
}

Expected behavior

Aliasing a count_all() aggregation should produce a column named with the user-supplied alias, not a runtime error.

Additional context

No response
