@kazantsev-maksim
Contributor

Which issue does this PR close?

Closes #18058

Rationale for this change

Adding the bitmap_count function to Comet fails with the following error: org.apache.comet.CometNativeException: Error from DataFusion: bitmap_count expects Binary/BinaryView/FixedSizeBinary/LargeBinary as argument, got Dictionary(Int32, Binary). The function does not accept dictionary-encoded input.
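To make the failure mode concrete, here is a minimal, std-only sketch of the kind of type check that produces such an error and of the extra match arm that accepts `Dictionary(Int32, Binary)`. The `DataType` enum and the `check_before` / `check_after` functions are hypothetical stand-ins written for illustration; they are not DataFusion's or Arrow's actual definitions.

```rust
// Hypothetical, simplified model of the argument type check.
// This is NOT DataFusion's real code; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Binary,
    LargeBinary,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Before the change: only plain binary types are accepted, so a
/// dictionary-encoded column falls through to the error arm.
pub fn check_before(dt: &DataType) -> Result<(), String> {
    match dt {
        DataType::Binary | DataType::LargeBinary => Ok(()),
        other => Err(format!(
            "bitmap_count expects Binary/LargeBinary as argument, got {:?}",
            other
        )),
    }
}

/// After the change: Dictionary(Int32, Binary) gets its own guarded arm.
pub fn check_after(dt: &DataType) -> Result<(), String> {
    match dt {
        DataType::Binary | DataType::LargeBinary => Ok(()),
        DataType::Dictionary(k, v)
            if **k == DataType::Int32 && **v == DataType::Binary =>
        {
            Ok(())
        }
        other => Err(format!(
            "bitmap_count expects Binary/LargeBinary as argument, got {:?}",
            other
        )),
    }
}
```

In this model, any other key or value combination still reaches the error arm, which is the limitation raised in the review below.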

Are these changes tested?

Added a new unit test.

@github-actions github-actions bot added the spark label Oct 24, 2025
Contributor

@Jefffrey Jefffrey left a comment

Do we need to consider other key and value types? This only handles Dictionary(Int32, Binary).
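One way to picture what supporting other key types could look like is a single function that is generic over the key's integer type. The sketch below is std-only and models a dictionary column as a slice of optional keys plus a slice of optional values; `count_ones` and `dict_bitmap_count` are hypothetical names, and treating out-of-range or negative keys as null is an assumption of this sketch, not behaviour taken from the PR.

```rust
use std::convert::TryFrom;

/// Count the set bits in an optional byte slice; a null input stays null.
/// (Assumed behaviour, modelled on how `binary_count_ones` is called in the PR.)
pub fn count_ones(bytes: Option<&[u8]>) -> Option<i64> {
    bytes.map(|b| b.iter().map(|byte| i64::from(byte.count_ones())).sum::<i64>())
}

/// Bit count over a dictionary-encoded column, generic over the key type.
/// `keys[i]` indexes into `values`; a null key or a null value yields null.
pub fn dict_bitmap_count<K>(
    keys: &[Option<K>],
    values: &[Option<Vec<u8>>],
) -> Vec<Option<i64>>
where
    K: Copy,
    usize: TryFrom<K>,
{
    keys.iter()
        .map(|key| {
            // Null key -> null result.
            let k = (*key)?;
            // Keys that do not fit in usize (e.g. negative) are treated as null here.
            let idx = usize::try_from(k).ok()?;
            // Out-of-range index or null dictionary value -> null result.
            count_ones(values.get(idx)?.as_deref())
        })
        .collect()
}
```

The same function then serves `i8`, `i32`, `u8`, and other integer keys without a separate match arm per key type; covering other value types (for example large binary) would need a similar abstraction on the value side.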

Comment on lines 104 to 119
Dictionary(k, v) if k.as_ref() == &DataType::Int32 && v.as_ref() == &Binary => {
    let dict_array = as_dictionary_array::<Int32Type>(input_array);
    let binary_array = as_binary_array(dict_array.values())?;

    let result: Int64Array = dict_array
        .keys()
        .iter()
        .map(|key| {
            key.and_then(|k| {
                binary_count_ones(Some(binary_array.value(k as usize)))
            })
        })
        .collect();

    Ok(result)
}
Suggested change
- Dictionary(k, v) if k.as_ref() == &DataType::Int32 && v.as_ref() == &Binary => {
-     let dict_array = as_dictionary_array::<Int32Type>(input_array);
-     let binary_array = as_binary_array(dict_array.values())?;
-     let result: Int64Array = dict_array
-         .keys()
-         .iter()
-         .map(|key| {
-             key.and_then(|k| {
-                 binary_count_ones(Some(binary_array.value(k as usize)))
-             })
-         })
-         .collect();
-     Ok(result)
- }
+ Dictionary(k, v) if k.as_ref() == &DataType::Int32 && v.as_ref() == &Binary => {
+     let dict_array = as_dictionary_array::<Int32Type>(input_array);
+     let array = dict_array.downcast_dict::<BinaryArray>().unwrap();
+     Ok(array
+         .into_iter()
+         .map(binary_count_ones)
+         .collect::<Int64Array>())
+ }

Using TypedDictionaryArray
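The idea behind the suggestion is to iterate over the logical values of the dictionary column directly, instead of looking up each key by hand. A rough, std-only sketch of that pattern follows. `DictBinary` and `typed_iter` are hypothetical stand-ins for the Arrow types, the body of `binary_count_ones` is assumed from its call sites, and keys are assumed to be valid non-negative indices.

```rust
/// Hypothetical stand-in for a dictionary-encoded binary column:
/// each key is either null or an index into `values`.
pub struct DictBinary {
    pub keys: Vec<Option<i32>>,
    pub values: Vec<Vec<u8>>,
}

impl DictBinary {
    /// Iterate over the decoded (logical) values, in the spirit of a typed
    /// dictionary view. Assumes every non-null key is a valid index.
    pub fn typed_iter(&self) -> impl Iterator<Item = Option<&[u8]>> + '_ {
        self.keys
            .iter()
            .map(move |key| key.map(|k| self.values[k as usize].as_slice()))
    }
}

/// Count the set bits of an optional byte slice; null in, null out.
/// (Assumed behaviour of the helper used in the PR.)
pub fn binary_count_ones(bytes: Option<&[u8]>) -> Option<i64> {
    bytes.map(|b| b.iter().map(|byte| i64::from(byte.count_ones())).sum::<i64>())
}

/// With the typed iterator, the dictionary case reduces to a plain map/collect.
pub fn bitmap_count(dict: &DictBinary) -> Vec<Option<i64>> {
    dict.typed_iter().map(binary_count_ones).collect()
}
```

The key lookup lives in one place (the iterator), so the dictionary arm has the same shape as the arms for plain binary arrays.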

Contributor Author

Thanks, fixed.

}

#[test]
fn test_dictionary_encoded_bitmap_count_invoke() -> Result<()> {
Could we add this as a sqllogictest (SLT) instead? We can use arrow_cast(value, 'Dictionary(...)') to ensure the values are dictionary-encoded.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Oct 25, 2025
@Jefffrey Jefffrey added this pull request to the merge queue Oct 27, 2025
Merged via the queue into apache:main with commit f870dcd Oct 27, 2025
28 checks passed
@Jefffrey
Contributor

Thanks @kazantsev-maksim

tobixdev pushed a commit to tobixdev/datafusion that referenced this pull request Nov 2, 2025
apache#18273)


Co-authored-by: Kazantsev Maksim <mn.kazantsev@gmail.com>
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
apache#18273)


Co-authored-by: Kazantsev Maksim <mn.kazantsev@gmail.com>
Labels

spark sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

SparkBitmapCount does not support dictionary encoded input

2 participants