Allocate a buffer of the correct length for ScalarValue::FixedSizeBinary in ScalarValue::to_array_of_size#18903
Conversation
…ary in ScalarValue::to_array_of_size
|
Hmm seems this change causes problems with /// Create a new [`FixedSizeBinaryArray`] from the provided parts, returning an error on failure
///
/// # Errors
///
/// * `size < 0`
/// * `values.len() / size != nulls.len()`
pub fn try_new(
size: i32,
values: Buffer,
nulls: Option<NullBuffer>,
) -> Result<Self, ArrowError> {
let data_type = DataType::FixedSizeBinary(size);
let s = size.to_usize().ok_or_else(|| {
ArrowError::InvalidArgumentError(format!("Size cannot be negative, got {size}"))
})?;
let len = values.len() / s; <--- crashI don't see any way to create a new We maybe could create a Any preferences on how we should proceed? As I am unsure whether we even want to fix this issue on the DataFusion side (or wait for an arrow-rs update), I'll wait another opinion. |
This does seem like a bug; for reference, pyarrow allows it: >>> import pyarrow as pa
>>> pa.binary(0)
FixedSizeBinaryType(fixed_size_binary[0])
>>> pa.array([], type=pa.binary(0))
<pyarrow.lib.FixedSizeBinaryArray object at 0x102030b80>
[]
>>> pa.array([], type=pa.binary(0)).type
FixedSizeBinaryType(fixed_size_binary[0])It would be nice to upstream a proper fix to arrow-rs and then can wait for it here; otherwise if we need a fix for DataFusion now we can do a workaround as you describe:
|
|
@Jefffrey Thanks for checking against pyarrow! Here is a PR that fixes the issue in arrow-rs. I also found out that we can create zero-sized item arrays using the builder API. I'll give that a try. |
…SizeBinaryArray in `ScalarValue::to_array_of_size`
|
Should be good to merge once the conflict is fixed fyi @tobixdev |
|
Thanks for letting me know, I didn't see it. I'll fix it today or tomorrow. 👍 |
|
@Jefffrey Should be good to go. 👍 |
|
Thanks @tobixdev |
…ary in ScalarValue::to_array_of_size (apache#18903) ## Which issue does this PR close? - Closes apache#18870. We should open a follow-up issue that tracks re-use `FixedSizeBinaryArray::new_null` after upgrading arrow-rs. With this PR, we could backport it to apache#18843 if we wish. Alternatively, we could also wait for a fix by arrow-rs. ## Rationale for this change As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use `FixedSizeBinaryArray::new_null`. ## What changes are included in this PR? - A test. (`assert_eq!(result.as_fixed_size_binary().values().len(), 10);` returned 0 before) - Directly allocate the values buffer in DataFusion. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.
…DF 51Branch (#19017) ## Which issue does this PR close? Backports #18903 to DF 51 branch ## Rationale for this change Fix the bug in a future 51.1.0 release (#18843) ## What changes are included in this PR? Same as in #18903 ## Are these changes tested? Yes. ## Are there any user-facing changes? Yes, fix the original bug and create a non-zero sized values buffer in the case that exhibits the bug.
|
@Jefffrey Thanks for the review! Should we track an issue for re-using |
Raising an issue to track this refactor sounds reasonable |
…rs Upgrade (#19801) ## Which issue does this PR close? - Closes #19085 ## Rationale for this change Use idiomatic way of creating a fixed size binary null array. ## What changes are included in this PR? Basically reverting the workaround from #18903 as the issue has been fixed in arrow-rs. ## Are these changes tested? Yes, test introduced in #18903 . ## Are there any user-facing changes? No
Which issue does this PR close?
ScalarValue::to_array_of_sizeDoes Not Correctly Allocate a Values Buffer forScalarValue::FixedSizeBinary#18870.We should open a follow-up issue that tracks re-use
FixedSizeBinaryArray::new_nullafter upgrading arrow-rs.With this PR, we could backport it to #18843 if we wish.
Alternatively, we could also wait for a fix by arrow-rs.
Rationale for this change
As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use
FixedSizeBinaryArray::new_null.What changes are included in this PR?
assert_eq!(result.as_fixed_size_binary().values().len(), 10);returned 0 before)Are these changes tested?
Yes
Are there any user-facing changes?
Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.