Allocate a zeroed buffer for FixedSizeBinaryArray::null#8901
Allocate a zeroed buffer for FixedSizeBinaryArray::null#8901alamb merged 1 commit intoapache:mainfrom
Conversation
|
|
||
| let null_array = FixedSizeBinaryArray::new_null(4, 3); | ||
| assert_eq!(null_array.len(), 3); | ||
| assert_eq!(null_array.values().len(), 12); |
There was a problem hiding this comment.
This reported 0 before this change.
| Self { | ||
| data_type: DataType::FixedSizeBinary(size), | ||
| value_data: MutableBuffer::new(capacity).into(), | ||
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), |
There was a problem hiding this comment.
MutableBuffer::new_null is in bits.
| Self { | ||
| data_type: DataType::FixedSizeBinary(size), | ||
| value_data: MutableBuffer::new(capacity).into(), | ||
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), |
There was a problem hiding this comment.
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), | |
| // MutableBuffer::new_null is in bits. | |
| value_data: MutableBuffer::new_null(capacity_in_bytes * 8).into(), |
etseidl
left a comment
There was a problem hiding this comment.
I'm not so familiar with the arrow side of the house, but this seems like correct behavior. It matches what a primitive array would do, and the value_data bytes are already allocated. The only downside I can think of is the extra memset, but that would be done for primitive arrays as well IIUC.
|
I also think we need to prioritize correctness over performance (obviously 😆 ) -- there isn't any obvious way to make this faster that I could see 🤔 |
|
🚀 |
I think zeroing is not required for null values based on the arrow specification. But I guess not doing it has some foot guns and UB issues that we want to avoid. Curious if there is a take on that from arrow-rs. 🤔 Thanks for merging that! I'll make a follow-up PR that adds the small comment. |
Allocating anything in rust is going to zero out the bytes anyways. Unless there is a compelling performance benchmark that shows avoiding the (potentially) extra zeroing makes a non trivial difference I think we should opt for safe |
) # Which issue does this PR close? - Follow-up on #8901 # Rationale for this change It's non-obvious why the number "8" appears here. # What changes are included in this PR? Name the number such that it's more obvious that this is a conversion from bytes to bits. @alamb I can also include the [suggested comment](#8901 (comment)) if you prefer it. I thought the constant may have a lesser risk of becoming outdated without being noticed when changes to `MutableBuffer::new_null` happen. # Are these changes tested? - No behavior changes # Are there any user-facing changes? - No
…ary in ScalarValue::to_array_of_size (#18903) ## Which issue does this PR close? - Closes #18870. We should open a follow-up issue that tracks re-use `FixedSizeBinaryArray::new_null` after upgrading arrow-rs. With this PR, we could backport it to #18843 if we wish. Alternatively, we could also wait for a fix by arrow-rs. ## Rationale for this change As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use `FixedSizeBinaryArray::new_null`. ## What changes are included in this PR? - A test. (`assert_eq!(result.as_fixed_size_binary().values().len(), 10);` returned 0 before) - Directly allocate the values buffer in DataFusion. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.
…ary in ScalarValue::to_array_of_size (apache#18903) ## Which issue does this PR close? - Closes apache#18870. We should open a follow-up issue that tracks re-use `FixedSizeBinaryArray::new_null` after upgrading arrow-rs. With this PR, we could backport it to apache#18843 if we wish. Alternatively, we could also wait for a fix by arrow-rs. ## Rationale for this change As we will have to wait for another arrow-rs update to get the fix for the issue apache/arrow-rs#8901 , we have a temporaray fix that basically calls the same operations from DataFusion's code. Once the fix is in the arrow-rs codebase, we can re-use `FixedSizeBinaryArray::new_null`. ## What changes are included in this PR? - A test. (`assert_eq!(result.as_fixed_size_binary().values().len(), 10);` returned 0 before) - Directly allocate the values buffer in DataFusion. ## Are these changes tested? Yes ## Are there any user-facing changes? Yes, the buffer will now be allocated, zeroed, and set to the expected length. Same as in the arrow-rs fix.
Which issue does this PR close?
FixedSizeBinaryArray::new_nullDoes Not Properly Set the Length of the Values Buffer #8900 .Rationale for this change
This causes the values buffer to have the expected length after creating the null array.
What changes are included in this PR?
Use
MutableBuffer::new_nullinstead ofMutableBuffer::newAre these changes tested?
Yes, additional constructor test
Are there any user-facing changes?
Yes, the buffer will now be correctly initialized when calling
FixedSizeBinaryArray::new_null