Skip to content

Conversation

rtadepalli
Copy link
Contributor

What's Changed

Move splitAndTransferValidityBuffer up to BaseValueVector.

This PR is not touching the implementation of this function in StructVector -- that is not being derived from BaseValueVector so some amount of duplication is probably fine.

Closes #79

This comment has been minimized.

* @param target target vector
*/
protected void splitAndTransferValidityBuffer(
int startIndex, int length, BaseValueVector target) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting BaseValueVector and not ValueVector here to keep the API (int, int, vector). If I accept ValueVector, then there there needs to be some lambda wrangling to pass in the vector-specific allocateValidityBuffer somehow, or that function needs to be added to ValueVector, both of which don't seem that great. From what I can tell everything except StructVector seems to be descend from BaseValueVector so chose this as part of the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked that UnionVector doesn't extend it either (which is good - as it doesn't have a validity buffer)

* @param length number of elements to be copied
* @param target target vector
*/
protected void sliceAndTransferValidityBuffer(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is the one being used by the vectors inside the complex/ directory. BaseValueVector#validityBuffer is protected, so overridden implementations there can't call target.validityBuffer to set a new value. Vectors in the top level org/apache/arrow/vector directory can add their overridden implementations. If at some point some other vector in complex/ wants to have a different version of this function, then I am afraid that vector will need to override splitAndTransferValidityBuffer entirely. Presumably don't want to expose a public setValidityBuffer on BaseValueVector.

@lidavidm lidavidm added the enhancement PRs that add or improve features. label Jun 2, 2025
@github-actions github-actions bot added this to the 18.4.0 milestone Jun 2, 2025
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

* @param target target vector
*/
protected void splitAndTransferValidityBuffer(
int startIndex, int length, BaseValueVector target) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked that UnionVector doesn't extend it either (which is good - as it doesn't have a validity buffer)

*
* @param byteSizeTarget desired size of the buffer
*/
protected void allocateValidityBuffer(final long byteSizeTarget) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an abstract class - mark this abstract?

Comment on lines +867 to +868
final ArrowBuf slicedValidityBuffer = validityBuffer.slice(firstByteSource, byteSizeTarget);
target.validityBuffer = transferBuffer(slicedValidityBuffer, target.allocator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this only slightly differs from the 'base' implementation. However I think this one is more correct...I didn't dig too deep but it's possible the 'simple' one that just retains the buffer was a later/incorrect reimplementation or because they were all copy-pasted at some point only some types were adjusted to use transferBuffer

*/
private void allocateValidityBuffer(final int validityBufferSize) {
@Override
protected void allocateValidityBuffer(final long validityBufferSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given many of these are very similar maybe they should delegate to a base implementation with super.allocateValidityBuffer then call any vector-specific logic at the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, there is a slight behavior change though -- the validity buffer in BaseFixedWidthVector was not zeroing out its elements post allocation, it is now. Should be fine though I am thinking.

@rtadepalli rtadepalli requested a review from lidavidm June 3, 2025 00:11
@rtadepalli
Copy link
Contributor Author

Resolved all conflicts.

@lidavidm lidavidm merged commit acbc138 into apache:main Jun 5, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement PRs that add or improve features.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Java] Create a utility function for validity buffer based split and transfer usage in Vector module

2 participants