@jduo
Member

@jduo jduo commented Sep 28, 2023

Rationale for this change

Make the Java vector validation code more consistent with C++. Add the missing checks and use the same
entry point in both languages, so that the code is easier to read and write when working with both.

What changes are included in this PR?

Make vector validation more consistent with Array::Validate() in C++:

  • Add validate() and validateFull() instance methods to vectors.
  • Validate that VarCharVector and LargeVarCharVector contents are valid UTF-8.
  • Validate that DecimalVector and Decimal256Vector contents fit within the supplied precision and scale.
  • Validate that NullVectors contain only nulls.
  • Validate that FixedSizeBinaryVector values have the correct length.
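
To illustrate the kind of check the UTF-8 item describes, here is a minimal standalone sketch using only the JDK. It is not the code from this PR: the class name `Utf8Check` and the method `isValidUtf8` are hypothetical, and the sketch works on a plain `byte[]` slice rather than on an Arrow data buffer. It relies on `CharsetDecoder` configured with `CodingErrorAction.REPORT`, which makes malformed input raise an exception instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Arrow.
final class Utf8Check {
  private Utf8Check() {}

  /** Returns true if bytes[offset, offset + length) is well-formed UTF-8. */
  static boolean isValidUtf8(byte[] bytes, int offset, int length) {
    // REPORT makes the decoder throw on malformed input instead of
    // substituting the replacement character.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      decoder.decode(ByteBuffer.wrap(bytes, offset, length));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] ok = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
    // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
    byte[] bad = {(byte) 0xC3, (byte) 0x28};
    System.out.println("ok is valid:  " + isValidUtf8(ok, 0, ok.length));
    System.out.println("bad is valid: " + isValidUtf8(bad, 0, bad.length));
  }
}
```

A per-value check like this scans every byte of every value, which is presumably why it would belong to a full validation pass rather than a cheap structural one.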

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@jduo
Member Author

jduo commented Sep 28, 2023

The NullVector and FixedSizeBinaryVector checks may not really be valuable. It doesn't look like it's possible to get these vectors in a state where these checks can fail.

@github-actions github-actions bot added the "awaiting review" label Sep 28, 2023
Comment on lines 100 to 101
Member

shouldThrow is a bit of an awkward API; it seems like there's not that much logic here, and we could just duplicate it?

Member Author

Added checkPrecisionAndScaleNoThrow() to replace this overload.
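
For readers following the thread, the throwing/non-throwing split might look roughly like the sketch below. Only the name `checkPrecisionAndScaleNoThrow` comes from the comment above. The class name `DecimalChecksSketch`, the parameter lists, the exception type, and the exact rules (scale must match exactly, precision must not exceed the vector's) are assumptions made for illustration and may differ from the merged code.

```java
import java.math.BigDecimal;

// Illustrative sketch only; signatures and rules are assumptions, not Arrow's code.
final class DecimalChecksSketch {
  private DecimalChecksSketch() {}

  /** Non-throwing variant: reports whether the value fits, for use by validators. */
  static boolean checkPrecisionAndScaleNoThrow(
      BigDecimal value, int vectorPrecision, int vectorScale) {
    return value.scale() == vectorScale && value.precision() <= vectorPrecision;
  }

  /** Throwing variant: fails with a descriptive message, for use on write paths. */
  static void checkPrecisionAndScale(
      BigDecimal value, int vectorPrecision, int vectorScale) {
    if (value.scale() != vectorScale) {
      throw new UnsupportedOperationException(
          "BigDecimal scale must equal that in the Arrow vector: "
              + value.scale() + " != " + vectorScale);
    }
    if (value.precision() > vectorPrecision) {
      throw new UnsupportedOperationException(
          "BigDecimal precision cannot be greater than that in the Arrow vector: "
              + value.precision() + " > " + vectorPrecision);
    }
  }

  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("123.45"); // precision 5, scale 2
    System.out.println(checkPrecisionAndScaleNoThrow(value, 5, 2));
    System.out.println(checkPrecisionAndScaleNoThrow(value, 4, 2));
  }
}
```

Two separate methods keep each call site readable: a validator can collect failures through the boolean result, while a setter can fail fast, and neither needs a `shouldThrow` flag.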

@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting review" label Sep 28, 2023
@jduo jduo force-pushed the 37702-java-validation branch from 16ed64e to 055924d Compare September 28, 2023 20:37
@lidavidm lidavidm merged commit a004102 into apache:main Sep 29, 2023
@lidavidm lidavidm removed the "awaiting merge" label Sep 29, 2023
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit a004102.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

@jduo jduo deleted the 37702-java-validation branch October 23, 2023 15:24
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…che#37942)

### Rationale for this change
Make vector validation code more consistent with C++. Add missing checks and have the entry point
be the same so that the code is easier to read/write when working with both languages.

### What changes are included in this PR?
Make vector validation more consistent with Array::Validate() in C++:
* Add validate() and validateFull() instance methods to vectors.
* Validate that VarCharVector and LargeVarCharVector contents are valid UTF-8.
* Validate that DecimalVector and Decimal256Vector contents fit within the supplied precision and scale.
* Validate that NullVectors contain only nulls.
* Validate that FixedSizeBinaryVector values have the correct length.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No.
* Closes: apache#37702

Authored-by: James Duong <duong.james@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Development

Successfully merging this pull request may close these issues.

[Java] Add validation functionality
