Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI check for full validation mode #1546

Merged
merged 10 commits into from
Apr 14, 2022
Merged

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Apr 12, 2022

Which issue does this PR close?

Closes #1544
Closes #1239

Rationale for this change

We have a full validation mode that validates the full contents of creating Arrays but it is (on purpose) not called in many places in arrow for performance reasons: https://docs.rs/arrow/11.1.0/arrow/array/struct.ArrayData.html#method.validate_full

This leads to two possible issues:

  1. Lack of test coverage of arrays constructed within arrow
  2. Possible Lack of test coverage of the validation routine itself

What changes are included in this PR?

  1. Add a force_validate feature flag that forced the validation check for all array creations (defaults to off)
  2. Add a new CI check that runs with this flag on (see example failure)

Found / filed new tickets or existing tickets:

Are there any user-facing changes?

New optional feature force_validate


// Provide a force_validate mode
#[cfg(feature = "force_validate")]
new_self.validate_full().unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the validation call

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb marked this pull request as ready for review April 12, 2022 14:43
@alamb alamb requested a review from tustvold April 12, 2022 14:43
@codecov-commenter
Copy link

Codecov Report

Merging #1546 (19dc05a) into master (68038f5) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

❗ Current head 19dc05a differs from pull request most recent head 58e3c28. Consider uploading reports for the commit 58e3c28 to get more accurate results

@@            Coverage Diff             @@
##           master    #1546      +/-   ##
==========================================
- Coverage   82.82%   82.82%   -0.01%     
==========================================
  Files         190      190              
  Lines       54941    54943       +2     
==========================================
- Hits        45507    45506       -1     
- Misses       9434     9437       +3     
Impacted Files Coverage Δ
arrow/src/array/array_binary.rs 92.93% <ø> (ø)
arrow/src/array/array_boolean.rs 93.18% <ø> (ø)
arrow/src/array/array_list.rs 95.52% <ø> (ø)
arrow/src/array/array_primitive.rs 94.85% <ø> (ø)
arrow/src/array/data.rs 82.98% <0.00%> (-0.15%) ⬇️
arrow/src/datatypes/datatype.rs 66.40% <0.00%> (-0.40%) ⬇️
arrow/src/datatypes/field.rs 53.79% <0.00%> (-0.31%) ⬇️
parquet/src/encodings/encoding.rs 93.56% <0.00%> (+0.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68038f5...58e3c28. Read the comment docs.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this also closes #1239

Edit: Once we bash out the failing tests, we could look to enable this in the DataFusion CI as well for even more coverage 😄


// Provide a force_validate mode
#[cfg(feature = "force_validate")]
new_self.validate_full().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@@ -1692,6 +1692,9 @@ mod tests {
}

#[test]
// Fails when validation enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least we know there is a problem. So it is progress 🤷 we'll get it fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to fix it at #1567

@alamb
Copy link
Contributor Author

alamb commented Apr 14, 2022

I will try and polish this up shortly

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better now 😄

@alamb alamb merged commit 3fed612 into apache:master Apr 14, 2022
@alamb alamb deleted the alamb/test_validation branch April 14, 2022 17:35
@alamb alamb added the development-process Related to development process of arrow-rs label Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate development-process Related to development process of arrow-rs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CI check for full validation mode Add test_validate feature flag + CI check
4 participants