Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix undefined behavor in GenericStringArray::from_iter_values #1145

Merged
merged 4 commits into from
Jan 10, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 8, 2022

Which issue does this PR close?

Closes #1144

Rationale for this change

If an iterator produces more values than its declared upper bound, it can result in undefined behavior

What changes are included in this PR?

Fix the bug by using the correct length

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 8, 2022
@alamb alamb force-pushed the alamb/fix_undefined branch from 38a0af9 to db22383 Compare January 8, 2022 12:16
@alamb alamb changed the title Alamb/fix undefined Fix undefined behavor in GenericStringArray::from_iter_values Jan 8, 2022
@alamb
Copy link
Contributor Author

alamb commented Jan 8, 2022

I need to port my tests for this but I think the code is fine. I ran out of time today to keep working on this

@codecov-commenter
Copy link

codecov-commenter commented Jan 8, 2022

Codecov Report

Merging #1145 (13d6bac) into master (719096b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1145   +/-   ##
=======================================
  Coverage   82.55%   82.56%           
=======================================
  Files         169      169           
  Lines       50456    50497   +41     
=======================================
+ Hits        41655    41693   +38     
- Misses       8801     8804    +3     
Impacted Files Coverage Δ
arrow/src/array/array_string.rs 97.40% <100.00%> (+0.32%) ⬆️
arrow/src/util/test_util.rs 92.04% <100.00%> (+1.13%) ⬆️
arrow/src/datatypes/field.rs 53.79% <0.00%> (-0.31%) ⬇️
parquet_derive/src/parquet_field.rs 66.21% <0.00%> (-0.23%) ⬇️
arrow/src/array/transform/mod.rs 85.56% <0.00%> (-0.14%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 719096b...13d6bac. Read the comment docs.


// iterator size hint may not be correct so compute the actual number of offsets
let actual_len = match offsets.len() {
0 => 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always push at least one offset (zero), so this match should not be needed. Looks good otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jhorstmann -- I'll change this to assert len > 0 and try and polish up this PR with tests tomorrow

@@ -187,8 +187,13 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
offsets.push(length_so_far);
values.extend_from_slice(s.as_bytes());
}

// iterator size hint may not be correct so compute the actual number of offsets
assert!(!offsets.is_empty()); // wrote at least one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is the fix -- the rest of the PR is tests

@alamb alamb merged commit 35b04ca into apache:master Jan 10, 2022
@alamb alamb deleted the alamb/fix_undefined branch January 10, 2022 21:47
@alamb alamb added the security label Jan 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undefined behavior for GenericStringArray::from_iter_values if reported iterator upper bound is incorrect
4 participants