Commit 77ca6dc
fix: bug when struct nullability determined from `Dict<_, ByteArray>>` column (#8573)
# Which issue does this PR close?
- Closes #8404
# Rationale for this change
Issue #8404 reported a regression that was introduced in #7585. This PR
resolves the issue.
# What changes are included in this PR?
The root cause of the issue was that `ByteArrayDictionaryReader` returns a
new zero-length array of values if the record reader has already been
consumed. The problem was that the repetition and definition level buffers
were not being advanced in this early-return case.
https://github.com/apache/arrow-rs/blob/521f219e308613811aeae11300bf7a7b0fb5ec29/parquet/src/arrow/array_reader/byte_array_dictionary.rs#L167-L183
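To make the early-return problem concrete, here is a minimal sketch using a toy reader. `ToyDictReader`, its fields, and `consume_batch_buggy` are hypothetical names invented for illustration; this is not the arrow-rs implementation, only a simplified model of the behaviour described above (and it tracks definition levels only, for brevity).

```rust
/// Toy stand-in for a dictionary column reader (hypothetical, not the arrow-rs type).
struct ToyDictReader {
    /// Values decoded by the record reader but not yet handed out.
    pending_values: Vec<u32>,
    /// Definition levels decoded alongside `pending_values`.
    pending_def_levels: Vec<i16>,
    /// Levels exposed to the parent reader after the last consume call.
    def_levels_buffer: Option<Vec<i16>>,
}

impl ToyDictReader {
    fn new() -> Self {
        ToyDictReader {
            pending_values: Vec::new(),
            pending_def_levels: Vec::new(),
            def_levels_buffer: None,
        }
    }

    /// Simulates decoding one batch from the column.
    fn read(&mut self, values: &[u32], def_levels: &[i16]) {
        self.pending_values.extend_from_slice(values);
        self.pending_def_levels.extend_from_slice(def_levels);
    }

    /// Models the pre-fix behaviour: the early return skips the level update.
    fn consume_batch_buggy(&mut self) -> Vec<u32> {
        if self.pending_values.is_empty() {
            // Early return: `def_levels_buffer` still holds the previous batch.
            return Vec::new();
        }
        self.def_levels_buffer = Some(std::mem::take(&mut self.pending_def_levels));
        std::mem::take(&mut self.pending_values)
    }

    /// Levels a parent reader would see after the last consume call.
    fn def_levels(&self) -> Option<&[i16]> {
        self.def_levels_buffer.as_deref()
    }
}

/// Consumes an already-drained reader and returns
/// (length of the returned values, length of the visible definition levels).
fn stale_levels_demo() -> (usize, usize) {
    let mut reader = ToyDictReader::new();
    reader.read(&[10, 11, 12], &[1, 1, 1]);
    let first = reader.consume_batch_buggy();
    assert_eq!(first.len(), 3);
    // Second call hits the early return: empty values, stale levels.
    let second = reader.consume_batch_buggy();
    (second.len(), reader.def_levels().map_or(0, |levels| levels.len()))
}
```

In this model the second call yields zero values while three definition levels from the previous batch remain visible, which is the inconsistency the parent reader later trips over.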
The `StructArrayReader` reads the repetition and definition levels from
the first child to determine the nullability of the struct array. When
we returned the empty values buffer for the child without advancing the
repetition and definition buffers, the `StructArrayReader` hit a length
mismatch between the empty child array and the non-empty nullability
bitmask, which produced the error.
https://github.com/apache/arrow-rs/blob/521f219e308613811aeae11300bf7a7b0fb5ec29/parquet/src/arrow/array_reader/struct_array.rs#L137-L170
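The mismatch can be sketched roughly as follows. `build_struct_validity`, `ToyError`, and the `struct_def_level` parameter are assumptions made for this illustration and simplify what the real `StructArrayReader` does (repetition levels are ignored here); the point is only that a validity mask derived from stale levels no longer matches the child length.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ToyError {
    LengthMismatch { child_len: usize, validity_len: usize },
}

/// Builds a struct validity mask from the first child's definition levels
/// and checks it against the child's array length.
fn build_struct_validity(
    child_len: usize,
    first_child_def_levels: &[i16],
    struct_def_level: i16,
) -> Result<Vec<bool>, ToyError> {
    // A struct slot counts as non-null when the definition level reaches
    // the struct's own level.
    let validity: Vec<bool> = first_child_def_levels
        .iter()
        .map(|&level| level >= struct_def_level)
        .collect();

    // Stale levels paired with an empty child array fail this check.
    if validity.len() != child_len {
        return Err(ToyError::LengthMismatch {
            child_len,
            validity_len: validity.len(),
        });
    }
    Ok(validity)
}
```

With matching inputs, for example a child of length 3 and levels `[1, 0, 1]`, the mask is built normally. With an empty child and the same three stale levels, the function returns the length-mismatch error.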
The fix is simple: always have `ByteArrayDictionaryReader` advance the
repetition and definition level buffers when `consume_next_batch` is
called.
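As a rough sketch of the shape of the fix, the toy reader below refreshes both level buffers before the early-return check. `FixedToyReader` and its fields are hypothetical names for illustration only; the actual change lives in `ByteArrayDictionaryReader` and may differ in detail.

```rust
/// Toy reader modelling the fixed behaviour (hypothetical, not the arrow-rs type).
struct FixedToyReader {
    pending_values: Vec<u32>,
    pending_def_levels: Vec<i16>,
    pending_rep_levels: Vec<i16>,
    def_levels_buffer: Option<Vec<i16>>,
    rep_levels_buffer: Option<Vec<i16>>,
}

impl FixedToyReader {
    fn consume_batch(&mut self) -> Vec<u32> {
        // Advance the level buffers first, so every path below exposes
        // levels that belong to the batch being returned.
        self.def_levels_buffer = Some(std::mem::take(&mut self.pending_def_levels));
        self.rep_levels_buffer = Some(std::mem::take(&mut self.pending_rep_levels));

        if self.pending_values.is_empty() {
            // Early return is now safe: the levels are already empty too.
            return Vec::new();
        }
        std::mem::take(&mut self.pending_values)
    }
}

/// Consumes twice and returns the lengths of
/// (second batch values, visible def levels, visible rep levels).
fn fixed_demo() -> (usize, usize, usize) {
    let mut reader = FixedToyReader {
        pending_values: vec![10, 11, 12],
        pending_def_levels: vec![1, 1, 1],
        pending_rep_levels: vec![0, 0, 0],
        def_levels_buffer: None,
        rep_levels_buffer: None,
    };
    let _first = reader.consume_batch();
    let second = reader.consume_batch();
    let def_len = reader.def_levels_buffer.as_ref().map_or(0, |l| l.len());
    let rep_len = reader.rep_levels_buffer.as_ref().map_or(0, |l| l.len());
    (second.len(), def_len, rep_len)
}
```

After the second call, the values and both level buffers are all empty, so a parent that derives nullability from the levels sees consistent lengths.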
# Are these changes tested?
Yes, a new unit test,
`test_read_nullable_structs_with_binary_dict_as_first_child_column`,
was added. Before the changes introduced in this PR, it reproduced the
issue.
# Are there any user-facing changes?
No
---------
Co-authored-by: Ed Seidl <etseidl@live.com>
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>

1 parent 615a144 commit 77ca6dc
File tree: 2 files changed, +62 -4 lines changed
- parquet/src/arrow
  - array_reader
  - arrow_reader
| Location | Lines added (new numbering) | Lines removed (original numbering) |
|---|---|---|
| parquet/src/arrow/array_reader | 168-171 | 178-180 |
| parquet/src/arrow/arrow_reader | 1163, 2363-2419 | 1163 |