Commit 5c20cef
Gabor Szadovszky
PARQUET-1217: Incorrect handling of missing values in Statistics
In parquet-format, every value in Statistics is optional, but parquet-mr does not handle the following cases properly:
- null_count is set but min/max or min_value/max_value are not: filtering may fail with an NPE or produce incorrect results
  fix: check whether min/max is set before comparing against the related values
- null_count is not set: filtering treats null_count as if it were 0, so incorrect filtering may occur
  fix: introduce a new method in the Statistics object to check whether num_nulls is set, and call it before using the value for filtering
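The two fixes can be illustrated with a minimal, self-contained sketch. It does not reproduce the parquet-mr code changed by this commit: `ChunkStats`, `StatsFilterSketch`, `hasMinMax`, and `canDropEq` are hypothetical names chosen for illustration, and `isNumNullsSet` only mirrors the idea of the new method described above. The sketch assumes a `long` column and an equality predicate, and it returns `true` only when the statistics prove that no row in the chunk can match.

```java
import java.util.OptionalLong;

/** Hypothetical stand-in for column-chunk statistics; NOT the parquet-mr Statistics class. */
final class ChunkStats {
    final Long min;              // null when min/max were not written
    final Long max;              // null when min/max were not written
    final OptionalLong numNulls; // empty when null_count was not written

    ChunkStats(Long min, Long max, OptionalLong numNulls) {
        this.min = min;
        this.max = max;
        this.numNulls = numNulls;
    }

    boolean hasMinMax() {
        return min != null && max != null;
    }

    boolean isNumNullsSet() {
        return numNulls.isPresent();
    }
}

final class StatsFilterSketch {
    /** Returns true only when the chunk provably contains no row with column == value. */
    static boolean canDropEq(ChunkStats stats, long valueCount, Long value) {
        if (value == null) {
            // eq(null): without a null count we know nothing, so the chunk must be kept.
            if (!stats.isNumNullsSet()) {
                return false;
            }
            return stats.numNulls.getAsLong() == 0;
        }
        // A chunk known to hold only nulls cannot match a non-null value.
        if (stats.isNumNullsSet() && stats.numNulls.getAsLong() == valueCount) {
            return true;
        }
        // Missing min/max: comparing would throw an NPE, so keep the chunk instead.
        if (!stats.hasMinMax()) {
            return false;
        }
        return value < stats.min || value > stats.max;
    }

    public static void main(String[] args) {
        ChunkStats noMinMax = new ChunkStats(null, null, OptionalLong.of(0));
        ChunkStats noNullCount = new ChunkStats(1L, 10L, OptionalLong.empty());
        System.out.println(canDropEq(noMinMax, 100, 5L));
        System.out.println(canDropEq(noNullCount, 100, null));
    }
}
```

The design choice in this sketch is to fall back to "might match" whenever a statistic is absent: keeping a chunk that could have been dropped only costs read time, whereas dropping a chunk on the basis of an assumed default silently loses rows.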
Author: Gabor Szadovszky <gabor.szadovszky@cloudera.com>
Closes #458 from gszadovszky/PARQUET-1217 and squashes the following commits:
9d14090 [Gabor Szadovszky] Updates according to rdblue's comments
116d1d3 [Gabor Szadovszky] PARQUET-1217: Updates according to zi's comments
c264b50 [Gabor Szadovszky] PARQUET-1217: fix handling of unset nullCount
2ec2fb1 [Gabor Szadovszky] PARQUET-1217: Incorrect handling of missing values in Statistics
This change is based on b82d962 but is not a clean cherry-pick.
1 parent d59b32a commit 5c20cef
File tree
8 files changed, +240 -42 lines changed
- parquet-column/src
  - main/java/org/apache/parquet/column/statistics
  - test/java/org/apache/parquet/column/statistics
- parquet-hadoop/src
  - main/java/org/apache/parquet
    - filter2/statisticslevel
    - format/converter
  - test/java/org/apache/parquet
    - filter2/statisticslevel
    - format/converter
    - hadoop
Lines changed: 65 additions & 3 deletions
Lines changed: 1 addition & 0 deletions
Lines changed: 40 additions & 2 deletions
Lines changed: 8 additions & 4 deletions