Commit 7f8e952
PARQUET-642: Improve performance of ByteBuffer based read / write paths
While trying out the newest Parquet version, we noticed that the changes that started using ByteBuffers, 6b605a4 and 6b24a1d (mostly Avro, but with a couple of ByteBuffer changes), caused our jobs to slow down a bit:
Read overhead: 4-6% (in MB_Millis)
Write overhead: 6-10% (in MB_Millis)
This appears to be caused by the encoding / decoding of Strings in the [Binary class](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/io/api/Binary.java):
[toStringUsingUTF8()](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/io/api/Binary.java#L388) - for reads
[encodeUTF8()](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/io/api/Binary.java#L236) - for writes
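To illustrate why the read-side decode can matter, here is a minimal sketch of decoding a UTF-8 slice of a ByteBuffer. It assumes that a cheaper path for heap buffers is to build the String directly from the backing array with `new String(byte[], int, int, Charset)` and to use `Charset.decode` only for direct or read-only buffers. The class and method names are hypothetical, not Parquet's Binary API, and the actual change in Binary.java may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Parquet's Binary API.
final class Utf8DecodeSketch {

  // Decodes `length` bytes starting at buffer index `offset` as UTF-8.
  static String decodeUtf8(ByteBuffer buf, int offset, int length) {
    if (buf.hasArray()) {
      // Heap buffer: build the String straight from the backing array,
      // skipping the CharsetDecoder and its intermediate CharBuffer.
      return new String(buf.array(), buf.arrayOffset() + offset, length,
          StandardCharsets.UTF_8);
    }
    // Direct or read-only buffer: no accessible array, so fall back to
    // Charset.decode on a duplicate limited to the requested bytes.
    // duplicate() leaves the caller's position and limit untouched.
    ByteBuffer view = buf.duplicate();
    view.limit(offset + length);
    view.position(offset);
    return StandardCharsets.UTF_8.decode(view).toString();
  }

  public static void main(String[] args) {
    byte[] bytes = "xxh\u00e9llo".getBytes(StandardCharsets.UTF_8);
    ByteBuffer heap = ByteBuffer.wrap(bytes);
    ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
    direct.put(bytes);
    direct.flip();
    int len = bytes.length - 2;
    // Both paths should decode the bytes after the two-byte "xx" prefix.
    System.out.println(decodeUtf8(heap, 2, len).equals("h\u00e9llo"));   // true
    System.out.println(decodeUtf8(direct, 2, len).equals("h\u00e9llo")); // true
  }
}
```

Working on a `duplicate()` costs one small object allocation. The alternative is to temporarily move the caller's position and limit and restore them afterwards, which avoids that allocation but is easier to get wrong.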
With these changes, we see around a 5% improvement in MB_Millis when running the job on our Hadoop cluster.
Added some microbenchmark details to the JIRA.
Note that I've left the behavior unchanged for the Avro write path: it still uses CharSequence and the Charset-based encoders.
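The note implies that the non-Avro String write path no longer goes through a Charset-based encoder. The sketch below contrasts the two styles under that assumption. A `CharSequence` is wrapped in a `CharBuffer` and fed to a thread-local `CharsetEncoder`, while a `String` is encoded with `String.getBytes(StandardCharsets.UTF_8)` and wrapped without copying. The names `Utf8EncodeSketch`, `encodeCharSequence`, and `encodeString` are hypothetical and do not belong to Parquet's API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Parquet's Binary API.
final class Utf8EncodeSketch {

  // CharsetEncoder is stateful and not thread-safe, so keep one per thread.
  private static final ThreadLocal<CharsetEncoder> ENCODER =
      ThreadLocal.withInitial(StandardCharsets.UTF_8::newEncoder);

  // CharSequence path (the style kept for the Avro write path): wrap the
  // characters in a CharBuffer and run them through the Charset encoder.
  static ByteBuffer encodeCharSequence(CharSequence value) {
    try {
      return ENCODER.get().encode(CharBuffer.wrap(value));
    } catch (CharacterCodingException e) {
      throw new IllegalArgumentException("value contains malformed UTF-16", e);
    }
  }

  // String path: let String.getBytes do the encoding and wrap the resulting
  // array without copying; no CharBuffer wrapper or thread-local lookup.
  static ByteBuffer encodeString(String value) {
    return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    String s = "caf\u00e9";
    // Same bytes either way for well-formed input.
    System.out.println(encodeString(s).equals(encodeCharSequence(s))); // true
  }
}
```

The two paths do behave differently on malformed input. For example, with an unpaired surrogate, a fresh `CharsetEncoder` (default action REPORT) throws, whereas `String.getBytes` substitutes the charset's replacement byte.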
Author: Piyush Narang <pnarang@twitter.com>
Closes #347 from piyushnarang/bytebuffer-encoding-fix-pr and squashes the following commits:
43c5bdd [Piyush Narang] Keep avro on char sequence
2d50c8c [Piyush Narang] Update Binary approach
9e58237 [Piyush Narang] Proof of concept fixes
1 parent: 9c40a7b
2 files changed (+53, -22):
parquet-avro/src/main/java/org/apache/parquet/avro: 1 addition, 1 deletion
parquet-column/src/main/java/org/apache/parquet/io/api: 52 additions, 21 deletions