This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Reduced reallocations when reading from IPC (~12%) #1105

Merged
11 commits, merged Jun 27, 2022

Conversation

ritchie46
Collaborator

We used data.clear(), which sets the length to 0, and then data.resize(len, 0), so every byte was overwritten with zero before reading.

However, we can simply keep the vec at the maximum length and write to a slice, growing if needed. The bytes that are already initialized don't have to be reset because we will overwrite them in the subsequent read operation.

I want to add the same for parquet in a following PR.
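The buffer-reuse idea described above can be sketched roughly like this. The struct and method names here are illustrative, not the PR's actual API (the PR adds a similar abstraction in src/io/readbuf.rs):

```rust
/// Illustrative read buffer: keeps the Vec at the maximum length seen
/// so far and tracks the logically valid length separately, so bytes
/// are never re-zeroed before being overwritten by the next read.
struct ReadBuffer {
    data: Vec<u8>,
    /// number of bytes valid after the last read
    length: usize,
}

impl ReadBuffer {
    fn new() -> Self {
        Self { data: Vec::new(), length: 0 }
    }

    /// Prepare the buffer for a read of `length` bytes. Grows the
    /// allocation only when needed; already-initialized bytes are left
    /// as-is, because the subsequent read overwrites them anyway.
    fn set_len(&mut self, length: usize) {
        if length > self.data.len() {
            // only the newly added tail is zero-initialized
            self.data.resize(length, 0);
        }
        self.length = length;
    }

    /// The writable slice the reader should fill.
    fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.data[..self.length]
    }
}
```

The key point is that calling set_len with a smaller length neither shrinks nor clears the allocation, so repeated reads of varying sizes only reallocate when a new maximum is hit.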

@codecov

codecov bot commented Jun 25, 2022

Codecov Report

Merging #1105 (3fe8248) into main (b679b06) will increase coverage by 0.00%.
The diff coverage is 90.28%.

@@           Coverage Diff            @@
##             main    #1105    +/-   ##
========================================
  Coverage   81.33%   81.34%            
========================================
  Files         366      367     +1     
  Lines       35337    35458   +121     
========================================
+ Hits        28742    28842   +100     
- Misses       6595     6616    +21     
Impacted Files Coverage Δ
src/io/ipc/read/file_async.rs 60.75% <53.84%> (+0.07%) ⬆️
src/io/readbuf.rs 84.00% <84.00%> (ø)
src/io/ipc/read/stream_async.rs 77.44% <92.30%> (+0.09%) ⬆️
src/io/flight/mod.rs 67.00% <100.00%> (+0.33%) ⬆️
src/io/ipc/append/mod.rs 91.11% <100.00%> (+0.20%) ⬆️
src/io/ipc/read/array/binary.rs 92.30% <100.00%> (+0.37%) ⬆️
src/io/ipc/read/array/dictionary.rs 86.66% <100.00%> (+0.62%) ⬆️
src/io/ipc/read/array/fixed_size_binary.rs 86.79% <100.00%> (+0.51%) ⬆️
src/io/ipc/read/array/fixed_size_list.rs 87.27% <100.00%> (+0.48%) ⬆️
src/io/ipc/read/array/list.rs 77.02% <100.00%> (+0.97%) ⬆️
... and 24 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ritchie46
Collaborator Author

The failed test doesn't seem to be related.

@jorgecarleitao
Owner

Thanks, this is awesome!

On my GitHub it shows a diff that seems related to a missing rebase?

@ritchie46 ritchie46 marked this pull request as draft June 26, 2022 07:08
@ritchie46
Collaborator Author

I will have to change a few functions so that we can know the total read length. Before, this was the Vec's length; with this change, the Vec can be longer than the read length. I will ping when ready.

@ritchie46 ritchie46 marked this pull request as ready for review June 26, 2022 13:21
@ritchie46 ritchie46 marked this pull request as draft June 26, 2022 14:14
@ritchie46 ritchie46 marked this pull request as ready for review June 27, 2022 07:35
Owner

@jorgecarleitao jorgecarleitao left a comment


I love it.

Left some minor comments, but otherwise ready to go.

Any benches worth mentioning?

Review threads (outdated, resolved):
src/io/ipc/read/common.rs (4 threads)
src/io/ipc/read/deserialize.rs
src/io/ipc/read/file_async.rs
ritchie46 and others added 5 commits June 27, 2022 10:27
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
// exponential growing strategy
// benchmark showed it was ~5% faster
// in reading lz4 yellow-trip dataset
self.data = vec![0; length * 2];
Collaborator Author


An exponential growing strategy can prevent expensive reallocations. This was 5% faster on the whole-file read.
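As a rough sketch of the difference (hypothetical helper names, not the PR's code): growing exactly to the needed size reallocates on every new maximum, while the doubling strategy in the snippet above over-allocates so that subsequent, slightly larger reads need no reallocation.

```rust
/// Grow exactly to the needed size: reallocates whenever a read is
/// larger than every previous one.
fn grow_exact(data: &mut Vec<u8>, needed: usize) {
    if needed > data.len() {
        data.resize(needed, 0);
    }
}

/// Doubling strategy, as in the snippet above: over-allocate to
/// `needed * 2` so a sequence of growing reads reallocates only a
/// logarithmic number of times, at the cost of some extra memory.
fn grow_doubling(data: &mut Vec<u8>, needed: usize) {
    if needed > data.len() {
        *data = vec![0; needed * 2];
    }
}
```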

@ritchie46 ritchie46 changed the title IPC: don't reassign all bytes before overwriting them IPC: don't reassign all bytes before overwriting them 12% performance gain Jun 27, 2022
@ritchie46 ritchie46 changed the title IPC: don't reassign all bytes before overwriting them 12% performance gain IPC: don't reassign all bytes before overwriting them ~12% performance gain Jun 27, 2022
@ritchie46
Copy link
Collaborator Author

ritchie46 commented Jun 27, 2022

I love it.

Left some minor comments, but otherwise ready to go.

Any benches worth mentioning?

Yes! This is reading the lz4-compressed yellow-trip taxi dataset.

main

 Performance counter stats for './target/release/memcheck':

          3.009,99 msec task-clock                #    1,000 CPUs utilized          
                15      context-switches          #    4,983 /sec                   
                 0      cpu-migrations            #    0,000 /sec                   
           957.156      page-faults               #  317,993 K/sec                  
    10.602.766.501      cycles                    #    3,523 GHz                    
    16.565.436.781      instructions              #    1,56  insn per cycle         
     2.804.597.000      branches                  #  931,764 M/sec                  
        38.538.003      branch-misses             #    1,37% of all branches        

       3,010826723 seconds time elapsed

       1,885070000 seconds user
       1,124638000 seconds sys

linear (or no?) growing strategy

 
 Performance counter stats for './target/release/memcheck':

          2.824,92 msec task-clock                #    1,000 CPUs utilized          
                 8      context-switches          #    2,832 /sec                   
                 0      cpu-migrations            #    0,000 /sec                   
           820.540      page-faults               #  290,465 K/sec                  
    10.279.655.892      cycles                    #    3,639 GHz                    
    16.038.558.971      instructions              #    1,56  insn per cycle         
     2.714.267.088      branches                  #  960,830 M/sec                  
        38.377.349      branch-misses             #    1,41% of all branches        

       2,825607071 seconds time elapsed

       1,824543000 seconds user
       1,000298000 seconds sys

Exponential growing strategy (same as vec)

  Performance counter stats for './target/release/memcheck':

          2.678,32 msec task-clock                #    0,996 CPUs utilized          
                51      context-switches          #   19,042 /sec                   
                 3      cpu-migrations            #    1,120 /sec                   
           711.114      page-faults               #  265,507 K/sec                  
     9.706.936.972      cycles                    #    3,624 GHz                    
    15.635.366.640      instructions              #    1,61  insn per cycle         
     2.645.151.012      branches                  #  987,616 M/sec                  
        38.408.838      branch-misses             #    1,45% of all branches        

       2,687799246 seconds time elapsed

       1,835066000 seconds user
       0,843570000 seconds sys

@ritchie46
Collaborator Author

I will clean this up later today and let you know when it's ready.

@ritchie46
Collaborator Author

Good to go on my side. 👍

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Jun 27, 2022
@jorgecarleitao jorgecarleitao changed the title IPC: don't reassign all bytes before overwriting them ~12% performance gain Reduced reallocations when reading from IPC (~12%) Jun 27, 2022
@jorgecarleitao jorgecarleitao merged commit 09817a4 into jorgecarleitao:main Jun 27, 2022
@ritchie46 ritchie46 deleted the improve_ipc branch June 27, 2022 11:48
@joshuataylor
Contributor

joshuataylor commented Jun 28, 2022

This is amazing! Using the latest arrow2 branch with Polars, a lazy IPC Stream scan over 241 streaming Arrow files (a 1 GB dataset) dropped over 10% in scan time! 🎉

edit: further testing and benchmarking shows a 27% reduction 😮
