This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Reduced reallocations when reading from IPC (~12%) #1105

Merged
11 commits, merged Jun 27, 2022

Conversation

ritchie46
Collaborator

We used data.clear(), which sets the length to 0, and then data.resize(len, 0), so every byte was overwritten with zero before reading.

However, we can simply keep the vec at the maximum length and write to a slice, growing if needed. The bytes that are already initialized don't have to be reset because we will overwrite them in the subsequent read operation.

I want to add the same for parquet in a following PR.
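The buffer-reuse idea described above can be sketched roughly like this. The struct and method names here are illustrative, not the PR's actual API (the PR adds a similar abstraction in src/io/readbuf.rs):

```rust
/// Illustrative read buffer: keeps the Vec at the maximum length seen
/// so far and tracks the logically valid length separately, so bytes
/// are never re-zeroed before being overwritten by the next read.
struct ReadBuffer {
    data: Vec<u8>,
    /// number of bytes valid after the last read
    length: usize,
}

impl ReadBuffer {
    fn new() -> Self {
        Self { data: Vec::new(), length: 0 }
    }

    /// Prepare the buffer for a read of `length` bytes. Grows the
    /// allocation only when needed; already-initialized bytes are left
    /// as-is, because the subsequent read overwrites them anyway.
    fn set_len(&mut self, length: usize) {
        if length > self.data.len() {
            // only the newly added tail is zero-initialized
            self.data.resize(length, 0);
        }
        self.length = length;
    }

    /// The writable slice the reader should fill.
    fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.data[..self.length]
    }
}
```

The key point is that calling set_len with a smaller length neither shrinks nor clears the allocation, so repeated reads of varying sizes only reallocate when a new maximum is hit.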

@codecov

codecov bot commented Jun 25, 2022

Codecov Report

Merging #1105 (3fe8248) into main (b679b06) will increase coverage by 0.00%.
The diff coverage is 90.28%.

@@           Coverage Diff            @@
##             main    #1105    +/-   ##
========================================
  Coverage   81.33%   81.34%            
========================================
  Files         366      367     +1     
  Lines       35337    35458   +121     
========================================
+ Hits        28742    28842   +100     
- Misses       6595     6616    +21     
Impacted Files Coverage Δ
src/io/ipc/read/file_async.rs 60.75% <53.84%> (+0.07%) ⬆️
src/io/readbuf.rs 84.00% <84.00%> (ø)
src/io/ipc/read/stream_async.rs 77.44% <92.30%> (+0.09%) ⬆️
src/io/flight/mod.rs 67.00% <100.00%> (+0.33%) ⬆️
src/io/ipc/append/mod.rs 91.11% <100.00%> (+0.20%) ⬆️
src/io/ipc/read/array/binary.rs 92.30% <100.00%> (+0.37%) ⬆️
src/io/ipc/read/array/dictionary.rs 86.66% <100.00%> (+0.62%) ⬆️
src/io/ipc/read/array/fixed_size_binary.rs 86.79% <100.00%> (+0.51%) ⬆️
src/io/ipc/read/array/fixed_size_list.rs 87.27% <100.00%> (+0.48%) ⬆️
src/io/ipc/read/array/list.rs 77.02% <100.00%> (+0.97%) ⬆️
... and 24 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ritchie46
Collaborator Author

The failed test doesn't seem to be related.

@jorgecarleitao
Owner

Thanks, this is awesome!

On my GitHub it shows a diff that seems related to a missing rebase?

@ritchie46 ritchie46 marked this pull request as draft June 26, 2022 07:08
@ritchie46
Collaborator Author

I will have to change a few functions so that we can know the total read length. Before, this was the Vec's length; with this change, the Vec can be longer than the read length. I will ping when ready.

@ritchie46 ritchie46 marked this pull request as ready for review June 26, 2022 13:21
@ritchie46 ritchie46 marked this pull request as draft June 26, 2022 14:14
@ritchie46 ritchie46 marked this pull request as ready for review June 27, 2022 07:35
Owner

@jorgecarleitao jorgecarleitao left a comment


I love it.

Left some minor comments, but otherwise ready to go.

Any benches worth mentioning?

Review threads (outdated, resolved):
src/io/ipc/read/common.rs (4 threads)
src/io/ipc/read/deserialize.rs
src/io/ipc/read/file_async.rs
ritchie46 and others added 5 commits June 27, 2022 10:27
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
// exponential growing strategy
// benchmark showed it was ~5% faster
// in reading lz4 yellow-trip dataset
self.data = vec![0; length * 2];
Collaborator Author


An exponential growing strategy can prevent expensive reallocations. This was 5% faster on the whole-file read.
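As a rough sketch of the difference (hypothetical helper names, not the PR's code): growing exactly to the needed size reallocates on every new maximum, while the doubling strategy in the snippet above over-allocates so that subsequent, slightly larger reads need no reallocation.

```rust
/// Grow exactly to the needed size: reallocates whenever a read is
/// larger than every previous one.
fn grow_exact(data: &mut Vec<u8>, needed: usize) {
    if needed > data.len() {
        data.resize(needed, 0);
    }
}

/// Doubling strategy, as in the snippet above: over-allocate to
/// `needed * 2` so a sequence of growing reads reallocates only a
/// logarithmic number of times, at the cost of some extra memory.
fn grow_doubling(data: &mut Vec<u8>, needed: usize) {
    if needed > data.len() {
        *data = vec![0; needed * 2];
    }
}
```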

@ritchie46 ritchie46 changed the title IPC: don't reassign all bytes before overwriting them IPC: don't reassign all bytes before overwriting them 12% performance gain Jun 27, 2022
@ritchie46 ritchie46 changed the title IPC: don't reassign all bytes before overwriting them 12% performance gain IPC: don't reassign all bytes before overwriting them ~12% performance gain Jun 27, 2022
@ritchie46
Copy link
Collaborator Author

ritchie46 commented Jun 27, 2022

I love it.

Left some minor comments, but otherwise ready to go.

Any benches worth mentioning?

Yes! This is reading the lz4-compressed yellow-trip taxi dataset.

main

 Performance counter stats for './target/release/memcheck':

          3.009,99 msec task-clock                #    1,000 CPUs utilized          
                15      context-switches          #    4,983 /sec                   
                 0      cpu-migrations            #    0,000 /sec                   
           957.156      page-faults               #  317,993 K/sec                  
    10.602.766.501      cycles                    #    3,523 GHz                    
    16.565.436.781      instructions              #    1,56  insn per cycle         
     2.804.597.000      branches                  #  931,764 M/sec                  
        38.538.003      branch-misses             #    1,37% of all branches        

       3,010826723 seconds time elapsed

       1,885070000 seconds user
       1,124638000 seconds sys

linear (or no?) growing strategy

 
 Performance counter stats for './target/release/memcheck':

          2.824,92 msec task-clock                #    1,000 CPUs utilized          
                 8      context-switches          #    2,832 /sec                   
                 0      cpu-migrations            #    0,000 /sec                   
           820.540      page-faults               #  290,465 K/sec                  
    10.279.655.892      cycles                    #    3,639 GHz                    
    16.038.558.971      instructions              #    1,56  insn per cycle         
     2.714.267.088      branches                  #  960,830 M/sec                  
        38.377.349      branch-misses             #    1,41% of all branches        

       2,825607071 seconds time elapsed

       1,824543000 seconds user
       1,000298000 seconds sys

Exponential growing strategy (same as vec)

  Performance counter stats for './target/release/memcheck':

          2.678,32 msec task-clock                #    0,996 CPUs utilized          
                51      context-switches          #   19,042 /sec                   
                 3      cpu-migrations            #    1,120 /sec                   
           711.114      page-faults               #  265,507 K/sec                  
     9.706.936.972      cycles                    #    3,624 GHz                    
    15.635.366.640      instructions              #    1,61  insn per cycle         
     2.645.151.012      branches                  #  987,616 M/sec                  
        38.408.838      branch-misses             #    1,45% of all branches        

       2,687799246 seconds time elapsed

       1,835066000 seconds user
       0,843570000 seconds sys

@ritchie46
Collaborator Author

I will clean this up later today and let you know when it's ready.

@ritchie46
Collaborator Author

Good to go on my side. 👍

@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Jun 27, 2022
@jorgecarleitao jorgecarleitao changed the title IPC: don't reassign all bytes before overwriting them ~12% performance gain Reduced reallocations when reading from IPC (~12%) Jun 27, 2022
@jorgecarleitao jorgecarleitao merged commit 09817a4 into jorgecarleitao:main Jun 27, 2022
@ritchie46 ritchie46 deleted the improve_ipc branch June 27, 2022 11:48
@joshuataylor
Contributor

joshuataylor commented Jun 28, 2022

This is amazing! Using the latest arrow2 branch with Polars, a lazy IPC Stream scan over 241 streaming Arrow files (a 1 GB dataset) dropped over 10% in scan time! 🎉

edit: further testing and benchmarking shows a 27% reduction 😮
