This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added cow APIs (2x-10x vs non-cow) #1061

Merged
merged 2 commits into from
Jun 10, 2022


jorgecarleitao
Owner

@jorgecarleitao jorgecarleitao commented Jun 9, 2022

This PR is the culmination of a number of great ideas from @ritchie46, @sundy-li, @houqp, and myself over the last 8 months or so. Thanks a lot everyone for all the fun and learnings so far :)

It contains a set of APIs over Buffer, Bitmap, MutableBitmap, PrimitiveArray and BooleanArray that enable in-place operations (i.e. operations that do not allocate).

In summary, this PR adds support for clone-on-write over BooleanArray and PrimitiveArray, on both values and validity. This allows the evaluation of a composite expression to take at most one allocation while preserving its SIMD nature.
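To make the "at most one allocation" claim concrete, here is a minimal sketch of clone-on-write using only the standard library. It is not arrow2's API: `SharedValues`, `apply_values` and `mul2_add1` are hypothetical names chosen for illustration, and `Arc<Vec<f32>>` merely stands in for a shared buffer. The idea it shows is that a mutation clones the data only when the buffer is shared, so every later step of a composite expression runs in place.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a shared, immutable buffer of values.
/// (Not arrow2's `Buffer`; just `Arc<Vec<f32>>` from the standard library.)
type SharedValues = Arc<Vec<f32>>;

/// Applies `f` to the values, cloning them first only if they are shared.
/// `Arc::make_mut` clones the inner `Vec` when other strong references
/// exist; when this is the only reference it mutates in place.
fn apply_values<F: Fn(&mut [f32])>(values: &mut SharedValues, f: F) {
    f(Arc::make_mut(values).as_mut_slice());
}

/// Evaluates `(x * 2) + 1` element-wise.
/// The first step may clone (at most one allocation); after it the buffer
/// is unique, so the second step always runs in place.
fn mul2_add1(mut values: SharedValues) -> SharedValues {
    apply_values(&mut values, |v| v.iter_mut().for_each(|x| *x *= 2.0));
    apply_values(&mut values, |v| v.iter_mut().for_each(|x| *x += 1.0));
    values
}
```

If the caller still holds another reference to the input, the original data is left untouched and the result lives in a single fresh allocation. If the caller hands over the only reference, no data is copied at all.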

Benches

Apply a transformation to the values of an f32 array in place vs returning a new array:

apply_mul 2^20          time:   [96.842 us 97.118 us 97.409 us]                           
mul 2^20                time:   [183.67 us 184.35 us 185.01 us]

Apply a transformation to the values of a boolean array in place vs creating a new array:

mutablebitmap not 2^20  time:   [5.0870 us 5.1058 us 5.1258 us]
bitmap not 2^20         time:   [22.902 us 22.970 us 23.046 us]

mutablebitmap and 2^20  time:   [5.7941 us 5.8125 us 5.8294 us]
bitmap and 2^20         time:   [56.047 us 56.255 us 56.493 us]

#1042 will make this easier to use when the trait object Array is used. This PR motivates that change.

@jorgecarleitao jorgecarleitao added the feature A new feature label Jun 9, 2022
@codecov

codecov bot commented Jun 9, 2022

Codecov Report

Merging #1061 (9e656b0) into main (6608071) will increase coverage by 0.07%.
The diff coverage is 81.90%.

@@            Coverage Diff             @@
##             main    #1061      +/-   ##
==========================================
+ Coverage   81.32%   81.40%   +0.07%     
==========================================
  Files         363      365       +2     
  Lines       34649    34968     +319     
==========================================
+ Hits        28179    28466     +287     
- Misses       6470     6502      +32     
Impacted Files                          Coverage Δ
src/bitmap/utils/mod.rs                 100.00% <ø> (ø)
src/types/bit_chunk.rs                  100.00% <ø> (ø)
src/bitmap/immutable.rs                 81.81% <66.66%> (-0.62%) ⬇️
src/bitmap/utils/chunks_exact_mut.rs    75.00% <75.00%> (ø)
src/buffer/immutable.rs                 96.00% <80.00%> (-1.15%) ⬇️
src/bitmap/assign_ops.rs                81.45% <81.45%> (ø)
src/array/boolean/mod.rs                87.91% <88.88%> (+0.13%) ⬆️
src/array/primitive/mod.rs              84.39% <93.33%> (+0.70%) ⬆️
src/bitmap/mutable.rs                   97.51% <100.00%> (+0.01%) ⬆️
src/io/parquet/write/row_group.rs       92.85% <0.00%> (-2.46%) ⬇️
... and 14 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6608071...9e656b0. Read the comment docs.

@jorgecarleitao jorgecarleitao changed the title Added cow APIs (2x-10x on arithmetics and boolean logic vs non-cow) Added cow APIs (2x-10x vs non-cow) Jun 9, 2022
@ritchie46
Collaborator

That's really promising! So that gives some room for the memcpy slowdown.

@sundy-li
Collaborator

Impressive!

@jorgecarleitao jorgecarleitao merged commit 4e1dc00 into main Jun 10, 2022
@jorgecarleitao jorgecarleitao deleted the arithemtics_assign branch June 10, 2022 15:10
3 participants