Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #275

Merged
merged 1 commit into from
Aug 12, 2021

Conversation

ritchie46
Copy link
Collaborator

Already make this PR to start the discussion. I implemented the strength reduction algorithm discussed in #259 for division of u64 arrays and the performance increases are almost 10x!

I benchmarked a divsion by 4 (which is a single bit shift) and a divison by a prime number.

Gnuplot not found, using plotters backend
divide_scalar 4         time:   [41.871 us 41.883 us 41.899 us]                             
                        change: [-89.393% -89.388% -89.382%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

divide_scalar prime     time:   [71.806 us 71.821 us 71.845 us]                                
                        change: [-82.082% -82.035% -81.993%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

@jorgecarleitao, I used some unsafe to convince the compiler of the proper types. How do you want to go about this. We could also create some extra traits to do the conversion.

@codecov
Copy link

codecov bot commented Aug 11, 2021

Codecov Report

Merging #275 (7a6f439) into main (4c18c0a) will increase coverage by 0.10%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #275      +/-   ##
==========================================
+ Coverage   77.14%   77.24%   +0.10%     
==========================================
  Files         254      254              
  Lines       20648    20733      +85     
==========================================
+ Hits        15929    16016      +87     
+ Misses       4719     4717       -2     
Impacted Files Coverage Δ
src/compute/arithmetics/mod.rs 51.87% <ø> (+0.75%) ⬆️
src/compute/arithmetics/basic/div.rs 96.52% <100.00%> (+2.00%) ⬆️
src/compute/arithmetics/basic/rem.rs 96.52% <100.00%> (+2.00%) ⬆️
src/io/ipc/read/read_basic.rs 88.50% <0.00%> (+0.13%) ⬆️
src/compute/nullif.rs 1.81% <0.00%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c18c0a...7a6f439. Read the comment docs.

Copy link
Owner

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great!

I think that for now we should live with the transmute. I think that the new scalar API will enable us to write this function as div_scalar(&dyn Array, &dyn Scalar) -> Box<dyn Array>, which will allow us to avoid this transmute and specialize directly based on DataType.

u16 and u32 is also supported by strength division.

src/compute/arithmetics/basic/div.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
benches/arithmetic_kernels.rs Outdated Show resolved Hide resolved
@ritchie46
Copy link
Collaborator Author

I think that for now we should live with the transmute. I think that the new scalar API will enable us to write this function as div_scalar(&dyn Array, &dyn Scalar) -> Box, which will allow us to avoid this transmute and specialize directly based on DataType.

Great! than I will implement the other types as well.

@ritchie46 ritchie46 changed the title WIP: PoC strength_reduce division strength_reduce division/remainder Aug 11, 2021
@ritchie46
Copy link
Collaborator Author

@jorgecarleitao I updated the PR. It is now implemented for rem_scalar and div_scalar.

There is a bit repetition in the code. Because branches work on different concrete types it seems not possible to write a generic function for that. A macro would be possible but, but given the usage of unsafe, I chose for repetition as that's more readable.

@jorgecarleitao
Copy link
Owner

Do we have a test hitting the branch?

@ritchie46
Copy link
Collaborator Author

Do we have a test hitting the branch?

Will write that tomorrow.

@ritchie46
Copy link
Collaborator Author

ritchie46 commented Aug 12, 2021

@jorgecarleitao all branches are ran in tests now. I think its good to go.

@ritchie46 ritchie46 closed this Aug 12, 2021
@ritchie46 ritchie46 reopened this Aug 12, 2021
@jorgecarleitao jorgecarleitao changed the title strength_reduce division/remainder Specialize division and remainder of unsigned using strength_reduce Aug 12, 2021
@jorgecarleitao jorgecarleitao merged commit cf04021 into jorgecarleitao:main Aug 12, 2021
@jorgecarleitao jorgecarleitao added the enhancement An improvement to an existing feature label Aug 12, 2021
@jorgecarleitao jorgecarleitao changed the title Specialize division and remainder of unsigned using strength_reduce Improve performance of rem_scalar/div_scalar for integer types (4x-10x) Aug 24, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement An improvement to an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants