Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #275

ritchie46 · 2021-08-11T13:37:24Z

Already make this PR to start the discussion. I implemented the strength reduction algorithm discussed in #259 for division of u64 arrays and the performance increases are almost 10x!

I benchmarked a divsion by 4 (which is a single bit shift) and a divison by a prime number.

Gnuplot not found, using plotters backend
divide_scalar 4         time:   [41.871 us 41.883 us 41.899 us]                             
                        change: [-89.393% -89.388% -89.382%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

divide_scalar prime     time:   [71.806 us 71.821 us 71.845 us]                                
                        change: [-82.082% -82.035% -81.993%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

@jorgecarleitao, I used some unsafe to convince the compiler of the proper types. How do you want to go about this. We could also create some extra traits to do the conversion.

codecov · 2021-08-11T13:47:38Z

Codecov Report

Merging #275 (7a6f439) into main (4c18c0a) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #275      +/-   ##
==========================================
+ Coverage   77.14%   77.24%   +0.10%     
==========================================
  Files         254      254              
  Lines       20648    20733      +85     
==========================================
+ Hits        15929    16016      +87     
+ Misses       4719     4717       -2

Impacted Files	Coverage Δ
src/compute/arithmetics/mod.rs	`51.87% <ø> (+0.75%)`	⬆️
src/compute/arithmetics/basic/div.rs	`96.52% <100.00%> (+2.00%)`	⬆️
src/compute/arithmetics/basic/rem.rs	`96.52% <100.00%> (+2.00%)`	⬆️
src/io/ipc/read/read_basic.rs	`88.50% <0.00%> (+0.13%)`	⬆️
src/compute/nullif.rs	`1.81% <0.00%> (+1.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c18c0a...7a6f439. Read the comment docs.

jorgecarleitao

Overall looks great!

I think that for now we should live with the transmute. I think that the new scalar API will enable us to write this function as div_scalar(&dyn Array, &dyn Scalar) -> Box<dyn Array>, which will allow us to avoid this transmute and specialize directly based on DataType.

u16 and u32 is also supported by strength division.

src/compute/arithmetics/basic/div.rs

Cargo.toml

benches/arithmetic_kernels.rs

ritchie46 · 2021-08-11T17:44:41Z

I think that for now we should live with the transmute. I think that the new scalar API will enable us to write this function as div_scalar(&dyn Array, &dyn Scalar) -> Box, which will allow us to avoid this transmute and specialize directly based on DataType.

Great! than I will implement the other types as well.

ritchie46 · 2021-08-11T18:09:14Z

@jorgecarleitao I updated the PR. It is now implemented for rem_scalar and div_scalar.

There is a bit repetition in the code. Because branches work on different concrete types it seems not possible to write a generic function for that. A macro would be possible but, but given the usage of unsafe, I chose for repetition as that's more readable.

jorgecarleitao · 2021-08-11T19:00:51Z

Do we have a test hitting the branch?

ritchie46 · 2021-08-11T19:04:28Z

Do we have a test hitting the branch?

Will write that tomorrow.

ritchie46 · 2021-08-12T08:37:12Z

@jorgecarleitao all branches are ran in tests now. I think its good to go.

jorgecarleitao reviewed Aug 11, 2021

View reviewed changes

src/compute/arithmetics/basic/div.rs Outdated Show resolved Hide resolved

Cargo.toml Outdated Show resolved Hide resolved

benches/arithmetic_kernels.rs Outdated Show resolved Hide resolved

ritchie46 force-pushed the strength_reduce branch from b603528 to 4eb3adf Compare August 11, 2021 18:06

ritchie46 changed the title ~~WIP: PoC strength_reduce division~~ strength_reduce division/remainder Aug 11, 2021

strenght_reduce division/remainder

7a6f439

ritchie46 force-pushed the strength_reduce branch from 4eb3adf to 7a6f439 Compare August 12, 2021 08:33

ritchie46 closed this Aug 12, 2021

ritchie46 reopened this Aug 12, 2021

jorgecarleitao changed the title ~~strength_reduce division/remainder~~ Specialize division and remainder of unsigned using strength_reduce Aug 12, 2021

jorgecarleitao merged commit cf04021 into jorgecarleitao:main Aug 12, 2021

jorgecarleitao added the enhancement An improvement to an existing feature label Aug 12, 2021

jorgecarleitao mentioned this pull request Aug 13, 2021

Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #259

Closed

jorgecarleitao changed the title ~~Specialize division and remainder of unsigned using strength_reduce~~ Improve performance of rem_scalar/div_scalar for integer types (4x-10x) Aug 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #275

Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #275

ritchie46 commented Aug 11, 2021

codecov bot commented Aug 11, 2021 •

edited

Loading

jorgecarleitao left a comment

ritchie46 commented Aug 11, 2021

ritchie46 commented Aug 11, 2021

jorgecarleitao commented Aug 11, 2021

ritchie46 commented Aug 11, 2021

ritchie46 commented Aug 12, 2021 •

edited

Loading

Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #275

Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #275

Conversation

ritchie46 commented Aug 11, 2021

codecov bot commented Aug 11, 2021 • edited Loading

Codecov Report

jorgecarleitao left a comment

Choose a reason for hiding this comment

ritchie46 commented Aug 11, 2021

ritchie46 commented Aug 11, 2021

jorgecarleitao commented Aug 11, 2021

ritchie46 commented Aug 11, 2021

ritchie46 commented Aug 12, 2021 • edited Loading

Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #275

Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #275

codecov bot commented Aug 11, 2021 •

edited

Loading

ritchie46 commented Aug 12, 2021 •

edited

Loading