Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of derangement/subfactorial with iterative implementation #146

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

FedericoStra
Copy link

@FedericoStra FedericoStra commented Dec 10, 2023

This PR changes the implementation of derangement, hence also subfactorial,
to use the recursive formula !n = (n-1) * (!(n-1) + !(n-2)) presented here.

For values such as subfactorial(100) I get a huge speed-up, like ~10x.

…plementation

Use the recursive formula
  !n = (n-1) * (!(n-1) + !(n-2))
presented here: https://en.wikipedia.org/wiki/Derangement#Counting_derangements
Copy link

codecov bot commented Dec 10, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (e6888be) 96.97% compared to head (afc071d) 96.87%.

Files Patch % Lines
src/factorials.jl 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   96.97%   96.87%   -0.10%     
==========================================
  Files           7        7              
  Lines         728      737       +9     
==========================================
+ Hits          706      714       +8     
- Misses         22       23       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@FedericoStra
Copy link
Author

With this change I pass from

julia> @benchmark subfactorial(100)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  185.527 μs … 47.124 ms  ┊ GC (min … max):  0.00% … 60.54%
 Time  (median):     189.635 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   267.191 μs ±  1.711 ms  ┊ GC (mean ± σ):  14.89% ±  2.31%

  ▆█▇▆▆▅▄▃▂▃▃▃▃▂▁▁▁▂▁▁▁                                        ▂
  █████████████████████▇▇▇▅▆▇▅▄▆▆█▇█▇▇▇▇▇▆▆▅▅▃▅▄▄▃▄▄▃▁▄▁▁▄▃▃▁▃ █
  186 μs        Histogram: log(frequency) by time       273 μs <

 Memory estimate: 81.06 KiB, allocs estimate: 2931.

to

julia> @benchmark subfactorial(100)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  12.113 μs …  49.734 ms  ┊ GC (min … max):  0.00% … 60.39%
 Time  (median):     15.437 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   29.177 μs ± 762.966 μs  ┊ GC (mean ± σ):  25.19% ±  0.96%

          ▂▅▆█▂  ▂▄     ▄▅▁
  ▁▁▁▁▁▁▂▄█████▆▅███▅▅▃▅███▆▅▅▄▃▃▂▂▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  12.1 μs         Histogram: frequency by time         24.2 μs <

 Memory estimate: 14.09 KiB, allocs estimate: 499.

…rsive formula and inplace computations

Use the simpler formula

!n = n * !(n-1) + (-1)^n

and use inplace operations on `BigInt`s to avoid allocations.
@FedericoStra
Copy link
Author

Changes

  • I replaced the recursive formula with the simpler !n = n * !(n-1) + (-1)^n.
  • I replaced all operations on BigInts with in-place functions from Base.GMP.MPZ to reduce allocations.

Consequences
The performance improved by another factor ~10x:

julia> @benchmark subfactorial(100)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.427 μs … 137.132 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.480 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.538 μs ±   1.549 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▆▆▄▂                                                      ▂
  ██████▆▃▃▃▅██▇▆▆▅▃▄▄▄▁▃▄▄▃▁▄▃▃▁▄▄▁▁▄▁▁▃▁▁▃▄▁▃▁▁▁▁▃▃▁▁▃▁▁▁▃▅ █
  1.43 μs      Histogram: log(frequency) by time      3.05 μs <

 Memory estimate: 112 bytes, allocs estimate: 11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant