Improvements to sgrproj HBD performance #3116

Merged
merged 3 commits into from
Jan 28, 2023

shssoichiro
Collaborator

sgrproj_sum_finish in particular was identified as a hotspot in HBD (high bit depth) encoding. This changeset attempts to improve its performance, and that of the surrounding code, by:

  • Moving bounds checks out of a hot loop
  • Using the built-in saturating_sub method, which is more efficient than performing multiple casts
  • Using generics to allow compile-time computation of math that depends on bit depth
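
For readers unfamiliar with the first two techniques, here is a minimal sketch. It is not the code from this changeset: `clamped_diff_sum` and `clamped_diff_casts` are hypothetical names invented for illustration, and the element type is assumed to be `u32`. The idea is to slice once before the loop so the length check happens a single time, and to let `saturating_sub` do the clamping that would otherwise need widening casts.

```rust
/// Hypothetical helper (not from rav1e): sums `a[i] - b[i]`, clamped at
/// zero, over the first `n` elements of both slices.
fn clamped_diff_sum(a: &[u32], b: &[u32], n: usize) -> u32 {
    // Slice once up front. The bounds check happens here, outside the
    // loop, and panics early if either slice is shorter than `n`.
    let a = &a[..n];
    let b = &b[..n];

    let mut sum = 0u32;
    // Iterating the pre-sliced inputs avoids indexing inside the loop body.
    for (&x, &y) in a.iter().zip(b.iter()) {
        // `saturating_sub` clamps at zero without widening to a signed type.
        sum += x.saturating_sub(y);
    }
    sum
}

/// The cast-based form that `saturating_sub` can replace:
/// widen to a signed type, subtract, clamp, and narrow again.
fn clamped_diff_casts(x: u32, y: u32) -> u32 {
    (x as i64 - y as i64).max(0) as u32
}
```

Both forms give the same result for every `u32` input; whether the built-in is measurably faster depends on the surrounding code and should be confirmed by benchmarking.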

Collaborator

@lu-zero lu-zero left a comment

We could do a pass and move const generics further up.

@barrbrain barrbrain merged commit afe02e3 into xiph:master Jan 28, 2023
@shssoichiro shssoichiro deleted the sgrproj-perf branch January 28, 2023 16:26
shssoichiro added a commit to shssoichiro/rav1e that referenced this pull request Jan 30, 2023
This is a followup from xiph#3116 which expands this optimization
to as many places in the encoder as we can reasonably utilize it.
By using generics, there are places where the compiler is able to
simplify math operations at compile time as well as areas where the
compiler is able to remove branches so that we only branch on bit depth
at the highest level of the code (and therefore the fewest number of
times).

Based on hyperfine benchmarking, this results in a 1-2% speedup across
the encoding process, although it does increase the final binary size.
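
As a rough illustration of the pattern described in the commit message above, the sketch below makes the bit depth a const generic parameter and branches on the runtime value only once, at the outermost call. All names (`scale_to_8bit`, `scale_row`, `scale_row_dyn`) and the rounding math are hypothetical and are not taken from rav1e; the set of supported bit depths (8, 10, 12) is also an assumption of the example.

```rust
/// Hypothetical per-sample routine: rounds a sample down to 8-bit range.
/// `BD` is a const generic, so `shift` and `round` are known at compile
/// time for each instantiation and can be folded into constants.
fn scale_to_8bit<const BD: usize>(sample: u16) -> u16 {
    let shift = BD - 8; // assumes BD >= 8
    let round = (1u16 << shift) >> 1; // 0 when BD == 8
    ((sample + round) >> shift).min(255)
}

/// Inner loop, monomorphized once per bit depth. No bit-depth branch
/// remains inside it.
fn scale_row<const BD: usize>(row: &[u16]) -> Vec<u16> {
    row.iter().map(|&s| scale_to_8bit::<BD>(s)).collect()
}

/// Top-level entry point: the only place that branches on the runtime
/// bit depth, picking one of the specialized instantiations.
fn scale_row_dyn(bit_depth: usize, row: &[u16]) -> Vec<u16> {
    match bit_depth {
        8 => scale_row::<8>(row),
        10 => scale_row::<10>(row),
        12 => scale_row::<12>(row),
        _ => panic!("unsupported bit depth: {}", bit_depth),
    }
}
```

Each supported bit depth produces its own copy of `scale_row` and `scale_to_8bit`, which is consistent with the binary size increase noted above.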