Fdr get conf optimization #311

Open
wants to merge 2 commits into
base: develop


ypicchi-arm

Speeds up FDR on NEON by vectorizing the loads in get_conf_stride.
This also includes the removal of the domain mask flip/unflip. I remember you did something similar in your experimental FDR branch, so I left it as a separate commit in case you want to drop it.

The domain mask was being flipped, then unflipped, without the flipped
state ever being used. This patch removes this unnecessary flipping.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
get_conf_stride_1 loads 16 consecutive bytes and applies a mask and shift.
We can easily do that in a vectorized way instead. This speeds up FDR
by around 5%.
get_conf_stride_2 also benefits from it, but with less data, the
overhead of vectorisation cancels out most of the gain.

Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
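
For readers unfamiliar with the idea, here is a rough sketch of what "mask and shift over 16 consecutive bytes, done in a vectorized way" can look like. It is not the code from this patch: the names `reach_scalar` and `reach_vector` are hypothetical, the 16-bit window per byte position is an assumption made for illustration, and it uses GCC/Clang vector extensions rather than NEON intrinsics so that it builds on non-Arm machines too. The vector version also assumes a little-endian target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; not the get_conf_stride code. */
typedef uint16_t v8u16 __attribute__((vector_size(16)));

/* Scalar reference: for each of 16 byte positions, build the 16-bit
 * window starting there (one shift) and apply the mask.
 * Reads in[0..16], i.e. 17 bytes. */
static void reach_scalar(const uint8_t *in, uint16_t mask, uint16_t out[16]) {
    for (int i = 0; i < 16; i++) {
        uint16_t window = (uint16_t)(in[i] | (in[i + 1] << 8));
        out[i] = (uint16_t)(window & mask);
    }
}

/* Vector variant: two overlapping 16-byte loads give all 16 windows.
 * On a little-endian target, lane k of the load at `in` is the window
 * at offset 2k, and lane k of the load at `in + 1` is the window at
 * offset 2k + 1. The mask is then applied to 8 lanes at a time. */
static void reach_vector(const uint8_t *in, uint16_t mask, uint16_t out[16]) {
    v8u16 even, odd;
    memcpy(&even, in, sizeof even);    /* offsets 0, 2, ..., 14 */
    memcpy(&odd, in + 1, sizeof odd);  /* offsets 1, 3, ..., 15 */

    v8u16 m = {mask, mask, mask, mask, mask, mask, mask, mask};
    even &= m;
    odd &= m;

    /* Interleave the lanes back into position order. */
    for (int k = 0; k < 8; k++) {
        out[2 * k] = even[k];
        out[2 * k + 1] = odd[k];
    }
}
```

Both functions should produce identical output for any 17-byte input. The sketch only shows why the per-position work collapses into a few wide operations; it says nothing about the table lookups that follow in FDR.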