Arm pan tweaks #426

Merged
merged 1 commit into from
Sep 20, 2020

Conversation

Member

@paulfd paulfd commented Sep 18, 2020

Apparently ARM takes a substantial hit to repeatedly go from float to int to float, which hurts in table lookups. In fact, data interpolation and panning take up more than half the processing time of ARM rendering.

This patch sets NEON as a required element for ARM builds (not too unreasonable I think) and uses an accelerated codepath if present. Speedup on raspi 3 is about 50% for the panning/width process, and 25% overall.
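As a rough illustration of the kind of codepath described above, here is a minimal sketch, not the code from this patch. It assumes a quarter-cycle cosine table and an equal-power pan law; `kPanSize`, `kPanTable`, `panIndex`, `panScalar` and `panBlock` are hypothetical names. On NEON builds the index computation and the float-to-int conversion are done for four samples at once, and only the table reads stay scalar; elsewhere the scalar loop handles everything.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

// Hypothetical table size; the table used by the patch may differ.
constexpr int kPanSize = 4095;

// Quarter-cycle cosine table: index 0 -> 1.0, last index -> ~0.0.
static const std::array<float, kPanSize> kPanTable =
    []() -> std::array<float, kPanSize> {
        std::array<float, kPanSize> table {};
        const double step = 1.5707963267948966 / (kPanSize - 1);
        for (int i = 0; i < kPanSize; ++i)
            table[i] = static_cast<float>(std::cos(i * step));
        return table;
    }();

// Maps a pan value in [-1, 1] to a rounded, clamped table index.
inline int panIndex(float pan)
{
    float scaled = (pan + 1.0f) * (0.5f * (kPanSize - 1)) + 0.5f;
    scaled = std::fmin(std::fmax(scaled, 0.0f), float(kPanSize - 1));
    return static_cast<int>(scaled); // one float -> int conversion per sample
}

// Scalar reference: equal-power panning applied in place.
void panScalar(const float* pan, float* left, float* right, std::size_t size)
{
    for (std::size_t i = 0; i < size; ++i) {
        const int idx = panIndex(pan[i]);
        left[i] *= kPanTable[idx];
        right[i] *= kPanTable[kPanSize - 1 - idx];
    }
}

// Uses NEON for the index computation when available, scalar otherwise.
void panBlock(const float* pan, float* left, float* right, std::size_t size)
{
    std::size_t i = 0;
#if SKETCH_HAVE_NEON
    const float32x4_t one = vdupq_n_f32(1.0f);
    const float32x4_t half = vdupq_n_f32(0.5f);
    const float32x4_t zero = vdupq_n_f32(0.0f);
    const float32x4_t maxIdx = vdupq_n_f32(float(kPanSize - 1));
    for (; i + 4 <= size; i += 4) {
        float32x4_t p = vaddq_f32(vld1q_f32(pan + i), one);
        p = vmulq_n_f32(p, 0.5f * (kPanSize - 1));
        p = vaddq_f32(p, half);
        p = vminq_f32(vmaxq_f32(p, zero), maxIdx);
        uint32_t idx[4];
        vst1q_u32(idx, vcvtq_u32_f32(p)); // one conversion for 4 samples
        for (int k = 0; k < 4; ++k) {
            left[i + k] *= kPanTable[idx[k]];
            right[i + k] *= kPanTable[kPanSize - 1 - idx[k]];
        }
    }
#endif
    // Remaining samples, or the whole block on non-NEON builds.
    panScalar(pan + i, left + i, right + i, size - i);
}
```

Calling `panBlock(panEnvelope, left, right, numFrames)` with a centered pan (0.0) scales both channels by roughly 0.707, while a pan of -1.0 leaves the left channel untouched and silences the right one.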

Note that performance is still not that great on such a device 🙂 But we'll get somewhere. I think having shortcuts for modifiers at the voice level would be the most evident thing to do. We could interrogate the mod matrix somehow with a canShortcut(ModKey target) or something: if no event happened during the block then you can apply a single value and you're done. This would probably also help the performance on all platforms.
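To make the shortcut idea a bit more concrete, here is a minimal sketch under stated assumptions. Only the name `canShortcut(ModKey target)` comes from the comment above; `ModMatrixSketch`, `ModEvent`, `setValue`, `addEvent`, `currentValue`, `render` and `applyGain` are hypothetical and do not reflect the actual sfizz mod matrix. The assumption is that each target keeps a current value plus a list of events (sorted by delay) for the block being rendered.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical modulation targets.
enum class ModKey { Pan, Width, Volume };

// A value change occurring `delay` frames into the current block.
struct ModEvent {
    std::size_t delay;
    float value;
};

class ModMatrixSketch {
public:
    void setValue(ModKey target, float value) { state_[target].current = value; }

    // Events are assumed to be added in increasing delay order.
    void addEvent(ModKey target, std::size_t delay, float value)
    {
        state_[target].events.push_back({ delay, value });
    }

    // True when no event touches this target during the block.
    bool canShortcut(ModKey target) const
    {
        const auto it = state_.find(target);
        return it == state_.end() || it->second.events.empty();
    }

    float currentValue(ModKey target) const
    {
        const auto it = state_.find(target);
        return it == state_.end() ? 0.0f : it->second.current;
    }

    // Slow path: writes a step-wise envelope and consumes the events.
    void render(ModKey target, float* output, std::size_t size)
    {
        TargetState& s = state_[target];
        std::size_t i = 0;
        for (const ModEvent& e : s.events) {
            const std::size_t end = std::min(e.delay, size);
            for (; i < end; ++i)
                output[i] = s.current;
            s.current = e.value;
        }
        for (; i < size; ++i)
            output[i] = s.current;
        s.events.clear();
    }

private:
    struct TargetState {
        float current = 0.0f;
        std::vector<ModEvent> events;
    };
    std::map<ModKey, TargetState> state_;
};

// Voice-level use: one multiply per sample when the target is constant.
void applyGain(ModMatrixSketch& matrix, float* buffer, std::size_t size)
{
    if (matrix.canShortcut(ModKey::Volume)) {
        const float gain = matrix.currentValue(ModKey::Volume);
        for (std::size_t i = 0; i < size; ++i)
            buffer[i] *= gain;
        return;
    }
    // A real audio path would reuse a preallocated buffer here.
    std::vector<float> envelope(size);
    matrix.render(ModKey::Volume, envelope.data(), size);
    for (std::size_t i = 0; i < size; ++i)
        buffer[i] *= envelope[i];
}
```

In this sketch the fast path skips the per-sample envelope entirely, while a single event in the block sends the whole block through the slow path.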

Member Author

paulfd commented Sep 19, 2020

> [...] I think having shortcuts for modifiers at the voice level would be the most evident thing to do. We could interrogate the mod matrix somehow with a canShortcut(ModKey target) or something: if no event happened during the block then you can apply a single value and you're done. This would probably also help the performance on all platforms.

Note that while this would help in "most" cases, it could lead to bad worst-case behavior when there is an event in the block, so it's not a perfect solution either.

@paulfd paulfd marked this pull request as draft September 19, 2020 08:54
Collaborator

jpcima commented Sep 19, 2020

> Apparently ARM takes a substantial hit to repeatedly go from float to int to float, which hurts in table lookups.

Yes, because this function is implemented in software without the flags.

Could we integrate simde so that we get NEON implementations for free from the existing SSE code?
This does not stop us from identifying the slow functions and writing NEON specializations as needed.

Member Author

paulfd commented Sep 19, 2020

Yes sure, let's do it.

Honestly I'm also debating whether the templated versions are useful (are we gonna use double ever?) and whether we should strip every function but the ones we actually use, to really target the important parts to eventually translate...

Collaborator

jpcima commented Sep 19, 2020

> Honestly I'm also debating whether the templated versions are useful (are we gonna use double ever?)

Then let's close the debate right away: no, they aren't :)

Member Author

paulfd commented Sep 19, 2020

This is turning into more than a small change to help ARM xD

@paulfd paulfd marked this pull request as ready for review September 20, 2020 18:41
@paulfd paulfd requested a review from jpcima September 20, 2020 18:41
Member Author

paulfd commented Sep 20, 2020

The CI fails randomly. I'll work on cleaning up SIMD overall (again? one day it'll be the last) and integrate simde, which looks like a nice project!

Member Author

paulfd commented Sep 20, 2020

--------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
--------------------------------------------------------------------
PanFixture/PanScalar/16          912 ns          828 ns       844968
PanFixture/PanScalar/64         3637 ns         3284 ns       213258
PanFixture/PanScalar/256       14165 ns        13130 ns        53320
PanFixture/PanScalar/1024      55621 ns        52835 ns        13212
PanFixture/PanScalar/4096     217907 ns       213866 ns         3270
PanFixture/PanSIMD/16            381 ns          374 ns      1874263
PanFixture/PanSIMD/64           1471 ns         1448 ns       482756
PanFixture/PanSIMD/256          6077 ns         5675 ns       124197
PanFixture/PanSIMD/1024        23838 ns        23063 ns        30241
PanFixture/PanSIMD/4096       101443 ns        97869 ns         7152
PanFixture/PanSfizz/16           363 ns          345 ns      2037612
PanFixture/PanSfizz/64          1150 ns         1131 ns       618596
PanFixture/PanSfizz/256         4600 ns         4393 ns       159567
PanFixture/PanSfizz/1024       18466 ns        17890 ns        39011
PanFixture/PanSfizz/4096       87131 ns        79387 ns         8910

For future reference, on a raspi 3.
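For readers who want ratios rather than raw timings, the small sketch below derives speedups from the Time column of the table above. The timings are copied from the benchmark output; `PanTiming`, `kTimings`, `speedup` and `printSpeedups` are hypothetical helper names, not part of sfizz or its benchmarks.

```cpp
#include <cstdio>

// Wall-clock timings in ns, copied from the Time column above.
struct PanTiming {
    int blockSize;
    double scalarNs;
    double simdNs;
    double sfizzNs;
};

constexpr PanTiming kTimings[] = {
    { 16, 912.0, 381.0, 363.0 },
    { 64, 3637.0, 1471.0, 1150.0 },
    { 256, 14165.0, 6077.0, 4600.0 },
    { 1024, 55621.0, 23838.0, 18466.0 },
    { 4096, 217907.0, 101443.0, 87131.0 },
};

// Ratio of two timings: values above 1 mean the candidate is faster.
constexpr double speedup(double baselineNs, double candidateNs)
{
    return baselineNs / candidateNs;
}

// Prints the speedup of PanSIMD and PanSfizz over PanScalar per block size.
void printSpeedups()
{
    for (const PanTiming& t : kTimings) {
        std::printf("%5d samples: PanSIMD x%.2f, PanSfizz x%.2f\n",
            t.blockSize,
            speedup(t.scalarNs, t.simdNs),
            speedup(t.scalarNs, t.sfizzNs));
    }
}
```

With these numbers, PanSfizz runs about 2.5 to 3.2 times faster than PanScalar depending on the block size, and PanSIMD about 2.1 to 2.5 times faster.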

@paulfd paulfd merged commit 6114644 into sfztools:develop Sep 20, 2020
@paulfd paulfd deleted the arm-pan-tweaks branch September 20, 2020 19:58
2 participants