[Theora] NEON support #3

Closed
flibitijibibo opened this issue Sep 4, 2018 · 13 comments · Fixed by #25

Comments

@flibitijibibo
Member

flibitijibibo commented Sep 4, 2018

This one's mostly for @0x0ade.

In the new statically linked Theorafile, we're using Theora version 1.1.1, the most recent stable release. However, the latest Git revision has some ARM32 optimizations inside, though I'm not sure whether they're usable on AArch64:

https://github.com/xiph/theora/tree/master/lib/arm

For whatever reason they have not done a minor point release in 10 years (!) and it doesn't seem like they're going to do it ever, so it seems we'll have to integrate the ARM changes ourselves (which we can do now that we have our own copy of the source). If it's anything like the FAudio optimizations, this should be a HUGE performance boost for ARM devices.

@flibitijibibo flibitijibibo changed the title [Theora] ARM asm support [Theora] ARM64 asm support Jul 29, 2020
@flibitijibibo flibitijibibo changed the title [Theora] ARM64 asm support [Theora] NEON support Sep 3, 2022
@flibitijibibo
Member Author

The code is in, just need to wedge it into the Xcode project.

@linnaea
Contributor

linnaea commented Jan 3, 2023

ARM64 changed the NEON/ASIMD register layout and naming so the ASM code for v7 wouldn't be usable.

I ported some of the heavier encoding functions to use NEON intrinsics in C, which should be usable on both v7 and A64. You can find them here. It should be easier to integrate than assembly.

My own testing on Linux with an AmLogic S905X3 and GCC 7.5 showed the encoding speed for a 1360x768 4:2:2 source increasing from 3 fps to 6 fps with just the 3 SATD functions ported.
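For readers unfamiliar with SATD (sum of absolute transformed differences), here is a minimal sketch of the idea, not the code from the port: take the residual between a source and a reference 8x8 block, run a 2D Hadamard transform over it, and sum the absolute coefficients. The names `row_diff8`, `hadamard8`, and `satd8x8` are hypothetical and do not come from libtheora, and the result is left unnormalised, which may differ from what libtheora does. Only the row residual uses NEON intrinsics (`vld1_u8`, `vsubl_u8`, `vst1q_s16`), which are spelled the same on ARMv7 and AArch64; a scalar fallback is used elsewhere.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#endif

/* Residual of one 8-pixel row: dst[i] = src[i] - ref[i]. */
static void row_diff8(int16_t dst[8], const uint8_t *src, const uint8_t *ref) {
#ifdef SKETCH_HAVE_NEON
    /* Widening subtract wraps modulo 2^16; reinterpreting the bits as
       signed recovers the true difference in [-255, 255]. */
    uint16x8_t d = vsubl_u8(vld1_u8(src), vld1_u8(ref));
    vst1q_s16(dst, vreinterpretq_s16_u16(d));
#else
    for (int i = 0; i < 8; i++) dst[i] = (int16_t)(src[i] - ref[i]);
#endif
}

/* In-place 8-point Hadamard transform using three butterfly stages. */
static void hadamard8(int32_t v[8]) {
    for (int step = 1; step < 8; step <<= 1) {
        for (int i = 0; i < 8; i += step << 1) {
            for (int j = i; j < i + step; j++) {
                int32_t a = v[j], b = v[j + step];
                v[j] = a + b;
                v[j + step] = a - b;
            }
        }
    }
}

/* Unnormalised 8x8 SATD: transform rows, then columns, then sum |coeff|. */
static unsigned satd8x8(const uint8_t *src, const uint8_t *ref, int stride) {
    int32_t m[8][8];
    int16_t diff[8];
    unsigned sum = 0;

    for (int y = 0; y < 8; y++) {
        row_diff8(diff, src + y * stride, ref + y * stride);
        for (int x = 0; x < 8; x++) m[y][x] = diff[x];
        hadamard8(m[y]);
    }
    for (int x = 0; x < 8; x++) {
        int32_t col[8];
        for (int y = 0; y < 8; y++) col[y] = m[y][x];
        hadamard8(col);
        for (int y = 0; y < 8; y++) sum += (unsigned)abs(col[y]);
    }
    return sum;
}
```

Calling `satd8x8(src, ref, stride)` on two identical blocks returns 0. Keeping the scalar path next to the intrinsic path also makes it straightforward to compare the two on the same input.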

I don't have an M1 Mac or Qualcomm Surface so I haven't tested on Windows or Mac.

Judging from the profiler report I don't think there's much room for improvement other than making libtheora itself multi-threaded, which would be a huge undertaking.

@flibitijibibo
Member Author

Makes sense to me - I wonder if upstream is still around to take something like this in?

Not sure when I'll get to play with this but it's definitely something I'd like to have in Theorafile, thanks for the heads up!

@linnaea
Contributor

linnaea commented Jan 3, 2023

Upstream is probably not going to take this. Their focus shifted to Daala around 2014, which has since been developed into AV1.

@flibitijibibo
Member Author

Might be worth filing a PR just for visibility - would also be easier for third parties like us to integrate since we can add the URL to our lib README.

@flibitijibibo
Member Author

flibitijibibo commented Jan 22, 2023

Self-assigning this. Since the above intrinsics work applies cleanly to upstream and we recently bumped to upstream, I'll just go ahead and update to that source and test on the Switch compiler. Long as Switch is still happy, I'm happy!

@flibitijibibo flibitijibibo self-assigned this Jan 22, 2023
@flibitijibibo
Member Author

Intrinsics work has been imported:

69e0a48

Once I get the Switch project files updated this is good to go.

@flibitijibibo
Member Author

NX project files are updated (privately of course), forgot about Xcode though - will wait on that before closing.

@flibitijibibo
Member Author

Latest commit has the Xcode updates!

@flibitijibibo
Member Author

Reopening for juuust a moment. @linnaea, we may have found a visual bug caused by the arm-intrinsics, hoping to have a sample video to share soon. (CC @kiddkaffeine)

@flibitijibibo flibitijibibo reopened this Feb 1, 2023
@kiddkaffeine
Contributor

Hey @linnaea, so I tested a number of game videos. This does not appear on all, but some get some odd artifacts when NEON is enabled. I'm not sure what distinguishes them.

Here's a sample, uploaded to Google Drive as these seem to be too large to host here:

https://drive.google.com/file/d/1LUJ-VZtrTsyH7tAgfol-y4Mb9rfVe47M/view?usp=sharing
AN000_screenshot.zip

Notice in this screenshot, there are some odd squares throughout this frame of the video.

@linnaea
Contributor

linnaea commented Feb 2, 2023

Looks like I swapped two shift instructions in the loop filter; will submit a pull request.
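The thread does not say which two shifts were swapped, so the following is only a hedged illustration of how a SIMD port of the loop filter might be checked. It is a scalar sketch of Theora's edge filter for one line of four pixels, based on the formula in the Theora specification as I understand it; the names `lflim`, `clamp255`, and `loop_filter_line` are hypothetical, and the limit value would normally come from the stream's quantiser settings.

```c
#include <stdint.h>

/* Tent-shaped limiter: passes small values through, folds medium values
   back toward zero, and zeroes anything at or beyond twice the limit. */
static int lflim(int r, int limit) {
    if (r <= -2 * limit || r >= 2 * limit) return 0;
    if (r <= -limit) return -r - 2 * limit;
    if (r >= limit) return -r + 2 * limit;
    return r;
}

/* Saturate to the 8-bit pixel range. */
static uint8_t clamp255(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Filter four pixels straddling a block edge (the edge lies between
   p[1] and p[2]). Assumes >> on a negative int is an arithmetic shift,
   which holds on the usual GCC/Clang/MSVC targets. */
static void loop_filter_line(uint8_t p[4], int limit) {
    int r = (p[0] - 3 * p[1] + 3 * p[2] - p[3] + 4) >> 3;
    int f = lflim(r, limit);
    p[1] = clamp255(p[1] + f);
    p[2] = clamp255(p[2] - f);
}
```

A NEON version could be run against a scalar reference like this over random pixel values and limits. Any mismatch in shift amount or shift order should then show up as differing bytes, rather than only as visible squares in a decoded frame.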

@flibitijibibo
Member Author

This is merged, thanks for the quick fix! Will continue reporting if we see anything else.
