avfilter/tonemap: add simd implementation for sse and neon #401
Currently only reinhard, linear and none have SIMD implementations; all other methods fall back to the scalar implementation. Reinhard is the preferred method on CPU because it is fast and produces subjectively satisfactory output, as the results tend to look brighter.
Test results with 4K HEVC 10-bit HLG input, encoding with libx264 veryfast and the reinhard method:
Apple M1 Max:
tonemap.neon: 44fps
tonemap.c: 35fps
Intel Core i9-12900:
tonemap.sse: 40fps
tonemap.c: 32fps
Both show a ~25% perf gain.
An AVX implementation was also attempted, but it showed no measurable perf gain. I dropped that draft to simplify the logic.
These intrinsics require an ARMv8 CPU.
I have a draft of an improved sw tonemap filter, but it doesn't have intrinsics/assembly support yet. If you're interested, you can test it out and see how it currently performs compared with the zscale+tonemap combo.
zscale does color space conversion and linearization very fast because it already uses a SIMD-optimized LUT, so the scalar filter can hardly beat that. What we can do with that draft is implement dovi reshaping and use it for dovi inputs. We could even implement only the reshaping part, pipe its output into zscale for linearization, and then do the tonemap with this filter. The dovi reshaping part has a lot of SIMD optimization opportunities because it involves many matrix operations. Computing powers of floats is also time-consuming, which means a SIMD-optimized LUT is a must on CPU. This is also the reason why BT2390 is not an easy task on CPU.
What ffmpeg command did you use to test zscale+tonemap?
Full command:
On some processor and input video combination, you need to reduce the
That is also fine. This PR does not add a LUT either; it just computes multiple pixels at the same time with SIMD, which is why reinhard is used. We can do the same with dovi reshaping.
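As a rough sketch of what "multiple pixels at the same time" means, the block below applies a plain Reinhard curve, x / (x + 1), to four floats per iteration. It is not the code from this PR: the function names are hypothetical, and the real filter's parameters and peak handling are left out. The point is that the curve needs only an add and a divide, both of which have direct vector instructions, so no LUT is required.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_SSE 1
#elif defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_NEON 1
#endif

/* Scalar reference: simplified Reinhard curve (assumption, no peak scaling). */
static float reinhard_scalar(float x)
{
    return x / (x + 1.0f);
}

/* Hypothetical helper: tone-map n samples, four per iteration where possible. */
static void reinhard_block(const float *in, float *out, size_t n)
{
    size_t i = 0;
#if defined(SKETCH_SSE)
    const __m128 one = _mm_set1_ps(1.0f);
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(in + i);              /* load 4 samples */
        __m128 y = _mm_div_ps(x, _mm_add_ps(x, one)); /* x / (x + 1) */
        _mm_storeu_ps(out + i, y);
    }
#elif defined(SKETCH_NEON)
    const float32x4_t one = vdupq_n_f32(1.0f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t x = vld1q_f32(in + i);
        float32x4_t y = vdivq_f32(x, vaddq_f32(x, one)); /* vdivq_f32 is AArch64-only */
        vst1q_f32(out + i, y);
    }
#endif
    /* Scalar tail; also the whole loop when no SIMD path is compiled in. */
    for (; i < n; i++)
        out[i] = reinhard_scalar(in[i]);
}
```

The scalar tail loop handles sample counts that are not a multiple of four, and doubles as the fallback on targets without either instruction set.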
Closed in favor of #407