v22 slower than v21? #47
Only if AMD changed the way they signal AVX2 support. I don't think they did? Because of the parameters you used, neither Super nor Degrain1 is using the new AVX2 code, which means it's Analyse that got slower. Do you see a difference between v21 and v22 when you run Analyse on a 16-bit clip?
The difference seems to be consistent. Analyse, 16-bit: v22 26.22 fps. Which functionality in MSuper or MDegrainx is supposed to be optimized? I could test that as well.
Degrain with 8-bit clips, Super with sharp=0 or 2.
Same thing with those: v21 is faster. sharp=2: v22 55.97 fps.
Just for fun, I checked what x264 shows: So at least it's working properly.
Is v22 compiled with Visual Studio faster than v21? See attached.
Yes, it seems to be faster. Compared to those first tests with 8-bit Analyse and 16-bit degraining, I got 60.43 fps as the result.
Tried compiling with GCC 9 on Linux. v22 is running faster than v21 for me. Maybe the issue is related to MinGW and cross-compilation. Script from Doom9 thread:
Profiler results. Units are perf "cycles" events, which is a proxy for time. In this script, the AVX2 code is offering negligible speedup, because the bulk of the compute is not in SIMD code anyway, due to the
Which compiler flags did you use? (And Autotools or Meson?)
Default autotools build (
Hmm. The default with Makefile.am is -O2. Meson defaults to -O3. I compiled the v22 and v23 DLLs using Meson. (I don't know about the older ones.) Perhaps that's what makes it slower?
I ran some tests with the above script, and for me r22 and r23 are slightly faster than r21 (~4%). GCC 10 builds are ~10% bigger than GCC 9 builds but just a tiny bit faster (~2%). On my Zen 2 CPU I used
@Boulder08 Here is v23 compiled with -O2 instead of -O3. That's the only difference. Please test again.
2500 frames of a test script of analysis and degraining in 16 bits: So it was definitely slower.
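For anyone repeating these comparisons: a fixed-length run like the 2500-frame test reduces to frames per second, and two builds are then compared by relative change. A minimal sketch of that arithmetic in Python follows. The function names and the timings in the demo are hypothetical and are not measurements from this thread.

```python
def fps(frames: int, seconds: float) -> float:
    """Throughput of a run that processed `frames` frames in `seconds` seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return frames / seconds


def relative_change_percent(baseline_fps: float, candidate_fps: float) -> float:
    """Percent change of the candidate relative to the baseline.

    Positive means the candidate build is faster, negative means slower.
    """
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (candidate_fps - baseline_fps) / baseline_fps * 100.0


if __name__ == "__main__":
    # Hypothetical wall-clock times for the same 2500-frame script.
    baseline = fps(2500, 50.0)   # stands in for the older build
    candidate = fps(2500, 48.0)  # stands in for the newer build
    change = relative_change_percent(baseline, candidate)
    print(f"baseline {baseline:.2f} fps, candidate {candidate:.2f} fps, {change:+.1f}%")
```

Differences of a few percent are easy to lose in run-to-run noise, so it probably makes sense to repeat each run several times before concluding that one build is slower.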
As I measured here: https://forum.doom9.org/showthread.php?p=1910541#post1910541 , the new version with speed improvements seems to be slower than the previous one. Are the CPU instruction sets properly detected? I noticed that the code doing the detection is quite old and may not be up to date for these new-generation AMD Ryzens (I'm running a 3900X).
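One way to rule out the operating system side of the detection question on Linux is to look at the feature flags the kernel reports. The sketch below is an assumption-laden helper, not part of this project: it only parses `/proc/cpuinfo`-style text (Linux on x86), and it says nothing about what the plugin's own detection code decides at runtime. The function names are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx2(cpuinfo_text: str) -> bool:
    """True if the kernel lists the 'avx2' flag for the first CPU entry."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("avx2 reported by kernel:", has_avx2(path.read_text()))
    else:
        print("/proc/cpuinfo is not available on this platform")
```

If the flag is present here but the AVX2 code paths are still not taken, the detection inside the plugin would be the next place to look.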