This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

The Final Release: aom-av1-psy version 1.0.0!

@BlueSwordM BlueSwordM released this 07 Sep 06:48
· 68 commits to Endless_Possibility since this release

Versioning
aom-av1-psy 1.0.0: a34b5e3
aomenc-av1 3.4.0 Git: https://aomedia.googlesource.com/aom/+/18768f2bb5a7edf6b8a0fcb5ad8bdd86d80caba6

Binary information

Static Windows binaries are provided for the 3 main branches (Full Build, Full Build Alpha, and Endless Possibility), optimized in Haswell, Skylake and znver1 variants. A generic Full Build is also included for systems that don't meet these requirements for some reason.

Static Linux binaries of the 3 main branches with libvmaf included have been attached below, and have a minimum CPU requirement of a Haswell/Zen1 processor or newer. It is recommended to build it yourself if you can to attain the highest level of optimizations possible.

Announcement: moving on from aomenc, to aom-av1-psy, and ending up in rav1e

This has been the last aom-av1-psy version I've personally worked on, which means that after this final release, only maintenance updates will be provided by periodic code rebases from mainline aomenc-av1. It's honestly been a pleasure working with so many good people, learning so much about general video encoding and even improving my programming skills further, but it's just too much for me at this point to work on this codebase with no direct support.

I'll be moving on to rav1e development afterwards, as the people are much better to work with, rav1e has a much better codebase, and it is far more flexible in its encoder design, which will allow me to do stuff I had never been able to do before :)

Full Build changes over mainline aomenc

  1. The main, deeper psycho-visual changes are included in the base aom-av1-psy flavor in the form of the --tune-content=psy/animation content tunes.
    These changes include:
  • Adding a low luma bias for the variance aq-mode in aomenc(--aq-mode=1) to decrease the quantizer used in lower brightness partitions, increasing quality in such scenarios.
  • Reducing the default alt-ref frame temporal filtering strength from 5 to 2 to increase quality consistency and fidelity.
  • Adjusting chroma-deltaq so that it applies a -1 quantizer offset when dealing with chroma subsampled streams (4:2:0/4:2:2) to increase chroma quality. Stock behavior is preserved for 4:4:4 streams (positive quantizer offset) and has been boosted further, as it has proven to increase psycho-visual benefits without harming chroma performance (likely due to superior AV1 chroma handling).
  • For the --tune-content=psy tune, default superblock size has been changed from dynamic to a static 64x64, offering improved spatio-temporal AQ resolution(deltaq), improved encode and decode threading, as well as higher quality for higher bitrates at resolutions around 1080p.
    Note that around 6 megapixels, this gets disabled, as larger SBs start being quite beneficial around that mark.
  • For the --tune-content=psy tune, only the SGR restoration filter gets turned on, leading to fewer false positives around over-filtering and making it mostly a non-issue to keep restoration filtering on at high quality.
  • Keyframe-filtering=1 fixes. You can now use it without worrying about wildly lower quality keyframes.
  • For the --tune-content=psy tune, pixel domain operations only. That does mean a specific setting becomes overridden (--dist-metric=qm-psnr), but the quality benefit of doing operations in the pixel domain is larger than what is lost by not being able to use frequency driven PSNR for in-block distortion optimization.
  • Higher quality intra mode searches, at all quality levels except for RT, but the psy content tune isn't meant for low latency RT anyway.
  1. Addition of new psy driven RD and QP tunes: ipq (Image Perceptual Quality), vmaf_psy_qp (QP adjustment tune based on VMAF's motion analysis), and ipq_vmaf_psy (RD adjustment based on ipq and QP adjustment based on VMAF's motion analysis). The fastest is ipq, followed by vmaf_psy_qp, with ipq_vmaf_psy last (although negligibly behind vmaf_psy_qp, realistically speaking). The speed differences mainly come from threading differences. In my opinion, ipq and ipq_vmaf_psy are the most balanced overall, although some people prefer vmaf_psy_qp for their own reasons. All of them are better than the default PSNR-HVS/"SSIM"/processing VMAF RD tunes, and the vmaf_qp based ones are a lot faster than the raw vmaf_without_preprocessing.

  2. Better deltaq-mode=5 for HDR in both luma and chroma terms: Better tuned overall for chroma and luma HDR deltaq has been enabled.

  3. Max lag-in-frames (LIF) has been increased from 48 to 128: Most benefits come from increasing LIF to 64. After that, any benefit becomes rather small while RAM usage goes up considerably for every 32 frame mini GOP you store (65 LIFs consume a lot more than 64 LIFs). My recommendation would be to stay around 64, or 96 if you've got the RAM and a bit more patience. Anything above 96 is not recommended at all and is considered to be in full placebo territory.

  4. No 1st-pass pruning: We removed some of the speed pruning in the 1st pass analysis, since we found they barely provided any benefits. They were likely necessary back in 2018-2020, but not anymore in 2022. Gives a slight general quality bump for faster presets.

  5. Making the RD multiplier and sharpness deadzone changer setting --sharpness=X work again after June 2021: This setting's main purpose is to raise the RD multiplier for RD block distortion calculations (across blocks, not in-block), making the output sharper and letting more distortion/artifacts pass through. Before June 2021, this worked as expected. After a fateful patch in June 2021, any setting above sharpness=1 became practically useless, as sharpness 1 through 7 were all bit-exact. We reverted this change entirely. Was the original change made deliberately? No idea.

  6. We added a "no screen content" content tune and disabled screen content tool detection for the psy content tune, since the SC tools aren't all that useful outside of specific cases and can make some content perform worse when enabled. Detection stays on for the animation tune.

  7. All-frames SB RD adjustment: A new change that greatly improved intra performance was letting every superblock be influenced by RD optimization decisions. We ported this change to video coding for all frames and saw a good improvement in general performance, so we decided to keep it as is.

  8. Chroma loop restoration is now enabled for more presets and is quantizer based, and chroma mixed photon noise has been added.

  9. Fixing temporal filtering bugs and mistakes:

  • We rebalanced the calculation of temporal filtering strength by changing the equation from having a denominator of 4 to 5(arnr-strength/4 to arnr-strength/5) and making arnr-strength=6 work by increasing the max filtering strength ceiling. Before this change, arnr-strength=4=5=6 :)
  • Bug fixes regarding temporal-filtering application with certain edge cases.
  • Better CDEF for the --tune-content=animation on all good presets.
  1. The binary now carries the proper aom-av1-psy name. A keyframe adjustment factor was added to increase keyframe quality for better referencing, which is a good positive change overall, especially with better keyframe filtering aiding results further. VMAF threading is also slightly better.

  2. BETTER DEFAULTS: CPU-3, 10-bit, min-kf interval of 12 frames and max-kf interval of 240 frames, default chroma-deltaq, no external denoiser for internal grain synthesis. This should get more people to use better settings by default and make command lines shorter, leading to less excessive cargo culting.
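
The lag-in-frames item above reasons in terms of 32 frame mini GOPs and gives rough recommendation tiers. A minimal sketch of that reasoning follows. The rounding-up model and the function names are assumptions made for illustration, not code from aom-av1-psy, and the tier boundaries are only one reading of "stay around 64".

```python
import math

MAX_LAG_IN_FRAMES = 128  # new ceiling stated in these release notes
MINI_GOP_FRAMES = 32     # mini GOP size the notes use when discussing RAM


def mini_gops_buffered(lag_in_frames: int) -> int:
    """Whole 32 frame mini GOPs needed to hold the lookahead (assumed model)."""
    if not 0 <= lag_in_frames <= MAX_LAG_IN_FRAMES:
        raise ValueError(f"lag-in-frames must be in 0..{MAX_LAG_IN_FRAMES}")
    return math.ceil(lag_in_frames / MINI_GOP_FRAMES)


def lag_advice(lag_in_frames: int) -> str:
    """Map a lag-in-frames value to the tiers described in the notes."""
    mini_gops_buffered(lag_in_frames)  # reuse the range check
    if lag_in_frames <= 64:
        return "recommended"
    if lag_in_frames <= 96:
        return "ok with spare RAM and patience"
    return "placebo territory"


if __name__ == "__main__":
    for lif in (48, 64, 65, 96, 128):
        print(lif, mini_gops_buffered(lif), lag_advice(lif))
```

Under this model, 64 frames fit in two mini GOPs while 65 frames spill into a third, which is one way to read the remark that 65 LIFs consume a lot more than 64.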

Full Build Alpha changes over Full Build

  1. All-frame AQ1: What it does is that it enables aq-mode=1 segmentation calculations to run on every frame of a mini GOP instead of only every allowed frame.

Although the feature itself is no longer considered alpha, it changes rate control and scene quality enough to still be treated as alpha. However, the quality increase in difficult scenes without having to rely on upping the base quantizer is quite nice to have, so it balances out quality and makes it more consistent overall.

  1. More consistent temporal filtering adjustment: We changed the calculation to force the encoder to add fewer frames in its alt-ref temporal filtering process(MCTF for alt-refs, or Motion Compensated Temporal Filtering for alt-ref frames), making encoding quality more consistent between scenes of slightly varying complexity.

Endless Possibility changes over Full Build Alpha

  1. Even slightly better defaults: Max mini GOP frame pyramid height has been changed from 5 depths to 4 depths. This increases processing speed and increases quality consistency, particularly in grainy harsh scenes.

  2. Quality/quantizer adaptive CDEF: Back in 2018, the default CDEF strength selection algorithm was changed from daala-dist to MSE, and in 2020, daala-dist was removed entirely from the aomenc codebase. This was not a good change: MSE is a very poor indicator of good psycho-visual compression performance, and as such, CDEF strength calculations became quite literally worse psycho-visually speaking, preferring stronger strengths and more blurring over a better balance, as edges that do not suffer from mild/severe edge artifacting were now getting a bit blurred for no reason.

We initially did not see this as a problem, but as our quality standards rose, our knowledge and experience grew, and the tools rapidly improved through our own work, we started seeing the limitations of such a poorly made PissNR driven choice.

That's when I had the idea: what if we limited CDEF strengths based on the quantizer? Higher quantizer = more compression = bigger primary/secondary filter strength search space, and lower quantizer = less compression = smaller filter strength search space. Not only would limiting the search space decrease the rate of false positives and increase fidelity, but it would also increase speed, since a smaller search space is always faster to search. This works out even better because encoding load is higher at higher quality levels, so it offers a good balance.

I decided to implement a rough guide of my ideas in the Endless Possibility branch. It worked, and offered decent benefits, but the code infrastructure offered by aomenc made testing difficult and offered limited variability options without refactoring a big part of the CDEF code base. This made testing with a lot of data an actual pain, and the limitations made it tough to make it better.

I did manage to implement some of these changes, but they are not flexible regarding CDEF strength adjustment (a power curve would be a lot more suitable), especially at different resolutions. Since testing is quite difficult, only part of these changes have been applied as I wanted them. They do exist, and help out quality and fidelity at the higher end, but they're not great. I hope to implement a Pareto-optimal implementation in rav1e, with a metric that doesn't make me want to gouge my eyes out.

Relevant commit: c7e14fd
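
To make the quantizer-limited CDEF idea concrete, here is a rough sketch. It is not taken from the commit above. The mapping, the function names, and the `power` parameter are all hypothetical; the only outside facts assumed are the AV1 ranges of 0..255 for the quantizer index, 0..15 for the primary strength, and 0..3 for the secondary strength index. With `power=1.0` the cap grows linearly with the quantizer, and other values give the kind of power curve mentioned above.

```python
from typing import List, Tuple

MAX_QINDEX = 255    # AV1 quantizer index range is 0..255
MAX_PRIMARY = 15    # AV1 CDEF primary strength range is 0..15
MAX_SECONDARY = 3   # AV1 CDEF secondary strength index range is 0..3


def cdef_strength_caps(qindex: int, power: float = 1.0) -> Tuple[int, int]:
    """Hypothetical caps on (primary, secondary) strength for a quantizer index."""
    if not 0 <= qindex <= MAX_QINDEX:
        raise ValueError("qindex must be in 0..255")
    if power <= 0:
        raise ValueError("power must be positive")
    # Lower quantizer (higher quality) -> smaller scale -> tighter caps.
    scale = (qindex / MAX_QINDEX) ** power
    return round(MAX_PRIMARY * scale), round(MAX_SECONDARY * scale)


def search_space(qindex: int, power: float = 1.0) -> List[Tuple[int, int]]:
    """All (primary, secondary) pairs that stay within the caps."""
    pri_cap, sec_cap = cdef_strength_caps(qindex, power)
    return [(p, s) for p in range(pri_cap + 1) for s in range(sec_cap + 1)]


if __name__ == "__main__":
    for q in (0, 64, 128, 255):
        print(q, cdef_strength_caps(q), len(search_space(q)))
```

At the highest quantizer the full 16 x 4 grid is searched, while at the lowest only the "no filtering" pair remains, so both the risk of over-filtering and the search cost shrink as quality goes up.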

  1. Adding quantizer sharpness based controls: --sharpness=X controls RD across blocks. The new setting --quant-sharpness=X controls in-block distortion variation, or internal block quantization sharpness. Use this setting carefully, and enjoy.

  2. A true alpha change and a new thing for a video encoder: mbtree/cutree/temporal-RDO strength controls: In this code branch, you can now control temporal-RDO strength. Test it out for yourself, and see how increasing/decreasing the strength changes the output in motion...

  3. The addition of dq-modulate for deltaq-mode=2: deltaq-mode=2 has 2 ways to analyze spatial complexity: wavelets and variance. I know, no activity masking, but be happy this still exists. Variance is more consistent (but has a bug when alt-ref filtering strength approaches 0), while wavelet analysis has higher quality potential but may be more inconsistent. Slower presets have fewer issues in the latter regard.

New settings for each branch

  • Full Build and Full Build Alpha: --tune-content=psy/animation, --tune=ipq/ipq_vmaf_psy/vmaf_psy_qp.
  • Endless Possibility: --dq-modulate=0/1, --tpl-strength=0-1000, --quant-sharpness=0-7.
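
As a convenience, the settings listed above can be range-checked before they are handed to the encoder. The helper below is only a sketch: `build_args` and its parameter names are made up for illustration, the binary is assumed to be called `aomenc` and to follow the usual `aomenc <options> -o output input` layout, and the default of 64 for lag-in-frames simply follows the recommendation earlier in these notes. The value ranges are the ones given in the list.

```python
from typing import List, Optional

TUNE_CONTENT = {"psy", "animation"}
TUNES = {"ipq", "ipq_vmaf_psy", "vmaf_psy_qp"}


def build_args(source: str, output: str, tune_content: str = "psy",
               tune: str = "ipq", lag_in_frames: int = 64,
               dq_modulate: Optional[int] = None,
               tpl_strength: Optional[int] = None,
               quant_sharpness: Optional[int] = None) -> List[str]:
    """Assemble an argument list from the settings named in these notes."""
    if tune_content not in TUNE_CONTENT:
        raise ValueError(f"unknown content tune: {tune_content}")
    if tune not in TUNES:
        raise ValueError(f"unknown tune: {tune}")
    if not 0 <= lag_in_frames <= 128:
        raise ValueError("lag-in-frames must be in 0..128")
    args = ["aomenc", f"--tune-content={tune_content}", f"--tune={tune}",
            f"--lag-in-frames={lag_in_frames}"]
    # The three settings below exist only in the Endless Possibility branch.
    if dq_modulate is not None:
        if dq_modulate not in (0, 1):
            raise ValueError("dq-modulate must be 0 or 1")
        args.append(f"--dq-modulate={dq_modulate}")
    if tpl_strength is not None:
        if not 0 <= tpl_strength <= 1000:
            raise ValueError("tpl-strength must be in 0..1000")
        args.append(f"--tpl-strength={tpl_strength}")
    if quant_sharpness is not None:
        if not 0 <= quant_sharpness <= 7:
            raise ValueError("quant-sharpness must be in 0..7")
        args.append(f"--quant-sharpness={quant_sharpness}")
    return args + ["-o", output, source]


if __name__ == "__main__":
    print(" ".join(build_args("input.y4m", "output.ivf", quant_sharpness=2)))
```

Returning a list rather than a single string keeps the result usable with `subprocess.run` without extra shell quoting.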

That's about it! Enjoy the new builds and happy encoding!