
Use psnr to compare frames #662


Open · wants to merge 1 commit into main

Conversation

dvrogozh
Contributor

For: #569

With attempts to support XPU acceleration (Intel GPUs) and aarch64, we see the need to generalize frame comparison in a way that works across different platforms and implementations of color space conversion algorithms. In the media domain such comparisons are achieved with metrics like PSNR or SSIM. This commit introduces such a comparison for all platforms. It adds a dependency on torcheval in the tests, which is used to calculate PSNR.
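For illustration, a minimal sketch of what a PSNR-based check could look like with torcheval's functional API (the helper name and the 30 dB threshold below are placeholders, not necessarily what this PR implements):

import torch
from torcheval.metrics.functional import peak_signal_noise_ratio

# Hypothetical helper: accept frames that are visually equivalent even if
# they are not bit-exact. The 30 dB default threshold is illustrative only.
def assert_frames_close(actual, expected, min_psnr=30.0):
    psnr = peak_signal_noise_ratio(
        actual.float(), expected.float(), data_range=255.0  # 8-bit frames
    )
    assert psnr >= min_psnr, f"PSNR {psnr:.2f} dB is below {min_psnr} dB"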

CC: @traversaro, @scotts, @NicolasHug

This commit adds a dependency on torcheval in the tests.

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@NicolasHug
Member

Thank you for the PR, @dvrogozh. Some high-level comments:

  1. Let's not modify the existing checks for the platforms we natively support. Some of our checks are dumb-and-simple, and there is value in that.
  2. Instead of adding a new metric to account for other architectures, we should try to solve the more general problem of allowing out-of-core repos to provide their own frame comparison function. I'm not sure yet how to enable this, but we should let each platform decide individually rather than have a centralized function within torchcodec.
  3. If possible, we should try not to rely on an extra dependency (torcheval). It's OK if we need to vendor some of the code, especially if it's just for one metric. But with 1 and 2, I don't think we'll need to add any new metric within torchcodec itself.
  4. Have you considered relying on this check for your XPU work? It is more permissive than what we currently do on linux:

    torchcodec/test/utils.py, lines 54 to 56 in ed13ac5:

    # Asserts that at least `percentage`% of the values are within the absolute tolerance.
    # Percentage is expected in [0, 100] (actually, [60, 100])
    def assert_tensor_close_on_at_least(actual_tensor, ref_tensor, *, percentage, atol):
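For readers without the repo open, a sketch of how such a percentage-based check can be implemented (the actual body in torchcodec/test/utils.py may differ):

import torch

def assert_tensor_close_on_at_least(actual_tensor, ref_tensor, *, percentage, atol):
    # Share of values whose absolute difference is within atol, in percent
    abs_diff = (actual_tensor.float() - ref_tensor.float()).abs()
    pct_within = (abs_diff <= atol).float().mean().item() * 100
    assert pct_within >= percentage, (
        f"Only {pct_within:.1f}% of values are within atol={atol}; "
        f"expected at least {percentage}%"
    )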

@dvrogozh
Contributor Author

Thank you for your comments, @NicolasHug. This PR touches an important topic: the approach used in torchcodec tests to validate the correctness of video frames. I understand your motivation to use straightforward and simple bit-exactness checks. That is indeed the easiest thing to do, and it works for most decoders (at least the most important ones like avc, hevc, av1, etc.), i.e. those which don't use floating point operations and are required to be conformant. The problems come with video processing algorithms such as color conversion or scaling, and they further multiply once you start considering video encoding. It is basically a given fact that you cannot use a simple bit-to-bit exactness check across different implementations with a single ground-truth baseline (taken from cpu, cuda or any other processor). To properly compare video frames you must use dedicated metrics such as PSNR, SSIM or MS-SSIM.

My concern with the current torchcodec tests design is that the project will run into issues in the long run, as the code base and features grow and potentially span multiple implementations on different platforms. Even as of today you handle video frame comparison with ~5 different paths: 2 for cuda, 2 for cpu and non-cuda, and 1 for even more complex cases (the assert_tensor_close_on_at_least you pointed me to). Add to that device-specific checks in individual tests, such as the one in test_get_frame_at_pts which I caught while working on this PR:

with pytest.raises(AssertionError):

I'm afraid that going forward this might become a real issue for project sustainability.

I intentionally wrote this PR in a blunt way, reflecting how I would approach this task on day 0. The project is of course past day 0, and I totally understand the desire to preserve the current baseline checks. I think, however, that we can still discuss this and arrive at a better solution that helps project maintenance in the future. I would like to propose the following:

  1. Rely on standard media metrics (PSNR or SSIM) as a design principle for torchcodec tests in cases where bit exactness cannot be guaranteed by design
  2. In addition to the above, strengthen the checks for particular cases, such as bit exactness for x86_64 cpu or minor deviations in the case of cuda
    • In my mind, your proposal on "allowing out-of-core repos to provide their own frame comparison function" would fall under this item

In this way, the 1st item gives us a baseline which all implementations of torchcodec should pass, while the 2nd item allows strengthening the requirements for particular cases. assert_frames_equal would then look something like:

def assert_frames_equal(input, output, psnr_threshold):
  psnr = calculate_psnr(input, output)
  assert psnr >= psnr_threshold

  if input.device.type == "cpu":
    torch.testing.assert_close(input, output, atol=0, rtol=0)
  elif input.device.type == "cuda":
    ...
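A hypothetical call site could then pick a threshold per platform (the values below are purely illustrative):

assert_frames_equal(ref_frames, cuda_frames, psnr_threshold=40.0)
assert_frames_equal(ref_frames, xpu_frames, psnr_threshold=35.0)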

> If possible, we should try not to rely on an extra dependency (torcheval)

That's an implementation detail :). There is indeed a question of whether torcheval is the right way to go for PSNR/SSIM metrics: we need some understanding of that project's future before deciding to rely on it. I hoped you might have some insight on that? Alternatively, we can consider pulling a PSNR implementation directly into torchcodec test utils, or rely on another library for such calculations. That's a point to discuss.
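If vendoring turns out to be preferable, a self-contained PSNR helper is small. A sketch in plain torch (the function name and the 255 default for 8-bit content are assumptions):

import torch

def calculate_psnr(actual, expected, max_val=255.0):
    # PSNR = 10 * log10(MAX^2 / MSE), in dB; identical frames give +inf
    mse = torch.mean((actual.float() - expected.float()) ** 2)
    if mse == 0:
        return float("inf")
    return (10 * torch.log10(max_val ** 2 / mse)).item()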

> Have you considered relying on this check for your XPU work?

That would be reinventing the wheel, i.e. an attempt to define our own metric while standard ones (PSNR, SSIM, MS-SSIM) have existed and been used in media for decades.
