`struct LossLess`: Make `Rav1dFrameHeader_segmentation::lossless` a bit array #1231

kkysen · 2024-06-20T03:03:42Z

Part of `fn decode_coefs` is slow (#1180).

I'm working on trying to fix #1180; this is one of a bunch of steps.

include/dav1d/headers.rs

randomPoison

If I'm understanding this correctly, the main optimization here is using a bitmask operation (`LossLess::get`) instead of an array index operation. It's not clear to me that this would necessarily be more efficient, though. Have you profiled this and observed a clear performance improvement?

include/dav1d/headers.rs

kkysen · 2024-06-20T19:38:20Z

> If I'm understanding this correctly, the main optimization here is using a bitmask operation (`LossLess::get`) instead of an array index operation. It's not clear to me that this would necessarily be more efficient, though. Have you profiled this and observed a clear performance improvement?

No, I didn't profile this one individually yet. It does remove the bounds checks, though I guess that's also solved by SegmentId and InRange later on. It does also use less memory, which I think is useful as well.
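For readers following the thread, here is a minimal sketch of the kind of bit array being discussed. It is an illustration only: the name `LossLessBits`, the constant `SEGMENTS`, and the method bodies are assumptions, not the code from this PR. It relies on AV1 allowing at most 8 segments per frame, so one `u8` holds every flag.

```rust
/// Assumed segment count: AV1 allows at most 8 segments per frame.
pub const SEGMENTS: usize = 8;

/// Hypothetical stand-in for the PR's bit array (not rav1d's actual type):
/// one bit per segment instead of one `bool` per segment.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
pub struct LossLessBits(u8);

impl LossLessBits {
    /// Pack an array of flags into a mask; bit `i` holds the flag for segment `i`.
    pub fn from_array(flags: [bool; SEGMENTS]) -> Self {
        let mut bits = 0u8;
        for (i, &flag) in flags.iter().enumerate() {
            bits |= (flag as u8) << i;
        }
        Self(bits)
    }

    /// Read the flag for `segment_id`.
    /// Masking with 7 keeps the shift amount in range, so there is no
    /// bounds check and no panic path.
    pub fn get(self, segment_id: u8) -> bool {
        (self.0 >> (segment_id & 7)) & 1 != 0
    }
}
```

In this sketch the packed form occupies 1 byte where `[bool; 8]` occupies 8. Whether the shift and mask beat a plain indexed load in practice is exactly the question that profiling has to answer.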

randomPoison · 2024-06-20T20:49:38Z

> It does remove the bounds checks, though I guess that's also solved by SegmentId and InRange later on. It does also use less memory, which I think is useful as well.

Given that you have other changes that target bounds checks for the lossless array, the only real advantage here is the memory reduction. That may still be a nice optimization to have, depending on how many of these are expected to be in memory at once, i.e. how many frame headers there are at a given time. I'm assuming not many, so this doesn't really seem like a worthwhile change to me, but I could be wrong about that. @fbossen might be the best person to weigh in on whether this is a reasonable optimization.
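The alternative route to removing the bounds checks, a range-restricted index type, can be sketched as follows. This is a hypothetical illustration: `SegId`, `MAX_SEGMENTS`, and `is_lossless` are made-up names, and the actual `SegmentId` and `InRange` types mentioned above may be designed quite differently.

```rust
/// Assumed segment count: AV1 allows at most 8 segments per frame.
pub const MAX_SEGMENTS: usize = 8;

/// Hypothetical index newtype whose value is always below `MAX_SEGMENTS`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SegId(u8);

impl SegId {
    /// Validate once, at construction time.
    pub fn new(raw: u8) -> Option<Self> {
        if (raw as usize) < MAX_SEGMENTS {
            Some(Self(raw))
        } else {
            None
        }
    }

    /// Return the index. The mask restates the invariant in a form the
    /// optimizer can see, so indexing a `[T; 8]` needs no runtime check.
    pub fn get(self) -> usize {
        (self.0 & 7) as usize
    }
}

/// Plain array lookup keyed by the validated index.
pub fn is_lossless(lossless: &[bool; MAX_SEGMENTS], id: SegId) -> bool {
    lossless[id.get()]
}
```

With an index type like this, the array keeps its `[bool; 8]` layout and only the memory footprint distinguishes it from the bit-array version.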

kkysen · 2024-06-21T21:10:43Z

> Given that you have other changes that target bounds checks for the lossless array, the only real advantage here is the memory reduction. That may still be a nice optimization to have, depending on how many of these are expected to be in memory at once, i.e. how many frame headers there are at a given time. I'm assuming not many, so this doesn't really seem like a worthwhile change to me, but I could be wrong about that. @fbossen might be the best person to weigh in on whether this is a reasonable optimization.

It's not just about using less memory overall, where the savings would be quite small, but about what can fit in cache.

@fbossen, what do you think?

fbossen · 2024-06-24T13:45:06Z

I've run a bunch of tests and it seems this PR leads to a small performance regression. The potential memory saving here is too small to have a positive impact.

kkysen requested review from randomPoison and fbossen June 20, 2024 03:03

kkysen force-pushed the kkysen/fn-loop_restoration_filter-Fn-call-safe branch from 42b9e02 to 8ada302 Compare June 20, 2024 05:40

kkysen force-pushed the kkysen/struct-LossLess-bit-array branch from 9973a68 to f9b1f96 Compare June 20, 2024 05:40

CrazyboyQCD reviewed Jun 20, 2024

include/dav1d/headers.rs (outdated)

randomPoison reviewed Jun 20, 2024

include/dav1d/headers.rs (outdated)

Base automatically changed from kkysen/fn-loop_restoration_filter-Fn-call-safe to main June 21, 2024 20:54

kkysen force-pushed the kkysen/struct-LossLess-bit-array branch from f9b1f96 to 2720efd Compare June 21, 2024 21:04

kkysen added 11 commits June 23, 2024 17:56

enum TxfmSize: Make a real enum.

68b21c9

fn BlockSize::dimensions: Make static dav1d_block_dimensions private and access it through this `fn`.

47326c0

fn get_lo_ctx: match on ctx_offsets (Option) instead of `tx_class` to avoid an `.unwrap()`. This optimizes it enough to have it be inlined.

9d00ae6

fn get_lo_ctx: Make stride arg a u8 (necessary to prohibit overflows on multiplication) and pre-slice `levels`.

a5c6209

fn get_lo_ctx: Make x, y args u32s instead of usizes.

7884bb9

fn get_lo_ctx: Make return type u8.

914cc32

fn decode_coefs: Make ctx var a u8.

d82bd8c

fn get_lo_ctx: Make mag var a u32.

ef22dde

fn get_lo_ctx: Pre-index levels to bounds check all at once.

242065b

This moves the first `mag` initialization to inside the `match`, since LLVM fails to optimize out the bounds checks if an identical `match` is done before the `mag` initialization.
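The pre-indexing idea in this commit can be illustrated with a small sketch. Everything here is an assumption made for illustration: `sum_neighbors`, its signature, and the choice of neighbours are hypothetical and do not reproduce the real `get_lo_ctx`. The point is that slicing once up front gives the compiler a known length, which often lets it drop the per-index checks that follow.

```rust
/// Hypothetical helper (not rav1d's `get_lo_ctx`): sums four neighbouring
/// levels in a buffer laid out with the given row stride.
pub fn sum_neighbors(levels: &[u8], stride: u8) -> u32 {
    let stride = stride as usize;

    // One slice up front: this is the only place that can panic, and it
    // proves that every index below is in range.
    let levels = &levels[..stride + 2];

    // The largest index used is `stride + 1`, which is below the
    // slice length `stride + 2`.
    let mut mag = levels[0] as u32;
    mag += levels[1] as u32;
    mag += levels[stride] as u32;
    mag += levels[stride + 1] as u32;
    mag
}
```

Whether the checks are actually removed depends on the optimizer, as the commit message itself notes for the `match` case, so the generated assembly is worth inspecting.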

struct Lossless: Make Rav1dFrameHeader_segmentation::lossless a bit array.

2581c0d

fn Lossless::from_array: Add a comment explaining why it's not optimized as much as possible.

8db6ed1

kkysen force-pushed the kkysen/struct-LossLess-bit-array branch from 2720efd to 8db6ed1 Compare June 24, 2024 05:29

rinon added the performance label Jun 29, 2024
