Reduce cost of bounds checks in transforms #43

kornelski · 2025-02-11T03:00:38Z

According to an llvm-mca estimate for the 4x3 transform function:

before:

uOps Per Cycle: 2.70
IPC: 1.97
Block RThroughput: 159.3

after:

uOps Per Cycle: 3.48
IPC: 2.69
Block RThroughput: 279.8

jrmuizel · 2025-02-12T20:30:01Z

src/chain.rs

@@ -159,7 +159,7 @@ impl ModularTransform for XYZtoLAB {

 struct ClutOnly {
    clut: Box<[f32]>,
-    grid_size: u16,
+    grid_size: u8,


What was the motivation for this? Avoiding the casts?

kornelski · 2025-02-15

That's the natural size for it, as read from the input.

I hoped it would also explain to LLVM that grid_size.pow(4) can't overflow, but unfortunately that didn't have any effect.
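For readers following the overflow point, here is a minimal sketch of the arithmetic. The function names are hypothetical and are not taken from src/chain.rs; the sketch only assumes that the point count of a 4-input CLUT is the grid size raised to the fourth power.

```rust
/// Hypothetical helper: number of grid points in a 4-input CLUT.
/// Not the crate's actual code, only an illustration of the range argument.
fn clut4_points(grid_size: u8) -> u32 {
    // The largest u8 is 255, and 255^4 = 4_228_250_625, which is below
    // u32::MAX (4_294_967_295), so this power cannot overflow.
    u32::from(grid_size).pow(4)
}

/// The same computation with a u16 grid size. The input range no longer
/// rules out overflow, so the checked variant is used to report it.
fn clut4_points_u16(grid_size: u16) -> Option<u32> {
    u32::from(grid_size).checked_pow(4)
}
```

With a u16 input, 256 is already enough to exceed u32 (256^4 = 2^32), which is why the narrower type makes the bound visible in the signature even if the optimizer does not take advantage of it.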

jrmuizel · 2025-02-13T01:48:15Z

Is this code showing up in profiles for you? It should only be used when building a lookup table during transform creation. The actual color transformation should be using a fast path.

jrmuizel · 2025-02-13T01:53:54Z

The first 3 commits are queued for landing. https://bugzilla.mozilla.org/show_bug.cgi?id=1947889

The other 3 seem ok but I'd like to understand the motivation a little better before landing them.

kornelski · 2025-02-15T02:22:04Z

Yes, on small images (<640px) building the lookup table takes more time than the transformation itself. The reduction in bounds checks also made the code a bit smaller.

jrmuizel · 2025-02-18T03:56:35Z

I've landed the grid size patch. The other two commits cause:

---- gtest::gtest::v4_output stdout ----
thread 'gtest::gtest::v4_output' panicked at src/chain.rs:1109:5:
assertion failed: false

when running cargo test --all-features

jrmuizel · 2025-02-18T03:58:03Z

I'll set up GitHub Actions to run the tests when I get a chance.

awxkee reviewed Feb 11, 2025

kornelski force-pushed the clut-opt branch 4 times, most recently from b8d408f to 7585ddd, February 11, 2025 13:09

jrmuizel reviewed Feb 12, 2025

kornelski added 3 commits February 14, 2025 23:06

Keep grid_size range small

7281fb6

Reduce bounds checks in CLUTs

82dd8dd

chunks_exact(3).nth() has a fast path, and needs 1 check for 3 pixels
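The commit message above can be illustrated with a small sketch. The function names and the flat f32 table layout are assumptions made for the example, not the code in this commit. It contrasts fetching three consecutive values through chunks_exact(3).nth(), which performs a single range check for the whole triple, with three separately indexed loads.

```rust
/// Hypothetical lookup of one RGB triple from a flat table of f32 values.
fn rgb_at(table: &[f32], index: usize) -> Option<[f32; 3]> {
    // chunks_exact(3) yields only complete 3-element slices, and its nth()
    // jumps straight to the requested chunk, returning None if it is out of
    // range. That is the one check for the whole triple.
    let px = table.chunks_exact(3).nth(index)?;
    // Every chunk has length exactly 3, so these constant indices are
    // expected to compile without further bounds checks.
    Some([px[0], px[1], px[2]])
}

/// Naive version for comparison: each index expression is bounds-checked
/// separately and panics when out of range.
fn rgb_at_indexed(table: &[f32], index: usize) -> [f32; 3] {
    [table[index * 3], table[index * 3 + 1], table[index * 3 + 2]]
}
```

Returning an Option also lets the caller decide how to handle an out-of-range index instead of panicking inside the lookup.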

Avoid panicking bounds check in lut_interp_linear_float

d5661ec
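As a rough illustration of what avoiding a panicking bounds check can look like in a linear table interpolation, here is a sketch under stated assumptions. It is not the body of lut_interp_linear_float: the name lerp_table, the [0, 1] input range, and the 0.0 fallback are all hypothetical choices made for the example.

```rust
/// Hypothetical linear interpolation over a lookup table for x in [0, 1].
fn lerp_table(x: f32, table: &[f32]) -> f32 {
    // An empty table has no last index; return an arbitrary fallback.
    let Some(last) = table.len().checked_sub(1) else {
        return 0.0;
    };
    // Clamp the input so the scaled position stays within the table.
    let pos = x.clamp(0.0, 1.0) * last as f32;
    // Clamp both indices to the last valid entry.
    let lower = (pos.floor() as usize).min(last);
    let upper = (lower + 1).min(last);
    let frac = pos - lower as f32;
    // get() returns an Option instead of panicking, so no panic path is
    // needed even if the compiler cannot prove the indices are in range.
    match (table.get(lower), table.get(upper)) {
        (Some(&a), Some(&b)) => a * (1.0 - frac) + b * frac,
        _ => 0.0,
    }
}
```

For example, lerp_table(0.5, &[0.0, 10.0]) interpolates halfway between the two entries, and inputs above 1.0 are clamped to the last entry.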

kornelski force-pushed the clut-opt branch from 7585ddd to d5661ec, February 14, 2025 23:08
