Multisampled path rendering #324

raphlinus · 2023-05-18T14:47:27Z

This is a working branch for progress on #270. In its current state, it demonstrates 16x multisampling with good performance, but has a number of rough edges and problems, including artifacts and no stroke rendering.

This is currently done by inelegant code in path_coarse, just to get things working. The idea is to replace it with logic based on line crossings later.

As of this commit, it does line rasterization on the path segments, but none of the prefix sum work to do filling. The solid tile fill is temporarily disabled to help visualize the work-in-progress fill logic.

This was an attempt to save shared memory. It has no effect on performance.

As of this commit, it correctly computes the 16-wide prefix sum of winding number deltas.
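For readers unfamiliar with the pattern, here is a minimal CPU-side sketch of a 16-wide inclusive prefix sum over per-sample winding-number deltas. It uses a Hillis-Steele scan (log2(16) = 4 steps), which is one common way a GPU workgroup does this in shared memory. The function name and the choice of scan variant are assumptions for illustration, not the branch's shader code.

```rust
/// Hypothetical sketch: inclusive prefix sum of 16 winding-number deltas,
/// written as a Hillis-Steele scan so each step mirrors one barrier-separated
/// pass over shared memory on a GPU.
fn prefix_sum_16(deltas: [i32; 16]) -> [i32; 16] {
    let mut buf = deltas;
    let mut offset = 1;
    while offset < 16 {
        // Read from a snapshot of the previous step, as lanes would after a barrier.
        let prev = buf;
        for i in offset..16 {
            buf[i] = prev[i] + prev[i - offset];
        }
        offset *= 2;
    }
    buf
}
```

After the scan, entry `i` holds the accumulated winding number at sample `i`: a +1 delta placed before sample 3 and a -1 delta before sample 10 yield winding 1 for samples 3 through 9 and 0 elsewhere.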

This commit adds in the y_edge calculation (using a loop rather than a prefix sum) and backdrop, rendering paths basically correctly. Some robustness artifacts are visible, but they seem to be the same as when clipping to tile was added and the area-based fine rasterizer was used.

This commit implements 8 samples per pixel, a calculation of the winding number for each of those samples, and a resolve to an alpha value. However, although most of the mechanism should be in place, there are correctness issues, and the antialiasing displays a number of artifacts. I need to do a deeper analysis of why; it's not clear whether there are conceptual issues with the edge crossing logic, errors in translating those ideas to shader code, or some combination. I'm posting this checkpoint for the curious while I work on tools to explore the render logic more deeply.
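The resolve step can be sketched as counting covered samples. This assumes the nonzero fill rule by default (with an even-odd option for comparison); the source does not say which rule the branch applies, and the function name is hypothetical.

```rust
/// Hypothetical sketch: resolve 8 per-sample winding numbers to a coverage
/// alpha. A sample counts as inside when its winding number is nonzero
/// (nonzero rule) or odd (even-odd rule).
fn resolve_alpha(winding: &[i32; 8], even_odd: bool) -> f32 {
    let covered = winding
        .iter()
        .filter(|&&w| if even_odd { (w & 1) != 0 } else { w != 0 })
        .count();
    covered as f32 / winding.len() as f32
}
```

With winding numbers `[1, 1, 0, 0, 2, 0, -1, 0]`, four samples are nonzero (alpha 0.5) but only three are odd (alpha 0.375), which is why the fill rule has to be applied per sample before averaging.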

This gets rid of most artifacts, but numerical stability is not yet perfect. Another round of being very careful with predicates in edge cases is in order.

Do a single atomic bump to process y_edge, followed by a prefix sum resolve. Now that we have a barrier and shared memory traffic, it would probably be better to handle packed_w in a similar fashion, and even more so for 16x MSAA.

First cut at 16x.

The number of segments for counting is likely to be quite a bit smaller than a workgroup size, so dynamically adjust the number of steps in both prefix sum and subsequent binary search. Note that this doesn't work correctly on Metal on wgpu 0.15 due to a naga bug.
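A CPU emulation of the idea, under stated assumptions: each segment produces some count of work items, the counts are prefix-summed, and each work-item index binary-searches the inclusive sums to find its segment. Both loops run `ceil(log2(n))` steps for `n` segments instead of a fixed `log2(workgroup_size)`. All names here are hypothetical.

```rust
/// Number of scan/search steps for n items: ceil(log2(n)), or 0 when n <= 1.
fn steps_for(n: usize) -> u32 {
    if n <= 1 { 0 } else { usize::BITS - (n - 1).leading_zeros() }
}

/// Hillis-Steele inclusive scan with a step count derived from n.
fn inclusive_scan(counts: &[u32]) -> Vec<u32> {
    let n = counts.len();
    let mut buf = counts.to_vec();
    for step in 0..steps_for(n) {
        let offset = 1usize << step;
        let prev = buf.clone();
        for i in offset..n {
            buf[i] = prev[i] + prev[i - offset];
        }
    }
    buf
}

/// Fixed-step binary search: returns the segment whose range of work items
/// contains `ix`, i.e. the first s with inclusive[s] > ix.
fn find_segment(inclusive: &[u32], ix: u32) -> usize {
    let n = inclusive.len();
    let mut lo = 0usize;
    for step in (0..steps_for(n)).rev() {
        let probe = lo + (1usize << step);
        if probe <= n && inclusive[probe - 1] <= ix {
            lo = probe;
        }
    }
    lo
}
```

For counts `[2, 0, 3, 1]` the inclusive sums are `[2, 2, 5, 6]`; work items 0 and 1 map to segment 0, items 2 through 4 to segment 2 (segment 1 has no items and is skipped), and item 5 to segment 3.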

Store contiguous segments in their own buffer rather than interleaving with ptcl. Also adds an indirect dispatch command.
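An indirect dispatch lets one stage write the workgroup count for a later stage into a buffer, so the count can depend on GPU-computed data such as the number of segments. WebGPU's `dispatchWorkgroupsIndirect` reads three `u32` workgroup counts from that buffer. The sketch below mirrors that layout with a local struct; the struct name, the workgroup size of 256, and the helper function are assumptions, not the branch's code.

```rust
/// Local mirror of the indirect dispatch layout: three u32 workgroup counts.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DispatchIndirectArgs {
    x: u32,
    y: u32,
    z: u32,
}

/// Assumed workgroup size for the consuming stage.
const WG_SIZE: u32 = 256;

/// Workgroups needed to cover n_items, rounding up without risking overflow.
fn indirect_args_for(n_items: u32) -> DispatchIndirectArgs {
    let groups = n_items / WG_SIZE + u32::from(n_items % WG_SIZE != 0);
    DispatchIndirectArgs { x: groups, y: 1, z: 1 }
}
```

On the GPU this computation would run in a shader that writes the three words into the indirect buffer; it is shown on the CPU here only to make the arithmetic concrete.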

The shaders are written and the wiring has been coded, but it hasn't been tested yet. In addition, path_count needs to clip more aggressively to the bounding box, though the current state should more or less render correctly. WIP: it renders paths but has artifacts.

Strokes were being suppressed later in the pipeline, but were somehow causing interference earlier, which manifested as flickering artifacts.

Avoid storing cubics in intermediate cubic structure.

Artifacts in the top row of tiles in the viewport were caused by not having a valid last_z value, as it was only computed inside the viewport y region.

This commit gets the load_balanced case working (but does not enable it). In testing, it's a slight performance regression (though it might be an improvement for other workloads), so it's likely we won't keep this.

Load balancing adds complexity and has no benefit. We need to do more aggressive path culling in path_count, so keeping things simple is a win. This is something we might revisit later. Special-casing one-tile lines is probably a win (and this case can be detected in flattening). Then load balancing *may* be a win for the remaining lines, but that is not clear.

This commit clips the line to the bounding box in the path_count stage, only iterating tiles within the bounding box. As a consequence, the path_tiling stage need not deal with invalid SegmentCount values. The clipping technique is non-obvious. Usually the coordinates are clipped against the edges of the bounding box. However, here the goal is to determine the contiguous range of indices that fall inside the bbox. Care is taken to be numerically robust. A linear solve (division) is used to find an index that is close to the boundary (either i or i+1 is correct). Then the index-to-coordinate logic is used to decide between those two choices. An additional wrinkle is bumping backdrop when all or a portion of the line is to the left of the bounding box. This is done in a separate loop.
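The "linear solve, then verify with index-to-coordinate" technique can be illustrated with a simplified model. Assume, hypothetically, that the tile column visited at step `i` is `floor(x0 + i * dxdi)` with `dxdi > 0`. A floored division estimates the first index inside the bbox, and the exact index-to-coordinate function decides between `est` and `est + 1`. This sketch covers only the lower bound in one dimension and omits the separate backdrop loop; the real path_count mapping is more involved.

```rust
/// Hypothetical index-to-coordinate mapping: tile column at traversal step i.
fn tile_x(x0: f32, dxdi: f32, i: u32) -> i32 {
    (x0 + i as f32 * dxdi).floor() as i32
}

/// First index in 0..=count whose tile column is >= xmin (count if none).
fn first_index_at_or_after(x0: f32, dxdi: f32, xmin: i32, count: u32) -> u32 {
    assert!(dxdi > 0.0, "sketch assumes x increases with the index");
    // Linear solve: an estimate that is either the answer or one below it.
    let est = ((xmin as f32 - x0) / dxdi).floor().max(0.0) as u32;
    let est = est.min(count);
    // Decide between est and est + 1 using the same mapping the traversal uses,
    // so the result is consistent with the coordinates actually visited.
    if est < count && tile_x(x0, dxdi, est) < xmin { est + 1 } else { est }
}
```

The key design point is that the final decision reuses the traversal's own index-to-coordinate function, so rounding in the division can shift the estimate without making the chosen range disagree with the tiles that are later iterated.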

This reduces artifacts but doesn't resolve all of them.

This should be correct but has artifacts.

This commit has a draft of all CPU stages needed to fill paths. It is missing clips, other paint types than solid colors, and blends. In addition, all of the code exists only as entry points, with no integration glue to wire into a pipeline. Because the code is untested, no doubt a bunch of bugs exist. Checkpoint commit to prepare for next step, which is beginning to wire the shader stages.

WIP

Trying to wire up CPU shaders, but finding it's a mess. At this checkpoint, it still runs the GPU pipeline. About to do some experimentation which might break things.

This commit wires up CPU and GPU handling of resources, passing the former to the shader function. The main thing missing is deferred upload of buffers so the output of a CPU stage can be used as the input to GPU. Again it does not regress a GPU-only pipeline. There is some refactoring of external resources, as the idea of a transient resource map unifies an external resource and a buffer upload.

As of this commit, a CPU shader can be inserted into the pipeline. This just hard-codes the pathtag_reduce shader; as we get more stages working we'll have a way to switch them in and out (possibly dynamically for A/B comparisons).

A few simple CPU shaders work; flatten doesn't. Checkpoint before starting work on that. The immediate problem is that Clear materializes the buffer (in this case, bump) on the GPU. The best strategy would be to defer the materialization (and thus the clear), as it's hard to know whether it should be CPU or GPU.

The size of LineSoup was underestimated because padding wasn't being taken into account, and also more careful guards were needed to avoid out-of-bounds access.
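WGSL struct layout is a likely source of this kind of underestimate: each member starts at a multiple of its alignment, and the struct size is rounded up to its largest member alignment. Assuming, for illustration, that LineSoup has the WGSL shape `{ path_ix: u32, p0: vec2<f32>, p1: vec2<f32> }`, the naive field sum is 20 bytes, but `vec2<f32>` has alignment 8, so the actual size is 24. The helper below is a hypothetical sketch of those layout rules.

```rust
/// Round x up to the next multiple of align.
fn round_up(x: usize, align: usize) -> usize {
    (x + align - 1) / align * align
}

/// Given (size, align) for each member in order, return the member offsets
/// and the total struct size under WGSL-style layout rules.
fn wgsl_struct_layout(members: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offset = 0;
    let mut max_align = 1;
    let mut offsets = Vec::with_capacity(members.len());
    for &(size, align) in members {
        offset = round_up(offset, align); // padding inserted before this member
        offsets.push(offset);
        offset += size;
        max_align = max_align.max(align);
    }
    (offsets, round_up(offset, max_align)) // trailing padding
}
```

Here `u32` is `(4, 4)` and `vec2<f32>` is `(8, 8)`, which gives offsets `[0, 8, 16]` and a size of 24. Sizing the buffer from 20 bytes per element would therefore run past its end.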

Working CPU stages through coarse. However, CPU flattening is not meeting accuracy bounds.

This more or less gets the pipeline working, only fine remains.

Correct math for delta bump for left-clipped lines. Make fine run without multisampling again. WIP, want to get flattening problems fixed also.

This commit adds clipping to the CPU shader pipeline, but there are artifacts in the cardioid test scene. Detective work is needed, as these artifacts seem to be present in the GPU pipeline as well (though not necessarily the same). A clue: the artifacts are flickery with coarse on GPU, stable with coarse on CPU.

Basic image rendering is working, with two caveats:

* A lot of the (third test) scene is not rendered
* There is an assertion failure in path_tiling

WIP, investigating

This fixes an immediate problem with horizontal lines. It's not clear yet whether it's a complete solution.

Implement gradients in CPU pipeline. Note: also fix bug in Vec2::dot that affected flattening accuracy.

Derive y_edge from coordinates (relative to tile edge) rather than from edge crossings. This is more correct when a line endpoint is on a tile edge. Does not fix all artifacts, but is an improvement.
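A minimal sketch of the idea, assuming that y_edge records where a segment, already clipped to a tile and expressed in tile-relative coordinates, touches the tile's left edge (x = 0). Reading y_edge directly from an endpoint that lies exactly on that edge avoids recomputing the crossing, which is the case the commit calls out. The function and the `Option` return are hypothetical.

```rust
/// Hypothetical sketch: tile-relative coordinates with the tile's left edge
/// at x = 0. Returns the y of an endpoint lying exactly on the left edge,
/// or None when the segment does not touch it.
fn y_edge(p0: (f32, f32), p1: (f32, f32)) -> Option<f32> {
    if p0.0 == 0.0 {
        Some(p0.1)
    } else if p1.0 == 0.0 {
        Some(p1.1)
    } else {
        None
    }
}
```

Because clipping to the tile already produced the endpoint on the edge, the exact comparison `== 0.0` is deliberate here: it tests the clipped coordinate itself rather than a recomputed intersection that might round differently.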

The current pipeline only works with fills, not strokes, and unexpected behavior can occur. Strokes are simply disabled in the main encoding pipeline, but were still possible through glyphs. Also adds some asserts, which we might not keep.

Tiles that don't make it through coarse rasterization because of the zero-clip optimization must be fully ignored. Previously the count of such tiles was being misinterpreted as an index for the allocation slice, causing valid data to be overwritten. This patch uses the sign bit to disambiguate the two cases, and explicitly discards tiles with the wrong sign.
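One way to use the sign bit for this, sketched under assumptions: the per-tile field first holds a non-negative segment count, and a later stage overwrites it with an allocation index stored bitwise-inverted, so a valid index always reads as negative. A tile skipped by coarse keeps its non-negative count and is discarded instead of being misread as an index. The encoding direction and all names here are hypothetical; the commit only says the sign bit is used.

```rust
/// The two meanings a per-tile slot can have.
#[derive(Debug, PartialEq, Eq)]
enum TileSlot {
    Count(u32),
    Index(u32),
}

/// Store an allocation index inverted so that it is negative as an i32.
fn encode_index(ix: u32) -> i32 {
    debug_assert!(ix <= i32::MAX as u32);
    !(ix as i32)
}

/// Use the sign bit to tell an index (negative) from a leftover count.
fn decode(raw: i32) -> TileSlot {
    if raw < 0 {
        TileSlot::Index(!raw as u32)
    } else {
        TileSlot::Count(raw as u32)
    }
}

/// A consumer only writes through slots holding an index; others are skipped.
fn allocation_index(raw: i32) -> Option<u32> {
    match decode(raw) {
        TileSlot::Index(ix) => Some(ix),
        TileSlot::Count(_) => None,
    }
}
```

Inverting rather than negating keeps index 0 distinguishable from a count of 0, since `!0` is `-1`.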

Handle various edge cases according to epsilon theory. Also add test SVG file which is useful for hitting those cases. While this handles all known artifacts, there are some rough edges. For one, area antialiasing still exhibits some artifacts. Another issue is that the bounding box culling is now slightly more lenient on the Rust side than the WGSL one (edges aligned to the left edge of the tile are now included). The WGSL path hasn't been exercised as carefully as the Rust one.

raphlinus · 2023-10-16T16:07:43Z

I'm closing this as all important work has now been merged.

raphlinus added 30 commits February 13, 2023 13:06

Contiguous segment storage for fills

c6c24e5

Clip line segments to tile boundaries

af43e2e

This is currently done by inelegant code in path_coarse, just to get things working. The idea is to replace it with logic based on line crossings later.

Start work on multisampled path rendering

1bde89a

As of this commit, it does line rasterization on the path segments, but none of the prefix sum work to do filling. The solid tile fill is temporarily disabled to help visualize the work-in-progress fill logic.

Move line parameter calculation to inner loop

286a289

This was an attempt to save shared memory. It has no effect on performance.

Start work on winding number calculation

1f9b7e2

As of this commit, it correctly computes the 16-wide prefix sum of winding number deltas.

Fix artifacts

d2d4b3f

This gets rid of most artifacts, but numerical stability is not yet perfect. Another round of being very careful with predicates in edge cases is in order.

Do prefix sum of y_edge

2322e63

Do a single atomic bump to process y_edge, followed by a prefix sum resolve. Now that we have a barrier and shared memory traffic, it would probably be better to handle packed_w in a similar fashion, and even more so for 16x MSAA.

MSAA with 16x supersampling

d3183b3

First cut at 16x.

Merge branch 'main' into multi

865da04

Move to separate storage for flat segments

9bda4ac

Store contiguous segments in their own buffer rather than interleaving with ptcl. Also adds an indirect dispatch command.

Suppress encoding of strokes

f1f83f7

They were being suppressed later in the pipeline, but somehow causing interference earlier, which manifested as flickering artifacts.

Fuse pathseg and flatten stages

d6d43cb

Avoid storing cubics in intermediate cubic structure.

Fix for "top row" issue

c63d931

Artifacts in the top row of tiles in the viewport were caused by not having a valid last_z value, as it was only computed inside the viewport y region.

Fix load_balanced case in path_count

277dc09

This commit gets the load_balanced case working (but does not enable it). In testing, it's a slight performance regression (though it might be an improvement for other workloads), so it's likely we won't keep this.

Numerical robustness improvements

6abea54

This reduces artifacts but doesn't resolve all of them.

Checkpoint robust clipping

5e1f11b

This should be correct but has artifacts.

Start wiring up CPU shaders

7843010

WIP

Checkpoint

c51fe12

Trying to wire up CPU shaders, but finding it's a mess. At this checkpoint, it still runs the GPU pipeline. About to do some experimentation which might break things.

Working CPU shader

b1066ea

As of this commit, a CPU shader can be inserted into the pipeline. This just hard-codes the pathtag_reduce shader; as we get more stages working we'll have a way to switch them in and out (possibly dynamically for A/B comparisons).

Working flatten stage

dcb99f8

The size of LineSoup was underestimated because padding wasn't being taken into account, and also more careful guards were needed to avoid out-of-bounds access.

Checkpoint

810e3f2

Working CPU stages through coarse. However, CPU flattening is not meeting accuracy bounds.

raphlinus added 10 commits July 27, 2023 14:44

Checkpoint up to path_tiling

18beb4f

This more or less gets the pipeline working, only fine remains.

Fixes

d21b49f

Correct math for delta bump for left-clipped lines. Make fine run without multisampling again. WIP, want to get flattening problems fixed also.

Image checkpoint

6a72917

Basic image rendering is working, with two caveats:

* A lot of the (third test) scene is not rendered
* There is an assertion failure in path_tiling

WIP, investigating

Possible CPU-side numerical robustness fix

4926637

This fixes an immediate problem with horizontal lines. It's not clear yet whether it's a complete solution.

Gradient checkpoint

7d45a3d

Implement gradients in CPU pipeline. Note: also fix bug in Vec2::dot that affected flattening accuracy.

Fix some text rendering artifacts

68a632e

Derive y_edge from coordinates (relative to tile edge) rather than from edge crossings. This is more correct when a line endpoint is on a tile edge. Does not fix all artifacts, but is an improvement.

Workaround to disable stroked glyphs

0969bac

The current pipeline only works with fills, not strokes, and unexpected behavior can occur. Strokes are simply disabled in the main encoding pipeline, but were still possible through glyphs. Also adds some asserts, which we might not keep.

raphlinus closed this Oct 16, 2023

raphlinus deleted the multi branch November 2, 2023 17:52

DJMcNab mentioned this pull request Oct 3, 2024

Plan for multisampled path rendering #270

Open
