Multisampled path rendering #324

Closed
wants to merge 40 commits into from

raphlinus
Contributor

This is a working branch for progress on #270. In its current state, it demonstrates 16x multisampling with good performance, but has a number of rough edges and problems, including artifacts and no stroke rendering.

This is currently done by inelegant code in path_coarse, just to get things working. The idea is to replace it with logic based on line crossings later.
As of this commit, it does line rasterization on the path segments, but none of the prefix sum work to do filling.

The solid tile fill is temporarily disabled to help visualize the work-in-progress fill logic.
This was an attempt to save shared memory. It has no effect on performance.
As of this commit, it correctly computes the 16-wide prefix sum of winding number deltas.
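As a rough illustration of what a 16-wide prefix sum of winding number deltas looks like, here is a minimal CPU-side sketch in Rust. It is not the shader code from this branch; the function name `prefix_sum_16` and the doubling-stride (Hillis-Steele style) formulation are assumptions chosen to mirror how such a scan is commonly written for a workgroup.

```rust
/// Hypothetical helper: inclusive prefix sum over 16 winding-number deltas.
/// Each pass adds the value `stride` lanes to the left, doubling `stride`
/// every pass, so 16 lanes are resolved in 4 passes.
fn prefix_sum_16(deltas: [i32; 16]) -> [i32; 16] {
    let mut v = deltas;
    let mut stride = 1;
    while stride < 16 {
        // Snapshot the previous pass; on a GPU this would be a barrier.
        let prev = v;
        for i in stride..16 {
            v[i] = prev[i] + prev[i - stride];
        }
        stride *= 2;
    }
    v
}
```

With a delta of +1 at lane 2 and -1 at lane 5, the result is 1 for lanes 2 through 4 and 0 elsewhere, which is the winding number a scanline would see between the two crossings.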
This commit adds in the y_edge calculation (using a loop rather than a prefix sum) and backdrop, rendering paths basically correctly. Some robustness artifacts are visible, but they seem to be the same as when clipping to tile was added and the area-based fine rasterizer was used.
This commit implements 8 samples per pixel, a winding number calculation for each of those samples, and a resolve to an alpha value. Most of the mechanism should be in place, but there are correctness issues, and the antialiasing displays a number of artifacts. I need to do a deeper analysis of why; it's not clear whether there are conceptual issues with the edge crossing logic, errors in translating those ideas to shader code, or some combination.

But I'm posting this checkpoint for the curious, and while I work on tools to explore the render logic more deeply.
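For readers unfamiliar with the resolve step, the sketch below shows one plausible way to turn 8 per-sample winding numbers into an alpha value. The nonzero fill rule and the name `resolve_alpha` are assumptions for illustration; the commit message does not say which fill rule or packing the shader uses.

```rust
/// Hypothetical resolve: the fraction of the 8 samples that are inside the
/// path, assuming the nonzero fill rule (a sample is inside if its winding
/// number is not zero).
fn resolve_alpha(winding: &[i32; 8]) -> f32 {
    let covered = winding.iter().filter(|&&w| w != 0).count();
    covered as f32 / 8.0
}
```

For example, `resolve_alpha(&[1, 1, 0, 0, -1, 0, 0, 0])` counts three covered samples and yields 0.375.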
This gets rid of most artifacts, but numerical stability is not yet perfect. Another round of being very careful with predicates in edge cases is in order.
Do a single atomic bump to process y_edge, followed by a prefix sum resolve.

Now that we have a barrier and shared memory traffic, it would probably be better to handle packed_w in a similar fashion, and even more so for 16x MSAA.
First cut at 16x.
The number of segments for counting is likely to be quite a bit smaller than a workgroup size, so dynamically adjust the number of steps in both prefix sum and subsequent binary search.

Note that this doesn't work correctly on Metal on wgpu 0.15 due to a naga bug.
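The idea of sizing both the prefix sum and the binary search to the actual segment count can be sketched on the CPU as follows. This is an illustrative model, not the WGSL from this branch: `num_steps`, `inclusive_scan`, and `find_segment` are hypothetical names, and the data layout (one count per segment) is an assumption.

```rust
/// ceil(log2(n)): the number of doubling steps needed to cover `n` elements.
fn num_steps(n: u32) -> u32 {
    if n <= 1 { 0 } else { 32 - (n - 1).leading_zeros() }
}

/// Inclusive prefix sum that runs only as many passes as `counts.len()` needs.
fn inclusive_scan(counts: &[u32]) -> Vec<u32> {
    let n = counts.len();
    let mut v = counts.to_vec();
    for step in 0..num_steps(n as u32) {
        let stride = 1usize << step;
        let prev = v.clone();
        for i in stride..n {
            v[i] = prev[i] + prev[i - stride];
        }
    }
    v
}

/// Binary search with the same step count: returns the first index whose
/// inclusive sum exceeds `target`. Precondition: `target` is less than the
/// last element of `sums`.
fn find_segment(sums: &[u32], target: u32) -> usize {
    let n = sums.len();
    let mut lo = 0usize;
    for step in (0..num_steps(n as u32)).rev() {
        let probe = lo + (1usize << step);
        if probe <= n && sums[probe - 1] <= target {
            lo = probe;
        }
    }
    lo
}
```

With counts `[3, 0, 2, 1]` the sums are `[3, 3, 5, 6]`, both loops run two steps instead of a fixed workgroup-sized number, and item 3 maps to segment 2 because segment 1 contributes nothing.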
Store contiguous segments in their own buffer rather than interleaving with ptcl.

Also adds an indirect dispatch command.
Shaders are written, wiring up has been coded, but hasn't been tested yet.

In addition, path_count needs to do more aggressive clip to bounding box, though the current state should more or less render correctly.

WIP, it renders paths but has artifacts.
They were being suppressed later in the pipeline, but somehow causing interference earlier, which manifested as flickering artifacts.
Avoid storing cubics in intermediate cubic structure.
Artifacts in the top row of tiles in the viewport were caused by not having a valid last_z value, as it was only computed inside the viewport y region.
This commit gets the load_balanced case working (but does not enable it). In testing, it's a slight performance regression (though it might be an improvement for other workloads), so it's likely we won't keep this.
It adds complexity and has no benefit. We need to do more aggressive path culling in path_count, so keeping things simple is a win.

This is something we might revisit later. Special-casing one-tile lines is probably a win (and this case can be detected in flattening). Then load balancing *may* be a win for the remaining lines, but that is not clear.
This commit clips the line to the bounding box in the path_count stage, only iterating tiles within the bounding box.

As a consequence, the path_tiling stage need not deal with invalid SegmentCount values.

The clipping technique is non-obvious. Usually the coordinates are clipped against the edges of the bounding box. However, here the goal is to determine the contiguous range of indices that fall inside the bbox.

Care is taken to be numerically robust. A linear solve (division) is used to find an index that is close to the boundary (either i or i+1 is correct). Then the index-to-coordinate logic is used to decide between those two choices.
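The "estimate by division, then decide with the index-to-coordinate logic" pattern can be shown on a deliberately simplified one-dimensional model. Everything here is an assumption for illustration: the real path_count stage maps indices to tile crossings along a line, whereas this sketch uses a hypothetical `coord(i) = x0 + i * dx` with `dx > 0` and looks for the first index at or past a left boundary `xmin`.

```rust
/// Hypothetical index-to-coordinate mapping (stand-in for the real one).
fn coord(x0: f32, dx: f32, i: u32) -> f32 {
    x0 + i as f32 * dx
}

/// First index in 0..n whose coordinate is >= xmin, or n if there is none.
fn first_index_inside(x0: f32, dx: f32, n: u32, xmin: f32) -> u32 {
    assert!(dx > 0.0);
    // Linear solve: lands on the answer or one index before it.
    let est = ((xmin - x0) / dx).floor().max(0.0).min(n as f32) as u32;
    // Decide between est and est + 1 using the same mapping that later
    // stages use, so the choice agrees with them even under rounding.
    if est < n && coord(x0, dx, est) < xmin {
        est + 1
    } else {
        est
    }
}
```

The point of the second step is that the division result is never trusted on its own; the final decision is made by the mapping that defines which indices are inside.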

An additional wrinkle is bumping backdrop when all or a portion of the line is to the left of the bounding box. This is done in a separate loop.
This reduces artifacts but doesn't resolve all of them.
This should be correct but has artifacts.
This commit has a draft of all CPU stages needed to fill paths. It is missing clips, paint types other than solid colors, and blends. In addition, all of the code exists only as entry points, with no integration glue to wire into a pipeline.

Because the code is untested, no doubt a bunch of bugs exist. Checkpoint commit to prepare for next step, which is beginning to wire the shader stages.
Trying to wire up CPU shaders, but finding it's a mess. At this checkpoint, it still runs the GPU pipeline. About to do some experimentation which might break things.
This commit wires up CPU and GPU handling of resources, passing the former to the shader function. The main thing missing is deferred upload of buffers so the output of a CPU stage can be used as the input to GPU.

Again it does not regress a GPU-only pipeline. There is some refactoring of external resources, as the idea of a transient resource map unifies an external resource and a buffer upload.
As of this commit, a CPU shader can be inserted into the pipeline. This just hard-codes the pathtag_reduce shader; as we get more stages working we'll have a way to switch them in and out (possibly dynamically for A/B comparisons).
A few simple CPU shaders work; flatten doesn't. Checkpoint before starting work on that.

The immediate problem is that Clear materializes the buffer (in this case, bump) on the GPU. The best strategy would be to defer the materialization (and thus the clear), as it's hard to know whether it should be CPU or GPU.
The size of LineSoup was underestimated because padding wasn't being taken into account, and also more careful guards were needed to avoid out-of-bounds access.
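To illustrate how padding can make a naive size estimate too small, here is a sketch with a hypothetical record type. The field layout below is an assumption, not the actual LineSoup definition; it only relies on the fact that a WGSL `vec2<f32>` is 8-byte aligned, which is modeled here with an explicit `align(8)` wrapper.

```rust
use std::mem::size_of;

/// Stand-in for a WGSL vec2<f32>: 8 bytes, 8-byte aligned.
#[repr(C, align(8))]
#[derive(Clone, Copy)]
struct Vec2F([f32; 2]);

/// Hypothetical record: a u32 index followed by two 8-byte-aligned points.
#[repr(C)]
#[derive(Clone, Copy)]
struct LineRecord {
    path_ix: u32,
    p0: Vec2F,
    p1: Vec2F,
}

/// Sum of the field sizes, ignoring alignment: 4 + 8 + 8 = 20 bytes.
fn naive_size() -> usize {
    size_of::<u32>() + 2 * size_of::<[f32; 2]>()
}

/// Actual stride, including 4 bytes of padding after `path_ix`: 24 bytes.
fn padded_size() -> usize {
    size_of::<LineRecord>()
}
```

A buffer sized as `n * naive_size()` would be 4 bytes per element short of the `n * padded_size()` that indexing actually touches.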
Working CPU stages through coarse. However, CPU flattening is not meeting accuracy bounds.
raphlinus added 10 commits July 27, 2023 14:44
This more or less gets the pipeline working, only fine remains.
Correct math for delta bump for left-clipped lines.

Make fine run without multisampling again.

WIP, want to get flattening problems fixed also.
This commit adds clipping to the CPU shader pipeline, but there are artifacts in the cardioid test scene.

Detective work is needed, as these artifacts seem to be present in the GPU pipeline as well (though not necessarily the same). A clue: the artifacts are flickery with coarse on GPU, stable with coarse on CPU.
Basic image rendering is working, with two caveats:

* A lot of the (third test) scene is not rendered

* There is an assertion failure in path_tiling

WIP, investigating
This fixes an immediate problem with horizontal lines. It's not clear yet whether it's a complete solution.
Implement gradients in CPU pipeline. Note: also fix bug in Vec2::dot that affected flattening accuracy.
Derive y_edge from coordinates (relative to tile edge) rather than from edge crossings. This is more correct when a line endpoint is on a tile edge.

Does not fix all artifacts, but is an improvement.
The current pipeline only works with fills, not strokes, and unexpected behavior can occur. Strokes are simply disabled in the main encoding pipeline, but were still possible through glyphs.

Also adds some asserts, which we might not keep.
Tiles that don't make it through coarse rasterization because of the zero-clip optimization must be fully ignored. Previously the count of such tiles was being misinterpreted as an index for the allocation slice, causing valid data to be overwritten.

This patch uses the sign bit to disambiguate the two cases, and explicitly discards tiles with the wrong sign.
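One way to picture the sign-bit disambiguation is the sketch below. The encoding (bitwise NOT for allocated indices) and the names `TileSlot`, `encode_allocated`, and `decode` are assumptions for illustration; the commit message only states that the sign bit distinguishes the two cases.

```rust
/// Hypothetical view of a tile's shared field: either a pending segment
/// count or the start index of its allocated slice.
#[derive(Debug, PartialEq)]
enum TileSlot {
    Count(u32),
    Allocated(u32),
}

/// Store an allocation index with the sign bit set (valid for ix < 2^31).
fn encode_allocated(ix: u32) -> u32 {
    !ix
}

/// Interpret the raw field by its sign bit.
fn decode(raw: u32) -> TileSlot {
    if (raw as i32) < 0 {
        TileSlot::Allocated(!raw)
    } else {
        TileSlot::Count(raw)
    }
}
```

A consumer that expects an allocation can then discard any tile that decodes as `Count`, rather than treating the leftover count as an index and overwriting valid data.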
Handle various edge cases according to epsilon theory. Also add test SVG file which is useful for hitting those cases.

While this handles all known artifacts, there are some rough edges. For one, area antialiasing still exhibits some artifacts. Another issue is that the bounding box culling is now slightly more lenient on the Rust side than the WGSL one (edges aligned to the left edge of the tile are now included). The WGSL path hasn't been exercised as carefully as the Rust one.
@raphlinus
Contributor Author

I'm closing this as all important work has now been merged.
