Off-screen clip mask generation #556

kvark · 2016-11-11T20:24:20Z

This is the second major step towards #498
Edit: now actually includes the third step as well (removal of *_clip shaders).

This PR generates all clip masks via cached rendering tasks (of a new kind). These tasks work on the area of intersection between all the clip items. The resulting draw calls are batched via the new ClipBatcher.

It supports everything that we currently support, plus actual handling of an arbitrary number of clips.
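To illustrate "the area of intersection between all the clip items", here is a minimal sketch in Rust. The `Rect` type and the `mask_area` helper are hypothetical stand-ins invented for this example, not WebRender's actual types; the only idea taken from the description is that a mask task covers the intersection of every clip rectangle and that an empty intersection yields no task.

```rust
/// Hypothetical axis-aligned rectangle in device pixels
/// (an illustration only, not WebRender's DeviceRect).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x0: i32,
    pub y0: i32,
    pub x1: i32, // exclusive right edge
    pub y1: i32, // exclusive bottom edge
}

impl Rect {
    /// Overlap of two rectangles, or None when they do not overlap.
    pub fn intersection(&self, other: &Rect) -> Option<Rect> {
        let r = Rect {
            x0: self.x0.max(other.x0),
            y0: self.y0.max(other.y0),
            x1: self.x1.min(other.x1),
            y1: self.y1.min(other.y1),
        };
        if r.x0 < r.x1 && r.y0 < r.y1 {
            Some(r)
        } else {
            None
        }
    }
}

/// Area a mask task would need to cover: the intersection of every clip
/// rectangle. Returns None for an empty list or an empty intersection.
pub fn mask_area(clips: &[Rect]) -> Option<Rect> {
    let (first, rest) = clips.split_first()?;
    // Stop folding as soon as the running intersection becomes empty.
    rest.iter().try_fold(*first, |acc, r| acc.intersection(r))
}
```

Returning `Option` makes the "clips exist but nothing is visible" case explicit, so the caller can skip creating a task.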


frewsxcv · 2016-11-11T20:32:12Z

Cargo.lock

@@ -314,7 +314,7 @@ dependencies = [

 [[package]]
 name = "ipc-channel"
-version = "0.6.0"
+version = "0.5.1"


Seems unfortunate this is downgraded

That's actually not me, it's Glenn - bf3ac6f
not sure how it got in my PR though

There's a bug in ipc-channel 0.6 preventing us from landing the update in Servo, so we had to revert the update in WR (for now).

The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.

kvark · 2016-11-12T02:44:29Z

webrender/src/clip_stack.rs

+    // ResourceCache allocates/load the actual data
+    // will be simplified after the TextureCache upgrade
+    pub image: Option<ImageMask>,
+    pub device_rect: DeviceRect,


One problem that I see with this change is the computation of device_rect. Since layers are supposed to move, this rectangle needs to be re-computed, but there is no tracking of the layers at this point. Is there an example where the layer state is cached and updated upon any movement? Alternatively, I can re-compute the rectangle every frame.

Doesn't this get re-computed every frame anyway, since the generate() call is done during cull_layers()?

generate() is currently only done if metadata.clip_cache_info.is_none()
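One way to picture the caching problem discussed here: if the cached rectangle also remembers the layer offset it was computed for, it can be refreshed only when the layer actually moves. The sketch below is an assumption-laden illustration; `CachedClip`, `ClipMetadata`, and the tuple-based rectangles are hypothetical and do not mirror the PR's real structures.

```rust
/// Hypothetical cached clip info that remembers which layer offset
/// it was computed for.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CachedClip {
    pub layer_offset: (i32, i32),
    pub device_rect: (i32, i32, i32, i32), // x, y, width, height
}

/// Hypothetical per-primitive metadata holding the optional cache.
pub struct ClipMetadata {
    pub local_rect: (i32, i32, i32, i32), // x, y, width, height
    pub cache: Option<CachedClip>,
    pub recompute_count: u32, // kept only to make the behaviour observable
}

impl ClipMetadata {
    /// Return the device rectangle, recomputing it only when there is no
    /// cached value or the layer offset differs from the cached one.
    pub fn device_rect(&mut self, layer_offset: (i32, i32)) -> (i32, i32, i32, i32) {
        match self.cache {
            Some(c) if c.layer_offset == layer_offset => c.device_rect,
            _ => {
                let (x, y, w, h) = self.local_rect;
                let rect = (x + layer_offset.0, y + layer_offset.1, w, h);
                self.cache = Some(CachedClip { layer_offset, device_rect: rect });
                self.recompute_count += 1;
                rect
            }
        }
    }
}
```

A plain `is_none()` check corresponds to the first arm never being invalidated; comparing the stored offset is what lets a scrolled layer trigger a refresh.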

glennw

@kvark Not a full review yet, just some nits from a quick scan. I'll take a more in-depth look today. The general idea seems great though :)

glennw · 2016-11-14T00:23:50Z

Cargo.lock

@@ -314,7 +314,7 @@ dependencies = [

 [[package]]
 name = "ipc-channel"
-version = "0.6.0"
+version = "0.5.1"


The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.

glennw · 2016-11-14T00:24:08Z

webrender/Cargo.toml

@@ -18,7 +18,7 @@ byteorder = "0.5"
 euclid = "0.10"
 fnv="1.0"
 gleam = "0.2"
-ipc-channel = "0.6"
+ipc-channel = "0.5"


glennw · 2016-11-14T00:24:40Z

webrender/res/clip_shared.glsl

+    int render_task_index;
+    int layer_index;
+    int data_index;
+    int pad;


The pad field can be removed since this is a VS only structure.

glennw · 2016-11-14T00:28:37Z

webrender/res/cs_clip_clear.fs.glsl

+ * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
+
+void main(void) {
+    oFragColor = vec4(1, 1, 1, 1);


1 -> 1.0 for strict GLESv3 compilers.

glennw · 2016-11-14T00:30:37Z

webrender/src/clip_stack.rs

+#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)]
+pub struct ClipAddressRange {
+    pub start: GpuStoreAddress, // start GPU address
+    pub count: u32, // number of items, not bytes


Perhaps rename to item_count?

glennw · 2016-11-14T00:33:29Z

webrender/src/render_backend.rs

@@ -36,7 +36,7 @@ pub struct RenderBackend {
    next_namespace_id: IdNamespace,

    resource_cache: ResourceCache,
-    dummy_resources: DummyResources,
+    _dummy_resources: DummyResources,


Can this be removed completely?

glennw · 2016-11-14T00:34:09Z

webrender/src/renderer.rs

                                &projection);
        }

+        // Draw the clip items into the tiled alpha mask.
+        if true {


Remove condition

glennw · 2016-11-14T00:35:56Z

webrender/src/tiling.rs

@@ -567,11 +645,14 @@ struct CompileTileContext<'a> {
 struct RenderTargetContext<'a> {
    layer_store: &'a [StackingContext],
    prim_store: &'a PrimitiveStore,
+    resource_cache: &'a ResourceCache,
+    _clip_stack: &'a ClipRegionStack,


removed now

glennw · 2016-11-14T00:46:21Z

@kvark What was the reason for having to cover the entire tile with each clip primitive again? I think we'll probably need to find a solution that doesn't require this - I think the performance overhead will probably be too high otherwise. I'll pull the branch today and do some performance tests though to check that.

kvark · 2016-11-14T01:44:58Z

The name became slightly misleading. The clip tile is not the tile a clip belongs to; it actually just represents the screen-space boundary of the intersection between all clip instances, so the performance is not expected to suffer.


glennw · 2016-11-14T01:50:33Z

webrender/res/clip_shared.glsl

+    return cci;
+}
+
+// The transformed vertex function that always covers the whole tile with the primitive


Let's specify that the whole tile means the intersection of the clip instances here.

reworked the whole tile confusion, at last

glennw · 2016-11-14T01:55:51Z

Doing some testing with this today, and ran into something which I assume is a bug - unless I'm misunderstanding something:

I have a test case that contains a single div, with the following style:

  position: absolute;
  top: 100px;
  left: 100px;
  width: 100px;
  height: 100px;
  background: red;
  border-radius: 32px;

With some logging to show what the renderer is submitting for clip cache draws, I see:

draw_target Some((TextureId { name: 1033, target: 35866 }, 0))
clear [CacheClipInstance { task_id: 12, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }, 
CacheClipInstance { task_id: 13, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }]
rect [CacheClipInstance { task_id: 13, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }]

draw_target None
clear [CacheClipInstance { task_id: 14, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }]

Which suggests that it's issuing two clear draw calls for the cache render, and also a clear on the main render target. I was expecting to see 1 clear + 1 rect in the render target, and no clip items in the main frame buffer target.

(I ensured the div is positioned completely inside one tile, just to rule out any bugs related to crossing tile boundaries).

glennw · 2016-11-14T02:03:39Z

Ah, those extra clears are related to the dummy mask item - I guess that always gets created at the moment, even if not needed?

kvark · 2016-11-14T02:13:05Z

Yeah, and it's also incomplete/buggy. I've been looking into fixing it, hopefully done by tomorrow. Note: it should be just one draw, I'll have a closer look tomorrow about the rest.


glennw

This looks promising! Added a few comments, feel free to reply here or perhaps they might be easier to discuss on IRC / Vidyo during the week.

glennw · 2016-11-14T03:02:16Z

webrender/res/prim_shared.glsl

@@ -49,7 +52,6 @@ uniform sampler2DArray sCache;
 uniform sampler2D sLayers;
 uniform sampler2D sRenderTasks;
 uniform sampler2D sPrimGeometry;
-uniform sampler2D sClips;


We could also remove Clips from the TextureSampler enum, and associated code.

glennw · 2016-11-14T03:03:30Z

webrender/res/prim_shared.glsl

+
+float do_clip() {
+    // anything outside of the mask is considered transparent
+    bvec4 inside = lessThanEqual(


It's unfortunate to add the cost of this conditional to every shader - is it actually needed? Shouldn't we always be able to just sample from the clip mask?

We'd have to clamp to UV borders, and even then - we don't currently guarantee that those borders contain opaque data. So - possible, but non-trivial.
I don't see it as a big deal though, just a few comparisons with masking the fetch result.
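For readers who want the logic of the bounds check spelled out, the following is a CPU-side model in Rust of "anything outside of the mask is considered transparent". It is not the shader code: `ClipMask` and its fields are hypothetical, and the real shader masks the fetched value with the comparison result rather than branching.

```rust
/// Hypothetical CPU-side model of a clip mask covering a small screen area.
pub struct ClipMask {
    pub origin: (f32, f32), // top-left corner of the mask in screen space
    pub width: usize,
    pub height: usize,
    pub alpha: Vec<f32>, // row-major coverage values, width * height entries
}

impl ClipMask {
    /// Coverage at a screen position. Positions outside the mask area
    /// are treated as fully transparent (0.0) instead of being sampled.
    pub fn do_clip(&self, pos: (f32, f32)) -> f32 {
        let x = pos.0 - self.origin.0;
        let y = pos.1 - self.origin.1;
        let inside = x >= 0.0
            && y >= 0.0
            && x < self.width as f32
            && y < self.height as f32;
        if !inside {
            return 0.0;
        }
        self.alpha[(y as usize) * self.width + (x as usize)]
    }
}
```

The alternative raised in the review, always sampling with clamped coordinates, would only be equivalent if the mask's border texels were guaranteed to hold the intended out-of-bounds value.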

glennw · 2016-11-14T03:04:02Z

webrender/res/cs_clip_clear.vs.glsl

+    vec2 final_pos = tile.screen_origin_task_origin.zw +
+                     tile.size_target_index.xy * aPosition.xy;
+
+    gl_Position = uTransform * vec4(final_pos, 0, 1);


Use float constants

glennw · 2016-11-14T03:04:11Z

webrender/res/cs_clip_image.fs.glsl

+    vec2 source_uv = clamped_mask_uv * vClipMaskUvRect.zw + vClipMaskUvRect.xy;
+    float clip_alpha = texture(sMask, source_uv).r; //careful: texture has type A8
+
+    oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha));


Use float constants

glennw · 2016-11-14T03:04:30Z

webrender/res/cs_clip_rectangle.fs.glsl

+
+    float clip_alpha = rounded_rect(local_pos);
+
+    oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha));


Use float constants

glennw · 2016-11-14T03:04:53Z

webrender/res/ps_gradient_clip.fs.glsl

@@ -11,6 +11,7 @@ void main(void) {
    vec2 local_pos = vPos;
 #endif

-    alpha = min(alpha, do_clip(local_pos));
+    //alpha = min(alpha, do_clip(local_pos));
+    alpha = min(alpha, do_clip());
    oFragColor = vColor * vec4(1, 1, 1, alpha);


Use float constants

glennw · 2016-11-14T03:07:02Z

webrender/src/clip_stack.rs

+}
+
+pub struct ClipRegionStack {
+    layers: HashMap<StackingContextIndex, LayerInfo>,


It would be good if we could get rid of this layers hash map - the LayerInfo struct doesn't really seem to store anything that's not already stored in the main Layer Store. Could we instead just pass in a borrow of the Layer Store when it's needed in generate(), including the current layer id? That way, we guarantee that there is one source of truth for the stacking context transform, that includes scroll offsets etc.
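A rough sketch of the suggestion, assuming the layer store is a slice of stacking contexts indexed by the layer id. The `StackingContext` fields and the `clip_device_origin` function are invented for illustration; the point is only that the function borrows the store instead of keeping its own copy of per-layer data.

```rust
/// Hypothetical index into the layer store.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct StackingContextIndex(pub usize);

/// Hypothetical stand-in for a stacking context; only the field needed
/// for this example is modelled.
pub struct StackingContext {
    pub scroll_offset: (f32, f32),
}

/// Compute a clip's device-space origin by reading the layer data directly
/// from the borrowed layer store, so there is no second copy to keep in sync.
pub fn clip_device_origin(
    layer_store: &[StackingContext],
    layer: StackingContextIndex,
    local_origin: (f32, f32),
) -> (f32, f32) {
    let sc = &layer_store[layer.0];
    (
        local_origin.0 + sc.scroll_offset.0,
        local_origin.1 + sc.scroll_offset.1,
    )
}
```

Because the store is read at call time, a scroll applied to the layer is picked up automatically on the next call.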

glennw · 2016-11-14T03:07:31Z

webrender/src/clip_stack.rs

+
+pub struct ClipRegionStack {
+    layers: HashMap<StackingContextIndex, LayerInfo>,
+    image_masks: Vec<ImageMask>,


image_masks seems to be unused - remove?

glennw · 2016-11-14T03:10:41Z

webrender/src/gpu_store.rs

@@ -5,7 +5,7 @@
 use renderer::MAX_VERTEX_TEXTURE_WIDTH;
 use std::mem;

-#[derive(Debug, Copy, Clone)]
+#[derive(Debug, Copy, Clone, Eq, Hash, PartialEq)]


I'm unsure about this - maybe it's necessary but it just doesn't seem quite right. Perhaps in the cache key we could store the ImageKey instead of the actual GpuStoreAddress?

we may be able to store ImageKey instead of the gpu address in the cache key, but there is also ClipAddressRange in there that uses GpuStoreAddress
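The trade-off in this exchange can be sketched as follows. All types here are simplified, hypothetical stand-ins (including `MaskCacheKey` and `task_for`); the sketch only shows why a key that embeds a `ClipAddressRange` still needs `Eq` and `Hash` on the address type, even if the image part of the key switches to an `ImageKey`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an image identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ImageKey(pub u32);

/// Hypothetical stand-in for a GPU store address. It must derive Eq and
/// Hash because ClipAddressRange below is part of the cache key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GpuStoreAddress(pub i32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ClipAddressRange {
    pub start: GpuStoreAddress, // start GPU address
    pub item_count: u32,        // number of items, not bytes
}

/// Hypothetical cache key: the image is identified by its key, while the
/// clip items are still identified by their GPU address range.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct MaskCacheKey {
    pub image: Option<ImageKey>,
    pub clip_range: ClipAddressRange,
}

/// Return the task id cached for `key`, assigning the next id on a miss.
pub fn task_for(cache: &mut HashMap<MaskCacheKey, u32>, key: MaskCacheKey) -> u32 {
    let next = cache.len() as u32;
    *cache.entry(key).or_insert(next)
}
```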

glennw · 2016-11-14T03:12:28Z

webrender/src/clip_stack.rs

+        let image = match source {
+            &PrimitiveClipSource::NoClip => None,
+            &PrimitiveClipSource::Complex(rect, radius) => {
+                let address = clip_store.alloc(CLIP_DATA_GPU_SIZE);


Ideally we don't want to be calling alloc() on gpu stores during generate. The reason is that the gpu stores are not re-created during scrolling, so this will end up growing the gpu stores during scrolling. What most of the primitives do is a gpu store alloc when the prim is first added (if needed), and then just get a (mutable) reference to that address when they need to update that data. I think that would work OK in this case?

Ok, so this appears to be complicating things a bit (implementing now). The old generate logic is going to be split into 3 parts, starting with the GPU store allocation (part 1), and proceeding to actually filling the GPU stores during the primitives generation, if needed (part 2), and followed by the update of the device rectangle (part 3 - again, if needed).
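The allocate-once, update-in-place pattern described above can be sketched like this. The `GpuStore` here is a simplified, hypothetical model backed by a `Vec`; its `alloc`, `get_mut`, and `len` methods are assumptions for illustration and not the actual WebRender API.

```rust
/// Hypothetical address into the store (an index in this model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GpuStoreAddress(pub usize);

/// Simplified model of a GPU store: a growable array of blocks.
pub struct GpuStore<T> {
    data: Vec<T>,
}

impl<T: Clone + Default> GpuStore<T> {
    pub fn new() -> Self {
        GpuStore { data: Vec::new() }
    }

    /// Part 1: reserve `count` blocks once, when the primitive is added.
    pub fn alloc(&mut self, count: usize) -> GpuStoreAddress {
        let addr = GpuStoreAddress(self.data.len());
        let new_len = self.data.len() + count;
        self.data.resize(new_len, T::default());
        addr
    }

    /// Part 2: overwrite previously allocated data in place; the store
    /// does not grow, no matter how often this is called.
    pub fn get_mut(&mut self, addr: GpuStoreAddress) -> &mut T {
        &mut self.data[addr.0]
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

Calling `alloc` inside a per-frame `generate` would add a block every frame, whereas writing through `get_mut` keeps `len()` constant while scrolling.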

bors-servo · 2016-11-14T08:02:57Z

☔ The latest upstream changes (presumably #552) made this pull request unmergeable. Please resolve the merge conflicts.

kvark · 2016-11-14T19:04:45Z

@glennw thank you for the fantastic review!
I've addressed most of your concerns (and hope to chat about the remaining few). The ClipRegionStack is no more. You can see the code is much more even now (+889 −676 loc), and layer scrolling should be properly supported (running the tests ATM, unsure if it's covered though).
The shader side of things is also clear now - with the distinct ClipArea struct used instead of piggy-backing on the Tile. The dummy task renders nothing, is only added on the first opaque primitive for the tile, and works correctly.

kvark · 2016-11-14T21:38:46Z

Fixed the dummy task some more and confirmed that the Mozilla test suite passes for Servo.

bors-servo · 2016-11-14T22:44:08Z

☔ The latest upstream changes (presumably #554) made this pull request unmergeable. Please resolve the merge conflicts.

glennw · 2016-11-14T23:00:25Z

@kvark Thanks! I'll take another look at this today.

I did some testing, and in general the performance seems quite good. A good page to do some testing with is https://github.com/servo/servo - the performance goes off a cliff as you scroll around halfway down the page. The GPU profiler shows that it is spending the majority of its time generating clip masks, so that would be worth investigating. There also seems to be some clipping artifacts when scrolling on some of the buttons.

It's possible those slowdowns / artifacts are unrelated to this change, but I haven't noticed them previously.

glennw

Left a couple of minor comments. Let's look into the GH performance issue too and find out what is causing that to be slow. Then we can either add a follow up task to fix that, or fix it now if simple. Then this should be ready to merge.

glennw · 2016-11-15T04:41:31Z

webrender/src/tiling.rs

@@ -156,8 +157,13 @@ impl AlphaBatchHelpers for PrimitiveStore {
            PrimitiveKind::TextRun |
            PrimitiveKind::Image |
            PrimitiveKind::Gradient |
-            PrimitiveKind::BoxShadow => true,
-
+            PrimitiveKind::BoxShadow => {


Let's do this test outside the match statement, so it handles borders too.

glennw · 2016-11-15T04:43:03Z

webrender/src/tiling.rs

+                        if let Some(mask_task) = RenderTask::new_mask(self.rect, clip_info) {
+                            current_task.children.push(mask_task);
+                        } else {
+                            // The primitive has clip items but their intersection is empty


Ideally we'd catch this earlier, and not even mark the primitive as visible in this case. But we can do that as a follow up (let's just add a TODO comment here about it).

kvark · 2016-11-15T16:06:17Z

@glennw I checked the servo page and haven't found anything outstanding in terms of what the implementation is doing. There is quite a large area associated with the clips (~80% of screen space), so that's where the slowdown may come from. Frankly, I do not observe the "going off the cliff" effect: the FPS drops from 55 to 51 when scrolled to the middle of the document.

Given that the rounded corner radius is typically about 2-3px, up to 10, and the rectangles are pretty big, I can think of a way to drastically reduce the time spent on clip generation:

- Make sure the target is cleared to 1.0 (opaque). This may require a separate target from the general RT cache since that one is cleared (intentionally) to 0.0.
- Draw only 4 corners instead of the whole thing.
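To get a feel for the potential saving, here is a small sketch that computes the four corner squares of a rounded rectangle and the fraction of the rectangle they cover. The `Rect` type and both functions are hypothetical, the rectangle is assumed to be non-empty, and a uniform corner radius is assumed.

```rust
/// Hypothetical rectangle in device pixels.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

/// The four corner squares that still need rounded-corner evaluation once
/// the rest of the target is pre-cleared to 1.0 (opaque).
/// Order: top-left, top-right, bottom-left, bottom-right.
pub fn corner_rects(r: Rect, radius: f32) -> [Rect; 4] {
    // Clamp so the corner squares never exceed half of the rectangle.
    let rad = radius.min(r.w * 0.5).min(r.h * 0.5);
    [
        Rect { x: r.x, y: r.y, w: rad, h: rad },
        Rect { x: r.x + r.w - rad, y: r.y, w: rad, h: rad },
        Rect { x: r.x, y: r.y + r.h - rad, w: rad, h: rad },
        Rect { x: r.x + r.w - rad, y: r.y + r.h - rad, w: rad, h: rad },
    ]
}

/// Fraction of the rectangle's area covered by the four corner squares.
/// Assumes `r` has non-zero width and height.
pub fn corner_area_fraction(r: Rect, radius: f32) -> f32 {
    let covered: f32 = corner_rects(r, radius).iter().map(|c| c.w * c.h).sum();
    covered / (r.w * r.h)
}
```

For a 400x200 rectangle with a 10px radius, the corners cover 4 * 100 = 400 of 80000 pixels, i.e. 0.5% of the area that a full-rectangle draw would shade.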

I'm going to rebase and resolve the last concerns.


glennw · 2016-11-16T00:12:21Z

@bors-servo r+

bors-servo · 2016-11-16T00:12:22Z

📌 Commit 3967c7d has been approved by glennw

bors-servo · 2016-11-16T00:12:24Z

⚡ Test exempted - status


kvark mentioned this pull request Nov 11, 2016

Support for transformed clip regions #498

Closed

4 tasks

frewsxcv reviewed Nov 11, 2016

kvark commented Nov 12, 2016

glennw requested changes Nov 14, 2016

glennw reviewed Nov 14, 2016

kvark force-pushed the clip_stack branch from 22e4212 to 9b9918c Compare November 14, 2016 19:00

kvark force-pushed the clip_stack branch from dbc575d to ec63c19 Compare November 14, 2016 21:56

glennw reviewed Nov 15, 2016

kvark force-pushed the clip_stack branch 2 times, most recently from 9f1255e to 2942ca4 Compare November 15, 2016 19:31

Mask generation in the off-screen cache textures with support for multiple rounded cornered rectangles. Removal of the _clip shader variants.

3967c7d

kvark force-pushed the clip_stack branch from 2942ca4 to 3967c7d Compare November 15, 2016 19:51

glennw approved these changes Nov 16, 2016

bors-servo merged commit 3967c7d into servo:master Nov 16, 2016

bors-servo mentioned this pull request Nov 16, 2016

Add an API for providing external images. #561

Merged

