Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Off-screen clip mask generation #556

Merged
merged 1 commit into from
Nov 16, 2016
Merged

Conversation

kvark
Copy link
Member

@kvark kvark commented Nov 11, 2016

This is the second major step towards #498
Edit: now actually includes the third step as well (removal of *_clip shaders).

The PR makes all the clip masks to be generated via the cached rendering tasks (of the new kind). These task work on the area of intersection between all the clip items. The resulting draw calls are being batched via the new ClipBatcher.

It supports everything that we currently support, plus the actual handling of arbitrary number of clips.


This change is Reviewable

@kvark kvark mentioned this pull request Nov 11, 2016
4 tasks
@@ -314,7 +314,7 @@ dependencies = [

[[package]]
name = "ipc-channel"
version = "0.6.0"
version = "0.5.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unfortunate this is downgraded

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually not me, it's Glenn - bf3ac6f
not sure how it got in my PR though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bug in ipc-channel 0.6 preventing us from landing the update in Servo, so we had to revert the update in WR (for now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.

// ResourceCache allocates/load the actual data
// will be simplified after the TextureCache upgrade
pub image: Option<ImageMask>,
pub device_rect: DeviceRect,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem that I see with this change is the computation of device_rect. Since layers are supposed to move, this rectangle needs to be re-computed, but there is not tracking of the layers at this point. Is there an example where the layers state is being cached and updated upon any movement? Alternatively, I can re-compute the rectangle every frame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this get re-computed every frame anyway, since the generate() call is done during cull_layers()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate() is currently only done if metadata.clip_cache_info.is_none()

Copy link
Member

@glennw glennw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kvark Not a full review yet, just some nits from a quick scan. I'll take a more in-depth look today. The general idea seems great though :)

@@ -314,7 +314,7 @@ dependencies = [

[[package]]
name = "ipc-channel"
version = "0.6.0"
version = "0.5.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.

@@ -18,7 +18,7 @@ byteorder = "0.5"
euclid = "0.10"
fnv="1.0"
gleam = "0.2"
ipc-channel = "0.6"
ipc-channel = "0.5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

int render_task_index;
int layer_index;
int data_index;
int pad;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pad field can be removed since this is a VS only structure.

* file, You can obtain one at http://mozilla.org/MPL/2.0/. */

void main(void) {
oFragColor = vec4(1, 1, 1, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 -> 1.0 for strict GLESv3 compilers.

#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)]
pub struct ClipAddressRange {
pub start: GpuStoreAddress, // start GPU address
pub count: u32, // number of items, not bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename to item_count?

@@ -36,7 +36,7 @@ pub struct RenderBackend {
next_namespace_id: IdNamespace,

resource_cache: ResourceCache,
dummy_resources: DummyResources,
_dummy_resources: DummyResources,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed completely?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

&projection);
}

// Draw the clip items into the tiled alpha mask.
if true {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove condition

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, ok :)

@@ -567,11 +645,14 @@ struct CompileTileContext<'a> {
struct RenderTargetContext<'a> {
layer_store: &'a [StackingContext],
prim_store: &'a PrimitiveStore,
resource_cache: &'a ResourceCache,
_clip_stack: &'a ClipRegionStack,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed now

@glennw
Copy link
Member

glennw commented Nov 14, 2016

@kvark What was the reason for having to cover the entire tile with each clip primitive again? I think we'll probably need to find a solution that doesn't require this - I think the performance overhead will probably be too high otherwise. I'll pull the branch today and do some performance tests though to check that.

@kvark
Copy link
Member Author

kvark commented Nov 14, 2016

The name became slightly misleading. The clip tile is not the tile a clip belongs to, it's actually just representing the screen space boundary of thr intersection between all clip instances, so the performance is not expected to suffer.

On Nov 13, 2016, at 19:46, Glenn Watson notifications@github.com wrote:

@kvark What was the reason for having to cover the entire tile with each clip primitive again? I think we'll probably need to find a solution that doesn't require this - I think the performance overhead will probably be too high otherwise. I'll pull the branch today and do some performance tests though to check that.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

return cci;
}

// The transformed vertex function that always covers the whole tile with the primitive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's specify that the whole tile means the intersection of the clip instances here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworked the whole tile confusion, at last

@glennw
Copy link
Member

glennw commented Nov 14, 2016

Doing some testing with this today, and ran into something which I assume is a bug - unless I'm misunderstanding something:

I have a test case that contains a single div, with the follow style:

  position: absolute;
  top: 100px;
  left: 100px;
  width: 100px;
  height: 100px;
  background: red;
  border-radius: 32px;

With some logging to show what the renderer is submitting for clip cache draws, I see:

draw_target Some((TextureId { name: 1033, target: 35866 }, 0))
clear [CacheClipInstance { task_id: 12, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }, 
CacheClipInstance { task_id: 13, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }]
rect [CacheClipInstance { task_id: 13, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }]

draw_target None
clear [CacheClipInstance { task_id: 14, layer_index: 0, address: GpuStoreAddress(0), pad: 0 }]

Which suggests that it's issuing two clear draw calls for the cache render, and also a clear on the main render target. I was expecting to see 1 clear + 1 rect in the render target, and no clip items in the main frame buffer target.

(I ensured the div is positioned completely inside one tile, just to rule out any bugs related to crossing tile boundaries).

@glennw
Copy link
Member

glennw commented Nov 14, 2016

Ah, those extra clears are related to the dummy mask item - I guess that always gets created at the moment, even if not needed?

@kvark
Copy link
Member Author

kvark commented Nov 14, 2016

Yeah, and it's also incomplete/buggy. I've been looking into fixing it, hopefully done by tomorrow. Note: it should be just one draw, I'll have a closer look tomorrow about the rest.

On Nov 13, 2016, at 21:03, Glenn Watson notifications@github.com wrote:

Ah, those extra clears are related to the dummy mask item - I guess that always gets created at the moment, even if not needed?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

Copy link
Member

@glennw glennw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising! Added a few comments, feel free to reply here or perhaps they might be easier to discuss on IRC / Vidyo during the week.

@@ -49,7 +52,6 @@ uniform sampler2DArray sCache;
uniform sampler2D sLayers;
uniform sampler2D sRenderTasks;
uniform sampler2D sPrimGeometry;
uniform sampler2D sClips;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also remove Clips from the TextureSampler enum, and associated code.


float do_clip() {
// anything outside of the mask is considered transparent
bvec4 inside = lessThanEqual(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate to add the cost of this conditional to every shader - is it actually needed? Shouldn't we always be able to just sample from the clip mask?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to clamp to UV borders, and even then - we don't currently guarantee that those borders contain opaque data. So - possible, but non-trivial.
I don't see it as a big deal though, just a few comparisons with masking the fetch result.

vec2 final_pos = tile.screen_origin_task_origin.zw +
tile.size_target_index.xy * aPosition.xy;

gl_Position = uTransform * vec4(final_pos, 0, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use float constants

vec2 source_uv = clamped_mask_uv * vClipMaskUvRect.zw + vClipMaskUvRect.xy;
float clip_alpha = texture(sMask, source_uv).r; //careful: texture has type A8

oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use float constants


float clip_alpha = rounded_rect(local_pos);

oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use float constants

@@ -11,6 +11,7 @@ void main(void) {
vec2 local_pos = vPos;
#endif

alpha = min(alpha, do_clip(local_pos));
//alpha = min(alpha, do_clip(local_pos));
alpha = min(alpha, do_clip());
oFragColor = vColor * vec4(1, 1, 1, alpha);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use float constants

}

pub struct ClipRegionStack {
layers: HashMap<StackingContextIndex, LayerInfo>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if we could get rid of this layers hash map - the LayerInfo struct doesn't really seem to store anything that's not already stored in the main Layer Store. Could we instead just pass in a borrow of the Layer Store when it's needed in generate(), including the current layer id? That way, we guarantee that there is one source of truth for the stacking context transform, that includes scroll offsets etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


pub struct ClipRegionStack {
layers: HashMap<StackingContextIndex, LayerInfo>,
image_masks: Vec<ImageMask>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image_masks seems to be unused - remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -5,7 +5,7 @@
use renderer::MAX_VERTEX_TEXTURE_WIDTH;
use std::mem;

#[derive(Debug, Copy, Clone)]
#[derive(Debug, Copy, Clone, Eq, Hash, PartialEq)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about this - maybe it's necessary but it just doesn't seem quite right. Perhaps in the cache key we could store the ImageKey instead of the actual GpuStoreAddress?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may be able to store ImageKey instead of the gpu address in the cache key, but there is also ClipAddressRange in there that uses GpuStoreAddress

let image = match source {
&PrimitiveClipSource::NoClip => None,
&PrimitiveClipSource::Complex(rect, radius) => {
let address = clip_store.alloc(CLIP_DATA_GPU_SIZE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we don't want to be calling alloc() on gpu stores during generate. The reason is that the gpu stores are not re-created during scrolling, so this will end up growing the gpu stores during scrolling. What most of the primitives do is a gpu store alloc when the prim is first added (if needed), and then just get a (mutable) reference to that address when they need to update that data. I think that would work OK in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so this appears to be complicating things a bit (implementing now). The old generate logic is going to be split into 3 parts, starting with the GPU store allocation (part 1), and proceeding to actually filling the GPU stores during the primitives generation, if needed (part 2), and followed by the update of the device rectangle (part 3 - again, if needed).

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #552) made this pull request unmergeable. Please resolve the merge conflicts.

@kvark
Copy link
Member Author

kvark commented Nov 14, 2016

@glennw thank you for the fantastic review!
I've addressed most of your concerns (and hope to chat about the rest few). The ClipRegionStack is no more. You can see the code is much more even now (+889 −676 loc), and layers scrolling should be properly supported (running the tests ATM, unsure if it's covered though).
The shader side of things is also clear now - with the distinct ClipArea struct used instead of piggy-backing on the Tile. The dummy task renders nothing, is only added on the first opaque primitive for the tile, and works correctly.

@kvark
Copy link
Member Author

kvark commented Nov 14, 2016

Fixed the dummy task now some more and confirmed the mozilla test suite passes for servo.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #554) made this pull request unmergeable. Please resolve the merge conflicts.

@glennw
Copy link
Member

glennw commented Nov 14, 2016

@kvark Thanks! I'll take another look at this today.

I did some testing, and in general the performance seems quite good. A good page to do some testing with is https://github.com/servo/servo - the performance goes off a cliff as you scroll around halfway down the page. The GPU profiler shows that it is spending the majority of its time generating clip masks, so that would be worth investigating. There also seems to be some clipping artifacts when scrolling on some of the buttons.

It's possible those slowdowns / artifacts are unrelated to this change, but I haven't noticed them previously.

Copy link
Member

@glennw glennw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of minor comments. Let's look into the GH performance issue too and find out what is causing that to be slow. Then we can either add a follow up task to fix that, or fix it now if simple. Then this should be ready to merge.

@@ -156,8 +157,13 @@ impl AlphaBatchHelpers for PrimitiveStore {
PrimitiveKind::TextRun |
PrimitiveKind::Image |
PrimitiveKind::Gradient |
PrimitiveKind::BoxShadow => true,

PrimitiveKind::BoxShadow => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this test outside the match statement, so it handles borders too.

if let Some(mask_task) = RenderTask::new_mask(self.rect, clip_info) {
current_task.children.push(mask_task);
} else {
// The primitive has clip items but their intersection is empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd catch this earlier, and not even mark the primitive as visible in this case. But we can do that as a follow up (let's just add a TODO comment here about it).

@kvark
Copy link
Member Author

kvark commented Nov 15, 2016

@glennw I checked the servo page and haven't found anything outstanding in terms of what the implementation is doing. There is quite a large of an area associated with the clips (~80% screen space), so that's where the slowdown can come from. Frankly, I do not observe the "going off the cliff" effect, the FPS drops from 55 to 51 when scrolled to the middle of the document.

Given that the rounded corner radius is typically about 2-3px, up to 10, and the rectangles are pretty big, I can think of a way to drastically reduce the time spent on clip generation:

  1. make sure the target is cleared to 1.0 (opaque). This may require a separate target from the general RT cache since that one is cleared (intentionally) to 0.0.
  2. draw only 4 corners instead of the whole thing.

I'm going to rebase and resolve the last concerns.

@kvark kvark force-pushed the clip_stack branch 2 times, most recently from 9f1255e to 2942ca4 Compare November 15, 2016 19:31
…tiple rounded cornered rectangles.

Removal of the _clip shader variants.
@glennw
Copy link
Member

glennw commented Nov 16, 2016

@bors-servo r+

@bors-servo
Copy link
Contributor

📌 Commit 3967c7d has been approved by glennw

@bors-servo
Copy link
Contributor

⚡ Test exempted - status

@bors-servo bors-servo merged commit 3967c7d into servo:master Nov 16, 2016
bors-servo pushed a commit that referenced this pull request Nov 16, 2016
Off-screen clip mask generation

This is the second major step towards #498
Edit: now actually includes the third step as well (removal of `*_clip` shaders).

The PR makes all the clip masks to be generated via the cached rendering tasks (of the new kind). These task work on the area of intersection between all the clip items. The resulting draw calls are being batched via the new `ClipBatcher`.

It supports everything that we currently support, plus the actual handling of arbitrary number of clips.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/556)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants