-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Off-screen clip mask generation #556
Conversation
@@ -314,7 +314,7 @@ dependencies = [ | |||
|
|||
[[package]] | |||
name = "ipc-channel" | |||
version = "0.6.0" | |||
version = "0.5.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unfortunate this is downgraded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually not me, it's Glenn - bf3ac6f
not sure how it got in my PR though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bug in ipc-channel 0.6 preventing us from landing the update in Servo, so we had to revert the update in WR (for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.
// ResourceCache allocates/load the actual data | ||
// will be simplified after the TextureCache upgrade | ||
pub image: Option<ImageMask>, | ||
pub device_rect: DeviceRect, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem that I see with this change is the computation of device_rect
. Since layers are supposed to move, this rectangle needs to be re-computed, but there is not tracking of the layers at this point. Is there an example where the layers state is being cached and updated upon any movement? Alternatively, I can re-compute the rectangle every frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this get re-computed every frame anyway, since the generate() call is done during cull_layers()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generate() is currently only done if metadata.clip_cache_info.is_none()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kvark Not a full review yet, just some nits from a quick scan. I'll take a more in-depth look today. The general idea seems great though :)
@@ -314,7 +314,7 @@ dependencies = [ | |||
|
|||
[[package]] | |||
name = "ipc-channel" | |||
version = "0.6.0" | |||
version = "0.5.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The revert to 0.5.1 is landed in WR master, so if you rebase all the changes related to ipc-channel should disappear.
@@ -18,7 +18,7 @@ byteorder = "0.5" | |||
euclid = "0.10" | |||
fnv="1.0" | |||
gleam = "0.2" | |||
ipc-channel = "0.6" | |||
ipc-channel = "0.5" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
int render_task_index; | ||
int layer_index; | ||
int data_index; | ||
int pad; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pad field can be removed since this is a VS only structure.
* file, You can obtain one at http://mozilla.org/MPL/2.0/. */ | ||
|
||
void main(void) { | ||
oFragColor = vec4(1, 1, 1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 -> 1.0 for strict GLESv3 compilers.
#[derive(Debug, Copy, Clone, Eq, PartialEq, Hash)] | ||
pub struct ClipAddressRange { | ||
pub start: GpuStoreAddress, // start GPU address | ||
pub count: u32, // number of items, not bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps rename to item_count?
@@ -36,7 +36,7 @@ pub struct RenderBackend { | |||
next_namespace_id: IdNamespace, | |||
|
|||
resource_cache: ResourceCache, | |||
dummy_resources: DummyResources, | |||
_dummy_resources: DummyResources, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be removed completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
&projection); | ||
} | ||
|
||
// Draw the clip items into the tiled alpha mask. | ||
if true { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, ok :)
@@ -567,11 +645,14 @@ struct CompileTileContext<'a> { | |||
struct RenderTargetContext<'a> { | |||
layer_store: &'a [StackingContext], | |||
prim_store: &'a PrimitiveStore, | |||
resource_cache: &'a ResourceCache, | |||
_clip_stack: &'a ClipRegionStack, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed now
@kvark What was the reason for having to cover the entire tile with each clip primitive again? I think we'll probably need to find a solution that doesn't require this - I think the performance overhead will probably be too high otherwise. I'll pull the branch today and do some performance tests though to check that. |
The name became slightly misleading. The clip tile is not the tile a clip belongs to, it's actually just representing the screen space boundary of thr intersection between all clip instances, so the performance is not expected to suffer.
|
return cci; | ||
} | ||
|
||
// The transformed vertex function that always covers the whole tile with the primitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's specify that the whole tile means the intersection of the clip instances here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworked the whole tile confusion, at last
Doing some testing with this today, and ran into something which I assume is a bug - unless I'm misunderstanding something: I have a test case that contains a single div, with the follow style:
With some logging to show what the renderer is submitting for clip cache draws, I see:
Which suggests that it's issuing two clear draw calls for the cache render, and also a clear on the main render target. I was expecting to see 1 clear + 1 rect in the render target, and no clip items in the main frame buffer target. (I ensured the div is positioned completely inside one tile, just to rule out any bugs related to crossing tile boundaries). |
Ah, those extra clears are related to the dummy mask item - I guess that always gets created at the moment, even if not needed? |
Yeah, and it's also incomplete/buggy. I've been looking into fixing it, hopefully done by tomorrow. Note: it should be just one draw, I'll have a closer look tomorrow about the rest.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks promising! Added a few comments, feel free to reply here or perhaps they might be easier to discuss on IRC / Vidyo during the week.
@@ -49,7 +52,6 @@ uniform sampler2DArray sCache; | |||
uniform sampler2D sLayers; | |||
uniform sampler2D sRenderTasks; | |||
uniform sampler2D sPrimGeometry; | |||
uniform sampler2D sClips; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also remove Clips from the TextureSampler enum, and associated code.
|
||
float do_clip() { | ||
// anything outside of the mask is considered transparent | ||
bvec4 inside = lessThanEqual( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate to add the cost of this conditional to every shader - is it actually needed? Shouldn't we always be able to just sample from the clip mask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd have to clamp to UV borders, and even then - we don't currently guarantee that those borders contain opaque data. So - possible, but non-trivial.
I don't see it as a big deal though, just a few comparisons with masking the fetch result.
vec2 final_pos = tile.screen_origin_task_origin.zw + | ||
tile.size_target_index.xy * aPosition.xy; | ||
|
||
gl_Position = uTransform * vec4(final_pos, 0, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use float constants
vec2 source_uv = clamped_mask_uv * vClipMaskUvRect.zw + vClipMaskUvRect.xy; | ||
float clip_alpha = texture(sMask, source_uv).r; //careful: texture has type A8 | ||
|
||
oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use float constants
|
||
float clip_alpha = rounded_rect(local_pos); | ||
|
||
oFragColor = vec4(1, 1, 1, min(alpha, clip_alpha)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use float constants
@@ -11,6 +11,7 @@ void main(void) { | |||
vec2 local_pos = vPos; | |||
#endif | |||
|
|||
alpha = min(alpha, do_clip(local_pos)); | |||
//alpha = min(alpha, do_clip(local_pos)); | |||
alpha = min(alpha, do_clip()); | |||
oFragColor = vColor * vec4(1, 1, 1, alpha); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use float constants
} | ||
|
||
pub struct ClipRegionStack { | ||
layers: HashMap<StackingContextIndex, LayerInfo>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good if we could get rid of this layers hash map - the LayerInfo struct doesn't really seem to store anything that's not already stored in the main Layer Store. Could we instead just pass in a borrow of the Layer Store when it's needed in generate(), including the current layer id? That way, we guarantee that there is one source of truth for the stacking context transform, that includes scroll offsets etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
pub struct ClipRegionStack { | ||
layers: HashMap<StackingContextIndex, LayerInfo>, | ||
image_masks: Vec<ImageMask>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
image_masks seems to be unused - remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -5,7 +5,7 @@ | |||
use renderer::MAX_VERTEX_TEXTURE_WIDTH; | |||
use std::mem; | |||
|
|||
#[derive(Debug, Copy, Clone)] | |||
#[derive(Debug, Copy, Clone, Eq, Hash, PartialEq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure about this - maybe it's necessary but it just doesn't seem quite right. Perhaps in the cache key we could store the ImageKey instead of the actual GpuStoreAddress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may be able to store ImageKey
instead of the gpu address in the cache key, but there is also ClipAddressRange
in there that uses GpuStoreAddress
let image = match source { | ||
&PrimitiveClipSource::NoClip => None, | ||
&PrimitiveClipSource::Complex(rect, radius) => { | ||
let address = clip_store.alloc(CLIP_DATA_GPU_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we don't want to be calling alloc() on gpu stores during generate. The reason is that the gpu stores are not re-created during scrolling, so this will end up growing the gpu stores during scrolling. What most of the primitives do is a gpu store alloc when the prim is first added (if needed), and then just get a (mutable) reference to that address when they need to update that data. I think that would work OK in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so this appears to be complicating things a bit (implementing now). The old generate
logic is going to be split into 3 parts, starting with the GPU store allocation (part 1), and proceeding to actually filling the GPU stores during the primitives generation, if needed (part 2), and followed by the update of the device rectangle (part 3 - again, if needed).
☔ The latest upstream changes (presumably #552) made this pull request unmergeable. Please resolve the merge conflicts. |
@glennw thank you for the fantastic review! |
Fixed the dummy task now some more and confirmed the mozilla test suite passes for servo. |
☔ The latest upstream changes (presumably #554) made this pull request unmergeable. Please resolve the merge conflicts. |
@kvark Thanks! I'll take another look at this today. I did some testing, and in general the performance seems quite good. A good page to do some testing with is https://github.com/servo/servo - the performance goes off a cliff as you scroll around halfway down the page. The GPU profiler shows that it is spending the majority of its time generating clip masks, so that would be worth investigating. There also seems to be some clipping artifacts when scrolling on some of the buttons. It's possible those slowdowns / artifacts are unrelated to this change, but I haven't noticed them previously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of minor comments. Let's look into the GH performance issue too and find out what is causing that to be slow. Then we can either add a follow up task to fix that, or fix it now if simple. Then this should be ready to merge.
@@ -156,8 +157,13 @@ impl AlphaBatchHelpers for PrimitiveStore { | |||
PrimitiveKind::TextRun | | |||
PrimitiveKind::Image | | |||
PrimitiveKind::Gradient | | |||
PrimitiveKind::BoxShadow => true, | |||
|
|||
PrimitiveKind::BoxShadow => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this test outside the match statement, so it handles borders too.
if let Some(mask_task) = RenderTask::new_mask(self.rect, clip_info) { | ||
current_task.children.push(mask_task); | ||
} else { | ||
// The primitive has clip items but their intersection is empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd catch this earlier, and not even mark the primitive as visible in this case. But we can do that as a follow up (let's just add a TODO comment here about it).
@glennw I checked the servo page and haven't found anything outstanding in terms of what the implementation is doing. There is quite a large of an area associated with the clips (~80% screen space), so that's where the slowdown can come from. Frankly, I do not observe the "going off the cliff" effect, the FPS drops from 55 to 51 when scrolled to the middle of the document. Given that the rounded corner radius is typically about 2-3px, up to 10, and the rectangles are pretty big, I can think of a way to drastically reduce the time spent on clip generation:
I'm going to rebase and resolve the last concerns. |
9f1255e
to
2942ca4
Compare
…tiple rounded cornered rectangles. Removal of the _clip shader variants.
@bors-servo r+ |
📌 Commit 3967c7d has been approved by |
⚡ Test exempted - status |
Off-screen clip mask generation This is the second major step towards #498 Edit: now actually includes the third step as well (removal of `*_clip` shaders). The PR makes all the clip masks to be generated via the cached rendering tasks (of the new kind). These task work on the area of intersection between all the clip items. The resulting draw calls are being batched via the new `ClipBatcher`. It supports everything that we currently support, plus the actual handling of arbitrary number of clips. <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/556) <!-- Reviewable:end -->
This is the second major step towards #498
Edit: now actually includes the third step as well (removal of
*_clip
shaders).The PR makes all the clip masks to be generated via the cached rendering tasks (of the new kind). These task work on the area of intersection between all the clip items. The resulting draw calls are being batched via the new
ClipBatcher
.It supports everything that we currently support, plus the actual handling of arbitrary number of clips.
This change is