Remove layer integral offset snapping #17712

Merged: 3 commits, Apr 16, 2020

@liyuqian liyuqian commented Apr 14, 2020

This fixes flutter/flutter#53288 and flutter/flutter#41654. It removes the problematic GetIntegralTransCTM but preserves the rect round-out in RasterCacheResult::draw for performance reasons: removing the round-out barely changes the average frame raster time, but it significantly regresses the worst frame raster time. That is probably because a new shader needs to be compiled to draw the raster cache with fractional offsets.
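To make the two operations concrete, here is a minimal sketch of what "integral snapping" of a translation and "round-out" of a rectangle mean numerically. The `Translation` and `Rect` structs and both function names are hypothetical stand-ins, not the engine's or Skia's types, and the sketch does not claim to reproduce the exact bug in the linked issues. It only shows one way independent rounding can shift content: a parent at 0.4 px and a child at 0.6 px snap to 0 and 1, so a 0.2 px gap becomes a full pixel.

```cpp
#include <cmath>

// Hypothetical stand-ins for the matrix translation and rectangle types.
struct Translation {
  float x;
  float y;
};

struct Rect {
  float left, top, right, bottom;
};

// Integral snapping: round each translation component to the nearest pixel.
// Layers snapped independently may round in different directions.
Translation SnapToIntegral(Translation t) {
  return {std::round(t.x), std::round(t.y)};
}

// Rect round-out: grow the rectangle outward to integer coordinates, so the
// result always contains the original rectangle.
Rect RoundOut(Rect r) {
  return {std::floor(r.left), std::floor(r.top), std::ceil(r.right),
          std::ceil(r.bottom)};
}
```

Round-out can only enlarge the destination rectangle, whereas snapping moves the origin of the drawn content.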

Correctness test

The fix for flutter/flutter#41654 was verified by running the unit test from flutter/flutter#41654 (comment) against a locally built engine. That unit test will be added to the framework repo once this PR is rolled into the framework.

This PR may change golden images, so manual engine and Google rolls may be required.

Performance test

The performance impact of this PR is measured by A/B testing the following framework devicelab test:

../../bin/cache/dart-sdk/bin/dart bin/run.dart --ab=10 --local-engine=android_profile -t complex_layout_scroll_perf__timeline_summary

We tested 3 variants: (1) an engine without any change, (2) an engine with this PR, which removes integral snapping, and (3) an engine with this PR that additionally removes the rect round-out in RasterCacheResult::draw.

The average_frame_rasterizer_time_millis values for those 3 variants, listed as Average A (noise), Average B (noise), and speed-up, are as follows:

  1. 4.98 (1.07%) 5.06 (1.60%) 0.98x
  2. 5.02 (1.30%) 5.14 (1.24%) 0.98x
  3. 4.99 (1.78%) 5.15 (0.98%) 0.97x

The worst_frame_rasterizer_time_millis values for those 3 variants, listed as Average A (noise), Average B (noise), and speed-up, are as follows:

  1. 21.59 (27.66%) 20.12 (6.18%) 1.07x
  2. 20.47 (7.70%) 21.88 (11.62%) 0.94x
  3. 19.81 (10.79%) 29.56 (12.06%) 0.67x
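The speed-up column in these results appears to be Average A divided by Average B, rounded to two decimals. That reading is inferred from the reported numbers rather than taken from the devicelab source, so treat the sketch below as an assumption; `SpeedUp` and `RoundTo2` are hypothetical helper names.

```cpp
#include <cmath>

// Speed-up as it appears in the tables: Average A divided by Average B.
// Values below 1.0 mean the B run was slower than the A run.
double SpeedUp(double average_a, double average_b) {
  return average_a / average_b;
}

// Round to two decimals, matching the precision shown in the report.
double RoundTo2(double value) {
  return std::round(value * 100.0) / 100.0;
}
```

For the worst-frame row of variant 3, `RoundTo2(SpeedUp(19.81, 29.56))` gives 0.67, which matches the table. The other worst-frame rows give 1.07 and 0.94, so only the variant without rect round-out shows a large gap between the two runs.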

Here are full A/B test results:

  1. an engine without any change
═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	2.44 (4.12%)	2.64 (3.44%)	0.93x	
worst_frame_build_time_millis	23.65 (5.20%)	25.50 (9.14%)	0.93x	
90th_percentile_frame_build_time_millis	4.14 (10.95%)	4.41 (8.15%)	0.94x	
99th_percentile_frame_build_time_millis	22.01 (3.96%)	23.25 (4.40%)	0.95x	
average_frame_rasterizer_time_millis	4.98 (1.07%)	5.06 (1.60%)	0.98x	
worst_frame_rasterizer_time_millis	21.59 (27.66%)	20.12 (6.18%)	1.07x	
90th_percentile_frame_rasterizer_time_millis	6.92 (5.69%)	7.01 (4.53%)	0.99x	
99th_percentile_frame_rasterizer_time_millis	12.85 (18.10%)	11.64 (5.30%)	1.10x	
average_vsync_transitions_missed	1.01 (4.23%)	1.04 (7.90%)	0.98x	
90th_percentile_vsync_transitions_missed	1.10 (27.27%)	1.20 (33.33%)	0.92x	
99th_percentile_vsync_transitions_missed	1.10 (27.27%)	1.20 (33.33%)	0.92x	
  2. an engine with this PR that removes integral snapping
═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	2.54 (3.23%)	2.69 (4.24%)	0.94x
worst_frame_build_time_millis	24.00 (8.37%)	26.29 (10.64%)	0.91x
90th_percentile_frame_build_time_millis	4.64 (10.39%)	4.68 (11.77%)	0.99x
99th_percentile_frame_build_time_millis	21.53 (2.73%)	23.02 (5.77%)	0.94x
average_frame_rasterizer_time_millis	5.02 (1.30%)	5.14 (1.24%)	0.98x
worst_frame_rasterizer_time_millis	20.47 (7.70%)	21.88 (11.62%)	0.94x
90th_percentile_frame_rasterizer_time_millis	6.92 (4.96%)	7.03 (5.31%)	0.98x
99th_percentile_frame_rasterizer_time_millis	13.52 (7.95%)	13.86 (8.76%)	0.98x
average_vsync_transitions_missed	1.04 (5.44%)	1.07 (5.08%)	0.97x
90th_percentile_vsync_transitions_missed	1.10 (27.27%)	1.10 (27.27%)	1.00x
99th_percentile_vsync_transitions_missed	1.30 (35.25%)	1.60 (30.62%)	0.81x
  3. an engine with this PR that additionally removes the rect round-out in RasterCacheResult::draw
═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	2.48 (2.54%)	2.72 (3.41%)	0.91x	
worst_frame_build_time_millis	23.87 (7.15%)	26.75 (6.55%)	0.89x	
90th_percentile_frame_build_time_millis	4.33 (10.48%)	4.60 (9.75%)	0.94x	
99th_percentile_frame_build_time_millis	21.71 (2.27%)	23.53 (4.65%)	0.92x	
average_frame_rasterizer_time_millis	4.99 (1.78%)	5.15 (0.98%)	0.97x	
worst_frame_rasterizer_time_millis	19.81 (10.79%)	29.56 (12.06%)	0.67x	
90th_percentile_frame_rasterizer_time_millis	6.97 (4.12%)	6.77 (4.08%)	1.03x	
99th_percentile_frame_rasterizer_time_millis	13.18 (9.38%)	13.96 (9.41%)	0.94x	
average_vsync_transitions_missed	1.00 (0.00%)	1.12 (9.21%)	0.90x	
90th_percentile_vsync_transitions_missed	1.00 (0.00%)	1.40 (34.99%)	0.71x	
99th_percentile_vsync_transitions_missed	1.00 (0.00%)	1.70 (26.96%)	0.59x	

@LongCatIsLooong
Contributor

Thank you!

@liyuqian liyuqian merged commit 99f8d00 into flutter:master Apr 16, 2020
@@ -31,9 +31,6 @@ void ImageFilterLayer::Preroll(PrerollContext* context,
   if (!context->has_platform_view && context->raster_cache &&
       SkRect::Intersects(context->cull_rect, paint_bounds())) {
     SkMatrix ctm = matrix;
-#ifndef SUPPORT_FRACTIONAL_TRANSLATION
-    ctm = RasterCache::GetIntegralTransCTM(ctm);
-#endif
     context->raster_cache->Prepare(context, this, ctm);
Contributor

For cases like these (and I'm guessing there are a large number of them), the ctm local variable is unnecessary and should be deleted.

Contributor Author

Done in #17915

@@ -31,9 +31,6 @@ void ImageFilterLayer::Preroll(PrerollContext* context,
if (!context->has_platform_view && context->raster_cache &&
SkRect::Intersects(context->cull_rect, paint_bounds())) {
Contributor

How many of these if statements have identical tests? Could this be added as a boilerplate method either in the base Layer or in the PrerollContext?
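One way to picture the suggested refactoring is a single predicate on the base layer that each Preroll method calls instead of repeating the three-part test. The sketch below is hypothetical: `CanTryRasterCache` is an invented name, and `Rect`, `RasterCache`, `PrerollContext`, and `Layer` are simplified stand-ins for the engine and Skia types, kept only detailed enough to compile on their own.

```cpp
// Simplified stand-in for SkRect.
struct Rect {
  float left, top, right, bottom;
  // True when the two rectangles overlap with non-zero area.
  bool Intersects(const Rect& other) const {
    return left < other.right && other.left < right && top < other.bottom &&
           other.top < bottom;
  }
};

struct RasterCache {};  // Placeholder; the real class lives in the engine.

struct PrerollContext {
  RasterCache* raster_cache = nullptr;
  bool has_platform_view = false;
  Rect cull_rect{0, 0, 0, 0};
};

class Layer {
 public:
  explicit Layer(Rect paint_bounds) : paint_bounds_(paint_bounds) {}
  const Rect& paint_bounds() const { return paint_bounds_; }

  // Hypothetical helper wrapping the test repeated in the Preroll methods:
  // no platform view, a raster cache is available, and the layer is visible.
  bool CanTryRasterCache(const PrerollContext& context) const {
    return !context.has_platform_view && context.raster_cache != nullptr &&
           context.cull_rect.Intersects(paint_bounds());
  }

 private:
  Rect paint_bounds_;
};
```

A Preroll method would then read `if (CanTryRasterCache(*context)) { ... }`. Whether the helper belongs on the layer or on the context is a design choice this sketch does not settle.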

Contributor Author

@liyuqian liyuqian Apr 17, 2020

context.leaf_nodes_canvas->setMatrix(RasterCache::GetIntegralTransCTM(
context.leaf_nodes_canvas->getTotalMatrix()));
#endif

if (context.raster_cache) {
Contributor

Similarly, is this all boilerplate for many layers?

Contributor Author

Done in #17791

@@ -51,9 +51,6 @@ void OpacityLayer::Preroll(PrerollContext* context, const SkMatrix& matrix) {
if (!context->has_platform_view && context->raster_cache &&
SkRect::Intersects(context->cull_rect, paint_bounds())) {
SkMatrix ctm = child_matrix;
Contributor

ctm not needed?

Contributor Author

Done in #17915

@@ -87,8 +79,7 @@ void OpacityLayer::Paint(PaintContext& context) const {
   // Skia may clip the content with saveLayerBounds (although it's not a
   // guaranteed clip). So we have to provide a big enough saveLayerBounds. To do
   // so, we first remove the offset from paint bounds since it's already in the
-  // matrix. Then we round out the bounds because of our
-  // RasterCache::GetIntegralTransCTM optimization.
+  // matrix. Then we round out the bounds.
Contributor

Perhaps we introduced the offset into the CTM prematurely above? If we hadn't already added it, we wouldn't have to adjust for it here...?

Contributor

(aka line 70 in this method)

Contributor Author

We need the offset in the CTM to draw the raster cache correctly. Meanwhile, the offset is subtracted from paint_bounds() here. We have to include the offset in paint_bounds() in the first place because the parent layer uses those bounds.
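The bookkeeping described in this reply can be sketched with plain numbers. The code below is an illustration under assumptions, not the engine's implementation: `Offset`, `Rect`, `PaintBoundsForParent`, and `SaveLayerBounds` are hypothetical names. The bounds reported to the parent include the layer offset, and the offset is removed again before rounding out when the canvas matrix already carries it.

```cpp
#include <cmath>

struct Offset {
  float dx, dy;
};

struct Rect {
  float left, top, right, bottom;

  // Returns a copy shifted by (dx, dy).
  Rect MakeOffset(float dx, float dy) const {
    return {left + dx, top + dy, right + dx, bottom + dy};
  }

  // Returns the smallest integer-aligned rectangle containing this one.
  Rect RoundOut() const {
    return {std::floor(left), std::floor(top), std::ceil(right),
            std::ceil(bottom)};
  }
};

// Bounds reported to the parent layer: child bounds shifted by the offset,
// because the parent works in its own coordinate space.
Rect PaintBoundsForParent(const Rect& child_bounds, Offset offset) {
  return child_bounds.MakeOffset(offset.dx, offset.dy);
}

// Bounds for the save layer when the offset is already in the canvas matrix:
// undo the offset, then round out so the bounds are large enough.
Rect SaveLayerBounds(const Rect& paint_bounds, Offset offset) {
  return paint_bounds.MakeOffset(-offset.dx, -offset.dy).RoundOut();
}
```

With child bounds (0.25, 0.5, 10.25, 20.5) and offset (5.5, 3), the parent sees (5.75, 3.5, 15.75, 23.5), while the save-layer bounds come out as (0, 0, 11, 21).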

@@ -26,9 +26,6 @@ void PictureLayer::Preroll(PrerollContext* context, const SkMatrix& matrix) {

SkMatrix ctm = matrix;
ctm.postTranslate(offset_.x(), offset_.y());
Contributor

So, this is a counter-example for getting rid of the ctm local (that doesn't mean we shouldn't get rid of it in other methods where it really is superfluous).

Is there something different about the way that picture_layer handles the offset that might benefit the other uses of the raster cache?

Contributor Author

The difference seems to be that OpacityLayer is using MutatorsStack to track the offset due to possible PlatformView children.

@liyuqian
Contributor Author

Thanks @flar! Will address your comments in a future PR!

@aam
Member

aam commented Apr 17, 2020

This seems to have broken the Flutter engine iOS bot: https://ci.chromium.org/p/flutter/builders/prod/Mac%20iOS%20Engine/4827

liyuqian added a commit that referenced this pull request Apr 17, 2020
liyuqian added a commit that referenced this pull request Apr 17, 2020
This reverts commit 99f8d00.

I found some problems. Will revise and reland later, and put more details about the problems in the new PR.

TBR: @chinmaygarde @flar
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Apr 17, 2020
liyuqian added a commit that referenced this pull request May 1, 2020
This reverts commit b5aedb3 and relands #17712.

Fixes flutter/flutter#53288 and flutter/flutter#41654.

Together with #17791, this reland addresses some of Jim's concerns in the original PR #17712.

The major part of this PR is still the same as the original PR, and the performance / golden image impacts should be the same.
@liyuqian liyuqian deleted the no_snapping branch May 8, 2020 21:51
Successfully merging this pull request may close these issues.

Raster cache's integral translation snapping is broken
6 participants