
core(lantern): Remove min task duration from lantern graphs #9910

Merged
merged 24 commits into GoogleChrome:master on Nov 25, 2019

Conversation

warrengm
Contributor

@warrengm warrengm commented Nov 1, 2019

This PR removes the minimum task duration from the page dependency graph in order to retain necessary edges. For example, this fixes at least two cases of missing paths in the lantern graph:

  1. Script A creates an iframe that issues request B
  2. An XHR load event handler of request A fires request B

In both cases the Lantern graph lacks the A -> B edge whenever the intermediate CPU nodes are fast (< 10 ms). We have observed incorrect simulation results in the Publisher Ads Audits plugin due to these missing edges.

I suspect this PR may reduce some variance in cases where sites have some tasks that waffle around the 10 ms threshold (i.e. a particular task may be non-deterministically included in the graph).

To mitigate run time, I minimize the graph size by pruning short tasks while inserting edges to retain paths between dependencies and dependents. For example, a -> b -> c becomes a -> c if b is a short task.
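
As a rough sketch of that prune-and-rewire step (the node shape and function name here are toy placeholders for illustration, not Lighthouse's actual CPUNode/NetworkNode classes):

// Toy sketch: drop a short CPU task but keep the paths through it,
// so a -> b -> c collapses to a -> c when b is below the threshold.
function pruneShortCpuNodes(nodes, minTaskDurationMs = 10) {
  for (const node of nodes) {
    if (node.type !== 'cpu' || node.durationMs >= minTaskDurationMs) continue;

    // Rewire every dependency of the short node directly to every dependent.
    for (const dependency of node.dependencies) {
      for (const dependent of node.dependents) {
        dependency.dependents.add(dependent);
        dependent.dependencies.add(dependency);
      }
    }

    // Detach the short node itself.
    for (const dependency of node.dependencies) dependency.dependents.delete(node);
    for (const dependent of node.dependents) dependent.dependencies.delete(node);
    node.dependencies.clear();
    node.dependents.clear();
  }
}

// Example: a (network) -> b (2 ms cpu) -> c (network) becomes a -> c.
const a = {type: 'network', dependencies: new Set(), dependents: new Set()};
const b = {type: 'cpu', durationMs: 2, dependencies: new Set([a]), dependents: new Set()};
const c = {type: 'network', dependencies: new Set([b]), dependents: new Set()};
a.dependents.add(b);
b.dependents.add(c);
pruneShortCpuNodes([a, b, c]);
console.log(c.dependencies.has(a)); // true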

@warrengm warrengm changed the title (misc) Remove min task duration from page graph in lantern type(misc) Remove min task duration from page graph in lantern Nov 1, 2019
@warrengm warrengm changed the title type(misc) Remove min task duration from page graph in lantern type(misc) Remove min task duration from langtern page graphs Nov 1, 2019
@warrengm warrengm changed the title type(misc) Remove min task duration from langtern page graphs type(misc): Remove min task duration from lantern graphs Nov 1, 2019
@brendankenny
Member

Feedback would be appreciated on whether I'm missing any important considerations.

we tried this somewhere recently and it slowed down Lighthouse execution time significantly if we dropped the minimum completely. I can't find where that was though. @patrickhulce, do you remember?

@patrickhulce
Collaborator

ya some friends running LH tried this and it doubled total runtime, there are a lot of tasks under 10ms :/ #9627 (comment)

I think we do need to start including some tasks under our threshold, but we need to be more selective about it. We've already identified that the first paint/layout/parsehtml events should be saved regardless of length (#9627 (comment)). Are there other specific patterns of tasks we could be saving without removing the optimization entirely, @warrengm, that would help you out? :)

@patrickhulce
Collaborator

The other alternative here is to try to radically improve the runtime of simulation with large numbers of tasks. There's likely some low-hanging fruit we could tackle to improve things and lower the threshold, but I doubt there's enough to let us include all tasks; we run quite a few simulations per run.

@connorjclark
Collaborator

we run quite a few simulations per run.

can we look into parallelizing simulation? wasm / workers?

@patrickhulce
Collaborator

can we look into parallelizing simulation? wasm / workers?

doing a Promise.all on audit computation and making the simulation somehow async off the main thread would certainly solve that problem, but the complexity of introducing workers to our flow seems like an order of magnitude above being slightly more careful here right now :)

certainly an interesting long-term solution to explore though as we increase the amount of analysis with JS parsing and other heavy-handed audit tasks
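
Purely as a hypothetical sketch of that long-term idea (the worker script path, payload shape, and function names are invented for illustration; Lighthouse does not do this today):

// Hypothetical only: fan simulations out to Node worker threads and
// Promise.all the results. './run-simulation-worker.js' is an assumed
// worker script that posts its simulation result back as a message.
const {Worker} = require('worker_threads');

function simulateInWorker(serializedGraph) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./run-simulation-worker.js', {workerData: serializedGraph});
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', code => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

async function simulateAll(serializedGraphs) {
  return Promise.all(serializedGraphs.map(simulateInWorker));
}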

@warrengm warrengm changed the title type(misc): Remove min task duration from lantern graphs misc(lantern): Remove min task duration from lantern graphs Nov 4, 2019
@warrengm
Contributor Author

warrengm commented Nov 4, 2019

I'll look into this a bit more. I think it should be relatively easy to determine which tasks should be kept by checking which tasks end up in the unfiltered lantern graph between network nodes. But I don't have a good sense of how much that will reduce the size. Hopefully we can do a simple filter on task name without needing more complicated heuristics.
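
A sketch of what such a name-based filter could look like (the event names below are the ones discussed in this thread; the task shape and threshold are placeholders rather than the actual tracehouse schema):

// Illustrative allowlist: keep a task if it's long enough OR if its trace
// event name is one we never want to drop.
const ALWAYS_KEEP_EVENT_NAMES = new Set([
  'TimerInstall',
  'TimerFire',
  'XHRReadyStateChange',
  'ResourceSendRequest',
]);

function shouldKeepTask(task, minTaskDurationMs = 10) {
  return task.durationMs >= minTaskDurationMs || ALWAYS_KEEP_EVENT_NAMES.has(task.eventName);
}

// e.g. tasks.filter(task => shouldKeepTask(task));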

I briefly considered adding some external controls for callers. For example LCP simulations probably don't need the entire graph, but I'm not sure that's worth the complexity.

Any algorithm optimizations SGTM but I'm not so familiar with the LH codebase so I'll leave that to you all

@warrengm
Contributor Author

warrengm commented Nov 4, 2019

@patrickhulce What are the main factors in lantern speed? For example, does the number of children in a CPUNode matter? Or is the runtime dominated purely by the number of edges and nodes in the graph?

I think most of the events we care about are children of a RunTask event node but I think we could potentially prune the children if that would have a positive impact on runtime.

@patrickhulce
Collaborator

patrickhulce commented Nov 4, 2019

What are the main factors in lantern speed?

Largely dominated by the number of nodes in the graph. A CPUNode is created for every toplevel main thread task above the threshold, so by keeping that threshold high we limit the number of nodes in the graph that can slow things down.

For example, does the number of children in a CPUNode matter?

Only a marginal effect. It's used when cloning and for determining a multiplier but it's not nearly as dominant as node count.

I think most of the events we care about are children of a RunTask event node but I think we could potentially prune the children if that would have a positive impact on runtime.

Based on this and your original problem statement it sounds like we basically only want to look at these nodes for the purposes of creating relationships between other network nodes. I definitely see how this could meaningfully improve accuracy, so how about this:

  • Remove the threshold for CPU tasks on initial graph creation to build out the edges.
  • Before returning the created graph, do a pass on the graph to filter out small CPU nodes but preserve the dependencies that node would have created between other nodes.

i.e.

NetworkNode (/xhr-1.json) <-- CPUNode (XHR Callback, 2ms) <-- NetworkNode (/xhr-2.json)

becomes

NetworkNode (/xhr-1.json) <-- NetworkNode (/xhr-2.json)

instead of the situation today

NetworkNode (/xhr-1.json)
NetworkNode (/xhr-2.json)

@brendankenny brendankenny mentioned this pull request Nov 5, 2019
@warrengm
Contributor Author

warrengm commented Nov 7, 2019

Started working on (1). I made some changes so the version you looked at is slightly out of date now, but MLB and vine are still two of the most affected sites.

  • For MLB (which is now more accurate), most of the shift in TTFCPUI seems to come from new TimerInstall and TimerFire nodes.
  • For vine, the regression in TTI is due to some new edges between network nodes, since we're no longer ignoring some XHRReadyStateChange events
    • This pushes back the last long task (from a ResourceReceivedData event) due to extra dependencies. This seems like correct behavior even though it's less accurate.

From what I can tell, the new short nodes are always pruned in my current implementation so the effect seems to be purely from introducing new edges.

@warrengm warrengm changed the title misc(lantern): Remove min task duration from lantern graphs core(lantern): Remove min task duration from lantern graphs Nov 7, 2019
@warrengm
Contributor Author

warrengm commented Nov 7, 2019

I did some analysis by gradually introducing the new edges and got the following. My initial reaction is that the results seem acceptable. The biggest regression (in % terms) appears to be Vine, but the new paths between network nodes and ResourceReceivedData CPU nodes appear reasonable.

Adding short timer events

Metric                    p50 (% Error)        p90 (% Error)        p95 (% Error)       
roughEstimateOfFCP        16.2% ↔  0.0%        47.6% ↔  0.0%        61.0% ↔  0.0%       
roughEstimateOfFMP        16.4% ↔  0.0%        49.7% ↔  0.0%        57.8% ↔  0.0%       
roughEstimateOfTTFCPUI    22.2% ↔  0.0%        54.5% ↔  0.0%        75.0% ↔  0.0%       
roughEstimateOfTTI        23.6% ↔  0.0%        57.0% ↔  0.0%        107.1% ↔  0.0%      
roughEstimateOfSI         34.9% ↔  0.0%        75.0% ↔  0.0%        83.1% ↔  0.0%       
 
 ------- Fixes Summary -------
TTFCPUI on http://www.rakuten.ne.jp/ - 21867 (real) 14376 (cur) 13257 (prev)
TTFCPUI on https://weather.com/ - 10046 (real) 22804 (cur) 23104 (prev)
TTI on http://www.rakuten.ne.jp/ - 32059 (real) 23769 (cur) 23108 (prev)
TTFCPUI on http://www.metrolyrics.com/ - 27321 (real) 39256 (cur) 39593 (prev)
 
 ------- Regression Summary -------
TTI on http://www.skype.com/ - 7108 (real) 7654 (cur) 7554 (prev)
TTFCPUI on http://www.dawn.com/ - 8296 (real) 13859 (cur) 13751 (prev)
TTI on http://www.dawn.com/ - 12021 (real) 16330 (cur) 16175 (prev)
TTI on http://www.ynet.com/ - 4546 (real) 8153 (cur) 8096 (prev)
 
 ------- Bucket Summary -------
Good:      243   ↔  0
OK:        180   ↔  0
Bad:       71    ↔  0
 
 ------- % Error Summary -------
p50:       20.6% ↔  0.0%
p90:       56.1% ↔  0.0%
p95:       78.1% ↔  0.0%
 
 
Comparing to master computed values...
❌  FAIL    16 change(s) between expected and computed!

Adding short Timer, XHR, and ResourceSendRequest events

Metric                    p50 (% Error)        p90 (% Error)        p95 (% Error)       
roughEstimateOfFCP        16.2% ↔  0.0%        47.6% ↔  0.0%        61.0% ↔  0.0%       
roughEstimateOfFMP        16.4% ↔  0.0%        49.7% ↔  0.0%        57.8% ↔  0.0%       
roughEstimateOfTTFCPUI    21.3% ↓  0.9%        53.5% ↓  1.0%        75.0% ↔  0.0%       
roughEstimateOfTTI        23.6% ↔  0.0%        57.0% ↔  0.0%        107.1% ↔  0.0%      
roughEstimateOfSI         34.9% ↔  0.0%        75.0% ↔  0.0%        83.1% ↔  0.0%       
 
 ------- Fixes Summary -------
TTFCPUI on http://www.mlb.com/ - 23530 (real) 26390 (cur) 8141 (prev)
TTFCPUI on http://www.rakuten.ne.jp/ - 21867 (real) 14376 (cur) 13257 (prev)
TTFCPUI on https://vine.co/ - 7685 (real) 8276 (cur) 6822 (prev)
TTFCPUI on http://www.t-online.de/ - 20325 (real) 23033 (cur) 23649 (prev)
 
 ------- Regression Summary -------
TTI on https://vine.co/ - 7685 (real) 8572 (cur) 7846 (prev)
TTFCPUI on http://www.verizonwireless.com/ - 12468 (real) 14246 (cur) 13945 (prev)
TTI on http://www.skype.com/ - 7108 (real) 7691 (cur) 7554 (prev)
TTI on http://www.orange.fr/ - 14159 (real) 15495 (cur) 15250 (prev)
 
 ------- Bucket Summary -------
Good:      244   ↔  1
OK:        180   ↔  0
Bad:       70    ↓  1
 
 ------- % Error Summary -------
p50:       20.4% ↓  0.2%
p90:       55.4% ↓  0.6%
p95:       78.1% ↔  0.0%

Comparing to master computed values...
❌  FAIL    28 change(s) between expected and computed!

Adding in all new edges

Metric                    p50 (% Error)        p90 (% Error)        p95 (% Error)       
roughEstimateOfFCP        16.2% ↔  0.0%        47.6% ↔  0.0%        61.0% ↔  0.0%       
roughEstimateOfFMP        16.2% ↓  0.2%        49.7% ↔  0.0%        57.8% ↔  0.0%       
roughEstimateOfTTFCPUI    21.3% ↓  0.9%        53.5% ↓  1.0%        75.0% ↔  0.0%       
roughEstimateOfTTI        23.6% ↔  0.0%        57.0% ↔  0.0%        107.2% ↔  0.1%      
roughEstimateOfSI         34.9% ↔  0.0%        75.0% ↔  0.0%        83.1% ↔  0.0%       
 
 ------- Fixes Summary -------
TTFCPUI on http://www.mlb.com/ - 23530 (real) 26178 (cur) 8141 (prev)
FMP on http://www.metacafe.com/ - 1665 (real) 1849 (cur) 2048 (prev)
TTFCPUI on http://www.rakuten.ne.jp/ - 21867 (real) 14376 (cur) 13257 (prev)
TTFCPUI on https://vine.co/ - 7685 (real) 8276 (cur) 6822 (prev)
 
 ------- Regression Summary -------
TTI on https://vine.co/ - 7685 (real) 8572 (cur) 7846 (prev)
TTFCPUI on http://www.verizonwireless.com/ - 12468 (real) 14246 (cur) 13945 (prev)
TTI on http://www.skype.com/ - 7108 (real) 7691 (cur) 7554 (prev)
TTI on http://www.orange.fr/ - 14159 (real) 15495 (cur) 15250 (prev)
 
 ------- Bucket Summary -------
Good:      245   ↔  2
OK:        179   ↔  1
Bad:       70    ↓  1
 
 ------- % Error Summary -------
p50:       20.4% ↓  0.2%
p90:       55.4% ↓  0.6%
p95:       78.1% ↔  0.0%
 
 
Comparing to master computed values...
❌  FAIL    31 change(s) between expected and computed!

@patrickhulce
Collaborator

Awesome job @warrengm! This is much, much better accuracy-wise. Was it just the move to the second pass that improved things since I took a look?

I'd say we can upgrade this from a draft then, get some tests in, split off the Set changes, and re-analyze the run time to see if it's workable without major algo changes 👍

@warrengm
Contributor Author

I've been busy with some other work, but I've started working on unit tests now. In the meantime, here is the result of some performance testing in aggregate. There don't seem to be any major performance issues.

Running on all assets

Master:

multitime -q -n 40 node lighthouse-core/scripts/lantern/run-on-all-assets.js 
===> multitime results
1: -q node lighthouse-core/scripts/lantern/run-on-all-assets.js
            Mean        Std.Dev.    Min         Median      Max
real        77.964      2.622       73.685      77.929      83.838      
user        102.721     1.762       99.885      102.474     106.364     
sys         8.601       0.201       8.033       8.602       9.174   

My branch:

multitime -q -n 40 node lighthouse-core/scripts/lantern/run-on-all-assets.js 
===> multitime results
1: -q node lighthouse-core/scripts/lantern/run-on-all-assets.js
            Mean        Std.Dev.    Min         Median      Max
real        75.776      1.630       73.086      75.526      81.881      
user        101.777     1.132       99.776      101.645     105.934     
sys         8.665       0.215       8.226       8.660       9.254  

mlb.com

Master

===> multitime results
            Mean        Std.Dev.    Min         Median      Max
real        1.028       0.056       0.922       1.038       1.123       
user        1.354       0.053       1.259       1.344       1.446       
sys         0.104       0.023       0.054       0.101       0.16

My branch

===> multitime results
            Mean        Std.Dev.    Min         Median      Max
real        1.008       0.060       0.882       1.016       1.132       
user        1.359       0.051       1.260       1.363       1.479       
sys         0.095       0.019       0.055       0.095       0.141 

weather.com

Master

            Mean        Std.Dev.    Min         Median      Max
real        1.163       0.103       0.996       1.169       1.500       
user        1.515       0.087       1.351       1.523       1.716       
sys         0.100       0.015       0.073       0.096       0.136   

My branch

            Mean        Std.Dev.    Min         Median      Max
real        1.134       0.077       0.998       1.132       1.316       
user        1.517       0.067       1.367       1.512       1.665       
sys         0.097       0.018       0.053       0.096       0.152      

Collaborator

@patrickhulce patrickhulce left a comment


great news and awesome job @warrengm! this tentatively LGTM!

let's undraft it and let it spin on travis for the accuracy+timing diff there too

package.json Outdated
@@ -134,6 +134,7 @@
 "configstore": "^3.1.1",
 "cssstyle": "1.2.1",
 "details-element-polyfill": "^2.4.0",
+"global": "^4.4.0",
Collaborator


mistake?

Contributor Author


Reverted

@patrickhulce patrickhulce marked this pull request as ready for review November 18, 2019 20:17
Collaborator

@patrickhulce patrickhulce left a comment


Thanks so much for the huge contribution here @warrengm! Your patience throughout has been very much appreciated too :D

Everything looks great locally for me! 🎉

@patrickhulce patrickhulce merged commit 6c44281 into GoogleChrome:master Nov 25, 2019