core: ensure good and average scores start exactly at control points #13559

brendankenny · 2022-01-11T22:10:30Z

(this description refers to scores between 0 and 100 because it makes talking about rounding and flooring easier, but everything is actually done on scores between 0 and 1)

Our raw log-normal scoring method takes audit values and generates percentile scores based on control points. e.g. TBT's p10 control point of 200 ensures that TBT values ≤ 200ms get a 90+ score and TBT values > 200ms get a score less than 90.

The problem outlined in #13316 is that audit.js then rounds the score prior to adding it to the audit result. A score less than 90 can be rounded up to 90 if it's 89.5 or above, so in practice the TBT p10 control point has actually been ~204.86ms.

The simplest fix is to just floor the score rather than rounding it. With the guaranteed score precision provided by #13392, we can be confident that Math.floor(percentile) means that a value is never given a score higher than a 50 or a 90 unless it's faster than the median or p10 control points.

Here is how the before/after rounding looks between a score of 20 and 30:

the black line is the raw value returned by statistics.getLogNormalScore
the blue line is the previous Math.round() clamping, where half the time the rounded score is above the raw score and half the time it's below
the red line is the new Math.floor() clamping, where the rounded score is always below the raw score. A rounded score of 90 can only have been from a 90+ raw score.

This does mean that the new rounding matches with the old rounding half the time, but the other half of the time it's 1 point lower. This is for individual audits/metrics; as discussed in #13316, we should expect about half of sites to lose 1 point in their final perf score and the other half to be unchanged.

However! There's a big issue with this: attainability of a perfect score. Previously a score of 99.5 would get rounded up to 100, so there was enough room there for some sites to get a perfect score with metric numbers that are attainable in the real world. Flooring the scores takes those sites to 99 instead.

A couple of different approaches are possible to correct this. One lightweight way is to apply a small correction to transition from the new floor clamping at 90 (where we need it to precisely match the control point) to the old rounding at 100 (where we want the old rounding up behavior). This spreads scores between 90 and 100 to a range of 90 to 100.5, so the concluding Math.floor() takes those top end scores to a 100.

Here is what flooring vs flooring with the correction looks like:

Below a score of 90, it's identical to just flooring. As it rises above 90 and approaches 100, it gets closer and closer to what the old Math.round() scoring looked like, sometimes under and sometimes over the raw score).

This might sound more complicated than it is. It's a linear ramp from 0 to 0.5 between scores of 90 to 100 ((score - 0.9)/0.1 * 0.005)

Previously any score over 99.5 would get rounded up to 100, now it's any score over ~99.524 will get rounded up. As a result, over 95% of sites that previously got a 100 on a particular audit will still get a perfect score. The analysis of the impact on the overall perf score depends a lot more on the distribution of site scores than the above estimate, but since it's strictly a score boost for some sites, we can at least expect less than half of sites to lose a point on the overall perf score.

connorjclark

LGTM

Not much to say other than, pretty graphs 😍

core: ensure good and average scores start exactly at control points

8cfd7a5

brendankenny requested a review from a team as a code owner January 11, 2022 22:10

brendankenny requested review from connorjclark and removed request for a team January 11, 2022 22:10

devtools-bot assigned connorjclark Jan 11, 2022

devtools-bot added the waiting4reviewer label Jan 11, 2022

keep things simple, defer to statistics.getLogNormalScore

815792c

vercel bot deployed to Preview January 11, 2022 22:16 View deployment

unit test update

81cd275

vercel bot deployed to Preview January 11, 2022 22:19 View deployment

connorjclark approved these changes Jan 11, 2022

View reviewed changes

brendankenny merged commit 767036f into master Jan 12, 2022

brendankenny deleted the round-score branch January 12, 2022 00:19

brendankenny mentioned this pull request Jan 12, 2022

change individual metric score rounding #13316

Closed

This was referenced Jun 23, 2022

[Snyk] Security upgrade lighthouse from 8.6.0 to 9.3.0 MaxMood96/lighthouse#79

Open

[Snyk] Security upgrade lighthouse from 8.6.0 to 9.3.0 MaxMood96/lighthouse#80

Open

[Snyk] Security upgrade lighthouse from 8.6.0 to 9.3.0 MaxMood96/lighthouse#82

Open

ryan-ally mentioned this pull request Jun 21, 2023

[Snyk] Security upgrade lighthouse from 7.5.0 to 9.3.0 ryan-ally/lighthouse#30

Open

MaxMood96 mentioned this pull request Jun 22, 2023

[Snyk] Security upgrade lighthouse from 8.6.0 to 9.3.0 MaxMood96/lighthouse#94

Open

This was referenced Jun 23, 2023

[Snyk] Security upgrade lighthouse from 7.5.0 to 9.3.0 ryan-ally/lighthouse#32

Open

[Snyk] Security upgrade lighthouse from 7.5.0 to 9.3.0 ryan-ally/lighthouse#33

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core: ensure good and average scores start exactly at control points #13559

core: ensure good and average scores start exactly at control points #13559

brendankenny commented Jan 11, 2022 •

edited

Loading

connorjclark left a comment •

edited

Loading

core: ensure good and average scores start exactly at control points #13559

core: ensure good and average scores start exactly at control points #13559

Conversation

brendankenny commented Jan 11, 2022 • edited Loading

connorjclark left a comment • edited Loading

Choose a reason for hiding this comment

brendankenny commented Jan 11, 2022 •

edited

Loading

connorjclark left a comment •

edited

Loading